grand-mother / gtot

GRAND to Trees C++ code

Feedback from tests with Auger data #1

Open grand-oma opened 1 year ago

grand-oma commented 1 year ago

Some comments on gtot/ROOT files after the test on Auger data (still not complete).

First of all, I managed to look at and play with the data without too much effort, which means that the main objective is fulfilled and analysis is possible with the present software 😊! Yet I still feel clumsy when handling ROOT files, and there is probably room for improvement here. A script, example, or documentation would be very helpful to understand how best to navigate the file and manipulate objects.

I have 3 specific questions/suggestions related to that:

  1. A print() function exposing the elements and structure of the ROOT file would be very helpful I believe. It would be very nice if users could see in one glance ALL TTrees present in the file, their structure (and associated names), the number of elements in them, etc… ie something which goes beyond the present print_metadata(). Is that doable?
  2. The manipulation of objects is a bit awkward to me. For example I find it a bit inconvenient to load events [eg evt.get_event(listevt[i][0],listevt[i][1])], and the fact that objects are in a ROOT/C format (eg StdVectorList) probably does not help for pure python programmers… Maybe this is just a bias from somebody not familiar at all with ROOT, but in my case I found the wrapping layer proposed by JM more familiar (https://github.com/grand-mother/grand/blob/beta_dc1/grand/io/root_files.py).
  3. More critical may be the need to write analysis results to the ROOT file. For now, in my analysis script [https://github.com/grand-mother/grand/blob/beta_dc1/scripts/MDanalysis_v0.py] I write them to an npz file, but I would find it much more natural to do that in the initial ROOT file. This could be discussed, but I think this is in line with the general philosophy we chose for the analysis (ie one single file updated through the analysis process). I have no clue how to write the results to the file. Here again an example script would probably be very useful!

Some remarks/issues with a more limited scope:

  4. gtot messages could be improved. Here is the terminal output when I run it on monitoring data (MD):

    Writing GRAND ROOT file to /home/data/GP300/argentina/auger/MD/GRANDfiles/md006000_f0003.root
    Processing md006000.f0004

    The header length is 36 bytes
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    The event length is 2600 bytes New event
    Cannot read the Event length

This should be modified.

If I run it on an AD file, then I get:

    Writing ADCEvent tree
    Error in : Cannot build a TreeIndex with a Tree having no entries
    Writing VoltageEvent tree
    Finished, quitting …

Yet the file seems empty.

In general, it would be nice to have messages saying which TTrees are filled and which are not.

  5. Also, errors are not handled when the file does not exist:

    root@lpnlp106:/home/soft/gtot# ./gtot coco
    Error opening file coco

    Break segmentation violation
    Generating stack trace...
    /usr/bin/addr2line: '0x00005587abc39af7': No such file
    /usr/bin/addr2line: '0x00005587abc395ce': No such file
    0x00005587abc39af7 in main + 0x191 from ./gtot
    0x00007f37eb3030b3 in __libc_start_main + 0xf3 from /lib/x86_64-linux-gnu/libc.so.6
    0x00005587abc395ce in _start + 0x2e from ./gtot

  6. Several fields are still wrong/not human readable, eg GPS coordinates, temperature, run type/trigger mode, etc. Similarly, various important pieces of information seem to be missing, such as the ADC conversion factor or the amplifier gain (value set at run start in the DAQ GUI). I think these should be in the ROOT file. I guess this requires input from DAQ people.

  7. In most ROOT files (eg md006080_f0003.root) TBrowser shows several teventadc and teventvoltage TTrees, with a digit after the name (eg “teventadc;4”)… What does this mean?

  8. In most cases the field evt.adc_samples_count_channel0 is a list with one element… Why a list then? Yet in some specific cases (eg event 1 of run md006020_f002.root), it holds 5 elements… Is that a bug in the raw file? If not, what would it mean?

  9. In some other cases the event size is not the same for the first events and the following ones (eg md006050_f0003.root, with adc_samples_count_channel0 = 1024 for the first 3 events and 10240 after that)… I guess this falls in the DAQ sandbox, but is there a workaround that we could think about?... Or maybe we want to leave it like this?

Looking forward to discussing this in more detail :-) !

lwpiotr commented 1 year ago

On 05/12/2022 09:26, Olivier Martineau-Huynh wrote:

Some comments on gtot/ROOT files after the test on Auger data (still not complete).

First of all, I managed to look at and play with the data without too much effort, which means that the main objective is fulfilled and analysis is possible with the present software 😊! Yet I still feel clumsy when handling ROOT files, and there is probably room for improvement here. A script, example, or documentation would be very helpful to understand how best to navigate the file and manipulate objects.

I have 3 specific questions/suggestions related to that:

  1. A print() function exposing the elements and structure of the ROOT file would be very helpful I believe. It would be very nice if users could see in one glance ALL TTrees present in the file, their structure (and associated names), the number of elements in them, etc… ie something which goes beyond the present print_metadata(). Is that doable?

It has always been on the todo list; I can now prioritise it. I was thinking about a class for a file, but I have one conceptual problem with it: one would use one class (a file class) for accessing a file, and a different class (a tree class) for writing out trees.

  2. The manipulation of objects is a bit awkward to me. For example I find it a bit inconvenient to load events [eg evt.get_event(listevt[i][0],listevt[i][1])], and the fact that objects are in a ROOT/C format (eg StdVectorList) probably does not help for pure python programmers… Maybe this is just a bias from somebody not familiar at all with ROOT, but in my case I found the wrapping layer proposed by JM more familiar (https://github.com/grand-mother/grand/blob/beta_dc1/grand/io/root_files.py).

So now a tree is an iterator too, and this is probably the most pythonic way. Of course, I can add loading events without an identifier, that's 2 lines, but is it useful? I mean, one usually will either iterate over events or get a specific event, which is described by an identifier.
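To make the two access patterns mentioned above concrete (iterating over events versus fetching one by its identifier), here is a minimal sketch. The `Event` and `EventTree` classes, their fields, and the in-memory storage are hypothetical stand-ins, not the actual GRAND tree classes; only the `get_event(...)` call with two arguments comes from this thread, and the (run, event) argument order is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Event:
    run_number: int
    event_number: int
    adc_samples: List[int]


class EventTree:
    """Hypothetical in-memory stand-in for a tree of events."""

    def __init__(self, events: List[Event]) -> None:
        # Index events by the (run_number, event_number) identifier pair.
        self._events: Dict[Tuple[int, int], Event] = {
            (e.run_number, e.event_number): e for e in events
        }

    def __iter__(self) -> Iterator[Event]:
        # Iterating over the tree yields the events in insertion order.
        return iter(self._events.values())

    def __len__(self) -> int:
        return len(self._events)

    def get_event(self, run_number: int, event_number: int) -> Event:
        # Random access requires the identifier of the event.
        return self._events[(run_number, event_number)]


# Made-up events, for illustration only.
tree = EventTree([Event(6000, 1, [10, 12]), Event(6000, 2, [11, 13])])

# Pattern 1: loop over all events.
for evt in tree:
    print(evt.run_number, evt.event_number, len(evt.adc_samples))

# Pattern 2: fetch one specific event by its identifier.
evt = tree.get_event(6000, 2)
print(evt.adc_samples)
```

With both patterns available, loading "the next event" without an identifier is already covered by plain iteration.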

Regarding the StdVectorList, that's not a C format. It is a Python wrapper that accepts and returns a Python list, but stores the contents in a C++ vector (so it can be stored in the TTrees). I think it may be impossible to make the class pretend that it is a real list, but I will recheck. Of course, it could be based on a numpy array instead of a list. However, one can't easily append to a numpy array (on every append the array is reallocated in memory, unlike a list), which in turn makes adding specific tracks, etc. to the list difficult. Thus I decided on the list, which is easily convertible to/from numpy, and converting is, I guess, a very basic Python task. But I appreciate discussion.
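As a rough illustration of the trade-off described above, the sketch below shows a list-like wrapper around typed contiguous storage. It is not the StdVectorList implementation: the class name `VectorList` is hypothetical, and `array.array` from the standard library merely stands in for the C++ vector. Deriving from `collections.abc.MutableSequence` gives `append`, `extend`, iteration and so on, although `isinstance(obj, list)` still returns `False`.

```python
from array import array
from collections.abc import MutableSequence


class VectorList(MutableSequence):
    """Hypothetical list-like wrapper around typed contiguous storage.

    array.array is only a stand-in for a C++ std::vector.
    """

    def __init__(self, values=(), typecode="d"):
        self._data = array(typecode, values)

    def __getitem__(self, index):
        # Slices are returned as plain Python lists.
        if isinstance(index, slice):
            return list(self._data[index])
        return self._data[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._data[index] = array(self._data.typecode, value)
        else:
            self._data[index] = value

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, value)

    def __repr__(self):
        return f"VectorList({list(self._data)!r})"


samples = VectorList([1.0, 2.0])
samples.append(3.0)         # mixin method, implemented through insert()
samples.extend([4.0, 5.0])  # mixin method as well

print(list(samples))                         # conversion to a real list
print(isinstance(samples, list))             # False: list-like, not a list
print(isinstance(samples, MutableSequence))  # True
```

Code that only relies on list behaviour (indexing, `len`, `append`, iteration) works with such a wrapper unchanged; code that explicitly checks for `list` needs a `list(...)` conversion first.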

While I think some ideas from the JM interface are useful, my 2 remarks are:

  1. Some of those things should be implemented in the tree classes. Adding them in external interfaces makes the codebase messy and illogical. I really don't understand why there is no will to cooperate here...
  2. I am not convinced about a separate file class for each tree type.

  3. More critical may be the need to write analysis results to the ROOT file. For now, in my analysis script [https://github.com/grand-mother/grand/blob/beta_dc1/scripts/MDanalysis_v0.py] I write them to an npz file, but I would find it much more natural to do that in the initial ROOT file. This could be discussed, but I think this is in line with the general philosophy we chose for the analysis (ie one single file updated through the analysis process). I have no clue how to write the results to the file. Here again an example script would probably be very useful!

Finalising the underlying analysis oriented interface is still to be done, but please try:

https://github.com/grand-mother/grand/blob/dev_io_root/examples/grandlib_classes/event_generation.py

  6. Several fields are still wrong/not human readable, eg GPS coordinates, temperature, run type/trigger mode, etc. Similarly, various important pieces of information seem to be missing, such as the ADC conversion factor or the amplifier gain (value set at run start in the DAQ GUI). I think these should be in the ROOT file. I guess this requires input from DAQ people.

It does. I would like to get the function converting the ADC values into human readable stuff, that I could use in gtot. I can do it, based on documentation and asking questions, but others really could contribute.

  7. In most ROOT files (eg md006080_f0003.root) TBrowser shows several teventadc and teventvoltage TTrees, with a digit after the name (eg “teventadc;4”)… What does this mean?

These are keys related to how ROOT writes the trees, like iterations in writing. I made them disappear in the Python interface, and I will also do so in gtot, because they confuse people (including me ;) ).
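For context, the number after the semicolon is what ROOT calls a cycle: writing an object under the same name again creates a new key with a higher cycle, and the highest cycle is the most recent one. The sketch below shows one way a listing could hide the older cycles. It is plain Python on key-name strings, not the actual logic in gtot or in the Python interface; the function name `latest_cycles` is hypothetical, and apart from "teventadc;4" the key names are made up for illustration.

```python
from typing import Dict, Iterable, List


def latest_cycles(key_names: Iterable[str]) -> List[str]:
    """Keep only the highest-numbered "name;cycle" entry for each name."""
    best: Dict[str, int] = {}
    for key in key_names:
        name, _, cycle = key.partition(";")
        # Assumption: a key listed without an explicit cycle counts as cycle 0.
        number = int(cycle) if cycle else 0
        if number >= best.get(name, -1):
            best[name] = number
    return [f"{name};{number}" for name, number in best.items()]


keys = ["teventadc;3", "teventadc;4", "teventvoltage;1", "teventvoltage;2"]
print(latest_cycles(keys))
```

For the list above this leaves one entry per tree name, which is closer to what a user expects to see in a browser.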

Looking forward to discussing this in more detail :-) !

I'll look into the rest of the bugs. Finally, someone is testing it :)

-- Dr Lech Wiktor Piotrowski Particles and Fundamental Interactions Division Institute of Experimental Physics, University of Warsaw