Open juanangp opened 2 years ago
Hello @juanangp,
I also thought of this in the past, but after trying to refactor them to a structure like what you are suggesting I discovered why they are defined like that.
I don't remember the details but there were some (I thought unsolvable) issues when storing the data in a TTree, it seems it doesn't support such a depth level.
Currently we store a vector of tracks for each event, where each track has a single member which is a hits
(plural). I tried to instead have each track have a vector of hits but as I say ROOT doesn't like it.
I think in principle it may be possible to do thanks to dictionaries, but it will probably present some issues, such as making our data structures unreadable without using dictionaries (which is annoying if you want to analyze this data in python using uproot for example). Currently the event tree is readable without having to load REST libraries thanks to the default max split level on the tree, but it stops working once you "put a vector inside a vector". I also asked for a solution on the ROOT forums but didn't get a convincing answer (basically to just use dictionaries).
So I would recommend to keep the current structure, but I also agree that I can be improved in some aspects.
My suggestions coming from experience implementing these structures from scratch for Geant4 simulation code:
TObject
inheritance: In my experience it only introduces annoying data members such as TBits
(or however its called) and provides nothing useful, it is not necessary to inherit from TObject to save into a ttree. Also you don't expect to save a hits or a track object into a file, there are always inside an event, in this case its enough with the top-most classs inheriting from TObject.TRestTrack
in a TRestEvent
but when you perform a TRestEvent::GetTrack
call it could return a TRestAnalysisTrack
so that you can transparently perform analysis on the terminal?Perhaps @jgalan wants to join the discussion.
Any improvement on this sense is really welcome. This big change will probably break forward/backward compatibility. So we could move towards a new 2.4.X release in that case.
But who knows, perhaps ROOT schema evolution works as expected, to be checked.
I am just afraid of the work load (gitlab pipelines should be kept running and alive). If you want to invest on that part of the code then you could give a try.
Probably @nkx has few insights to decide if moving forwards is a wise decision.
Hi @lobis,
I don't understand when you said that ROOT doesn't like to "put a vector inside a vector", this is what we are actually doing e.g. in TRestTrackEvent:
class TRestTrackEvent : public TRestEvent {
std::vector<TRestTrack> fTrack; // Collection of tracks that define the event
Where TRestTrack:
class TRestTrack : public TObject {
TRestVolumeHits fVolumeHits; ///< Hit volumes that define a track
Then TRestVolumeHits inherits from TRestHit (see above) and also has further vectors:
class TRestVolumeHits : public TRestHits {
protected:
std::vector<Float_t> fSigmaX;
Anyhow, I agree with you that the data structures should be readable outside REST, however it shouldn't be a problem. Looking a bit into ROOT looks like is possible to split a std::vector
On the other hand, I agree with your other suggestions.
In summary, I think we can give a try although I am afraid we will loose backward compatibility of the data.
Hi @lobis,
I don't understand when you said that ROOT doesn't like to "put a vector inside a vector", this is what we are actually doing e.g. in TRestTrackEvent:
This is the post where I asked this question: https://root-forum.cern.ch/t/storing-nested-structure-event-track-step-into-tree/45660/11
I have worded my reply poorly, what I meant was that if we use something like:
class Event:
vector<Track> fTracks;
class Track:
vector<Hit> fHits;
class Hit:
TVector3 fPosition;
...
It will arise some problems (or that was my experience, I don't understand why though, maybe I did something else wrong, it may be worth it to try.
I came to the conclusion that a structure like the following is the one that will work:
class Event:
vector<Track> fTracks;
class Track:
Hits fHits;
class Hits:
vector<TVector3> fPosition;
...
but now that I see it I understand less why I had this problem, they seem equivalent. In my forum post you can see that you cannot see the complete structure from the TBrowser without loading the dictionaries, whereas with the first proposal you can. Maybe just enabling a setting on the TTree when writing (such as the split level) you can make it work... I don't know.
Remove analysis methods from the data structures: I think the data structure definitions should remain as simple as possible. Analysis methods (such as getting the mean position of a track etc.) should reside in a different class. For example you could store a TRestTrack in a TRestEvent but when you perform a TRestEvent::GetTrack call it could return a TRestAnalysisTrack so that you can transparently perform analysis on the terminal?
I don't mind adding simple analysis methods in the data structures. The methods to calculate things like mean position or RMS is very universal. The algorithm will never be updated. Adding these methods directly in the data structure expands its capability. For other advanced analysis like doing Hough transformation they shall indeed be placed in other classes.
The problem for me is that there are too much repeated method in TRestDetectorHits and TRestDetectorHitEvents. For example, TRestDetectorHitsEvent::GetNumberOfHitsY() is just calling TRestDetectorHitsEvent::GetNumberOfHitsY().
One could avoid that by accepting that the user will call detectorEvent->GetHits()->GetNumberOfHitsY()
?
Just removing the methods at TRestDetectorHitsEvent
would force us to write the code that way.
Current code design makes difficult to handle data structures, e.g. in TRestHits.h
Y would suggest to move to a different data structure e.g.
This would simplify the handling of TRestHits using standard std iterators, for instance this part of the code mimic std iterators:
The implementatio of this proposal would imply a lot of changes in the code and perhaps the root files might loose the backward compatibility, but it will be a big improvement that would simplify the code.
Let me know what do you think, I can help with the migration.