Closed jokasimr closed 3 weeks ago
Fixes #35
Note that this changes the way Chexus works in a significant way. This change makes Chexus read every dataset in the file and store them in memory while validating.
I think the better solution would be if the validators could decide to read data or not on a case by case basis, to limit the total amount read and to limit the max amount stored in memory.
But that would require some larger changes to the way chexus works.
* Should we do those changes now or wait until this actually becomes a problem in practice? * Do we want those changes in this PR or in a separate PR?
I think this is going to be a big performance problem in practice. Why not just load all dataset values that are in an allow-list, which for now is maintained by hand?
Why not just load all dataset values that are in an allow-list, which for now is maintained by hand?
That's an interesting solution that I didn't consider.
I just made some changes that lets us read data on-demand in the validators.
It's a bit more complicated than the allow-list, but it has the advantage that in the multiple-detector case we don't have to keep all event_id
s from all detectors in memory simultaneously. We also don't have to keep a list of rules for what fields should be read.
Do you prefer the allow-list or the load-on-demand version?
Why not just load all dataset values that are in an allow-list, which for now is maintained by hand?
That's an interesting solution that I didn't consider. I just made some changes that lets us read data on-demand in the validators. It's a bit more complicated than the allow-list, but it has the advantage that in the multiple-detector case we don't have to keep all
event_id
s from all detectors in memory simultaneously. We also don't have to keep a list of rules for what fields should be read.Do you prefer the allow-list or the load-on-demand version?
Not really, I am fine with either.
Fixes #35
Note that this changes the way Chexus works in a significant way. ~This change makes Chexus read every dataset in the file and store them in memory while validating.
I think the better solution would be if the validators could decide to read data or not on a case by case basis, to limit the total amount read and to limit the max amount stored in memory.
But that would require some larger changes to the way chexus works.