gordonwatts / BDTTrainingAnalysisLanguage

Pull from ATLAS EXOT 15 derivations, columnar data, and flat ROOT ntuples with RDF to scikit-learn in one nice, fast swoop

Add a terminal operator: Count() and Max() and Min() in sub-queries #37

Closed: gordonwatts closed this issue 5 years ago

gordonwatts commented 5 years ago

Create the infrastructure for terminal operators by implementing Count(), Min(), and Max(). These should be usable in the rest of the query.

gordonwatts commented 5 years ago

Actually, these are all just special cases of Aggregate, perhaps shortcuts to it. So the first thing to do is implement Aggregate.
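To illustrate why Count, Min, and Max reduce to Aggregate, here is a minimal sketch, assuming a LINQ-style Aggregate(seed, func) that folds an accumulator function over a sequence. The names `aggregate`, `count`, `maximum`, and `minimum` are hypothetical, and plain Python lists stand in for the query's sequences; this is not the project's actual API.

```python
from functools import reduce
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
A = TypeVar("A")


def aggregate(seq: Iterable[T], seed: A, func: Callable[[A, T], A]) -> A:
    """Hypothetical LINQ-style Aggregate: fold func over seq starting from seed."""
    return reduce(func, seq, seed)


def count(seq: Iterable[T]) -> int:
    # Ignore each element and just bump the accumulator.
    return aggregate(seq, 0, lambda acc, _: acc + 1)


def maximum(seq: Iterable[T]) -> Optional[T]:
    # Seed with None so an empty sequence yields None instead of raising.
    return aggregate(seq, None, lambda acc, x: x if acc is None or x > acc else acc)


def minimum(seq: Iterable[T]) -> Optional[T]:
    return aggregate(seq, None, lambda acc, x: x if acc is None or x < acc else acc)


jet_pts = [45.2, 12.7, 88.1, 30.0]
print(count(jet_pts), maximum(jet_pts), minimum(jet_pts))  # 4 88.1 12.7
```

Each terminal operator is only a choice of seed and accumulator, which is why implementing Aggregate first covers all three.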

That said, there may sometimes be shortcuts that can be taken in the backend. If Aggregate is the only operator, those shortcuts can't be detected. OK, let's call that an optimization that can be put in later.

For example, if you are counting the number of jets and don't apply any filter cuts, then ->size() in C++ is all you need, and that is much faster.
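The ->size() shortcut can only be spotted while the query still says Count(); once it has been lowered to a generic Aggregate, the backend just sees a fold. A minimal sketch of that detection, assuming the query is available as a Python expression string and that collections are reached by attribute access (e.g. `e.jets`). The function `count_strategy` and its "size"/"loop" labels are hypothetical. It treats Select as length-preserving and conservatively assumes any other method call (Where, or anything unknown) may change the length.

```python
import ast

# Operators that never change the number of elements in the sequence.
LENGTH_PRESERVING = {"Select"}


def count_strategy(expr: str) -> str:
    """Hypothetical backend check: return "size" if `<seq>.Count()` can use
    container->size(), or "loop" if an explicit counting loop is needed."""
    node = ast.parse(expr, mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
            and node.func.attr == "Count"):
        raise ValueError("expected an expression of the form <sequence>.Count()")
    source = node.func.value
    # Walk down the method chain, e.g. e.jets.Select(...).Where(...).
    while isinstance(source, ast.Call) and isinstance(source.func, ast.Attribute):
        if source.func.attr not in LENGTH_PRESERVING:
            return "loop"  # a Where() cut (or unknown call) may drop elements
        source = source.func.value
    return "size"  # no filter anywhere in the chain


print(count_strategy("e.jets.Count()"))
print(count_strategy("e.jets.Where(lambda j: j.pt > 30).Count()"))
```

Keeping this check ahead of the Aggregate lowering is what lets the optimization be "put in later" without changing the Aggregate implementation itself.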

gordonwatts commented 5 years ago

Put in place something that does the translation in the top-level layer, which makes implementation easy. It might remove the "max" and "min" shortcuts in some cases, however. We will have to think about that. Ah, we don't always have to run the transformer!
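A top-level translation of this kind could be an AST rewrite that replaces each terminal call with an equivalent Aggregate call before the query reaches a backend. The sketch below assumes queries are Python expressions, a LINQ-style `Aggregate(seed, func)` method, and Python 3.9+ for `ast.unparse`; `TerminalToAggregate`, `lower_terminals`, and the rewrite table are hypothetical names, not the project's code.

```python
import ast

# Hypothetical rewrite table: terminal operator -> (seed, accumulator) source.
_RULES = {
    "Count": ("0", "lambda acc, v: acc + 1"),
    "Max": ("None", "lambda acc, v: v if acc is None or v > acc else acc"),
    "Min": ("None", "lambda acc, v: v if acc is None or v < acc else acc"),
}


class TerminalToAggregate(ast.NodeTransformer):
    """Rewrite seq.Count()/Max()/Min() into seq.Aggregate(seed, func)."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # lower nested sub-queries first
        if (isinstance(node.func, ast.Attribute) and node.func.attr in _RULES
                and not node.args and not node.keywords):
            seed, func = _RULES[node.func.attr]
            return ast.Call(
                func=ast.Attribute(value=node.func.value, attr="Aggregate",
                                   ctx=ast.Load()),
                args=[ast.parse(seed, mode="eval").body,
                      ast.parse(func, mode="eval").body],
                keywords=[],
            )
        return node


def lower_terminals(query: str) -> str:
    """Return the query text with every terminal operator lowered to Aggregate."""
    tree = TerminalToAggregate().visit(ast.parse(query, mode="eval"))
    return ast.unparse(tree.body)


print(lower_terminals("e.jets.Where(lambda j: j.tracks.Count() > 2).Count()"))
```

Because the rewrite is a separate pass, a backend that handles Count(), Max(), or Min() natively (for example via ->size()) could simply skip it and see the original calls.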

gordonwatts commented 5 years ago

This basically works, but I need to rationalize Aggregate, or the arguments to it: