root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

[DF] Allow defining multiple columns in single RDataFrame::Define() call #14147

Open karuboniru opened 7 months ago

karuboniru commented 7 months ago

Feature description

While migrating analysis code from previous work to RDataFrame, I run into cases where existing code computes several variables in a single call (for example, various kinematic variables for one reconstructed particle). With RDataFrame I can currently write code like:

//...
.Define("particle", [](event_recordT& e){...}, {"event_record"})
.Define("KinVar1",[](particleT &p){return p.KinVar1();},{"particle"})
.Define("KinVar2",[](particleT &p){return p.KinVar2();},{"particle"})
//...

but it would be great if I could do something like

//...
.Define({"KinVar1", "KinVar2"}, [](event_recordT& e){... return std::make_tuple(p.KinVar1(),p.KinVar2()); },  {"event_record"} )
//...

with an overload of Define that looks like: RDataFrame::Define(const ColumnNames_t &, F &&, const ColumnNames_t &).

In other words, would it be possible to add an overload that takes a list of column names to define and a callable that returns a tuple? Internally, it would unpack the tuple and store each element in its own column.
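To illustrate the unpacking step, here is a minimal standard-library sketch (C++17) of pairing a list of names with the elements of a returned tuple. It is not ROOT code and not a proposed implementation: EventRecord, computeKinematics, unpackColumns and unpackImpl are hypothetical names made up for this example, and every element is converted to double for simplicity, whereas real columns could each keep their own type.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for an event record; not a ROOT type.
struct EventRecord {
  double px, py, pz;
};

// One call that computes two kinematic variables at once:
// transverse momentum and total momentum.
inline std::tuple<double, double> computeKinematics(const EventRecord &e) {
  const double pt2 = e.px * e.px + e.py * e.py;
  return {std::sqrt(pt2), std::sqrt(pt2 + e.pz * e.pz)};
}

// Store the I-th tuple element under the I-th name (C++17 fold expression).
// Assumption: every element is convertible to double.
template <typename Tuple, std::size_t... Is>
std::map<std::string, double>
unpackImpl(const std::array<std::string, sizeof...(Is)> &names,
           const Tuple &values, std::index_sequence<Is...>) {
  std::map<std::string, double> columns;
  ((columns[names[Is]] = static_cast<double>(std::get<Is>(values))), ...);
  return columns;
}

// The array size is tied to the tuple size, so a mismatch between the number
// of names and the number of returned values is a compile-time error.
template <typename... Ts>
std::map<std::string, double>
unpackColumns(const std::array<std::string, sizeof...(Ts)> &names,
              const std::tuple<Ts...> &values) {
  return unpackImpl(names, values, std::index_sequence_for<Ts...>{});
}
```

For one entry, unpackColumns(names, computeKinematics(e)) with names holding "KinVar1" and "KinVar2" yields a map with one value per name. A real overload would instead have to register one typed column per tuple element in the computation graph.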

Alternatives considered

My current workaround is to define a single column that holds the tuple and then unpack it with multiple Define calls, since I do not plan to make many modifications to the old code.

Additional context

No response

dpiparo commented 7 months ago

Dear @karuboniru , thanks for sharing this proposal. We will discuss this internally.

martamaja10 commented 5 months ago

See also https://sft.its.cern.ch/jira/browse/ROOT-9766 for more discussion and motivation.