miranov25 / RootInteractive

RDataFrame simplifications - numpy/python syntax #297

Open miranov25 opened 1 year ago

miranov25 commented 1 year ago

https://gitter.im/matrix/5ba1f93bd73408ce4fa8a265/@agoose77:matrix.org?at=639f83faa151003b5a7550f4

Possible simplification of the creation of RDataFrame function definitions

As for simplifying the generation of RDataFrame templates, @pl0xz0rz has implemented something similar in RootInteractive for Python -> JavaScript translation with ast. There, the Python functions are replaced with the corresponding JavaScript functions.

E.g.:
In [136]: ast.dump(ast.parse("track.GetP() / mass", mode="eval"),True,False)
Out [136]: "Expression(body=BinOp(left=Call(func=Attribute(value=Name(id='track', ctx=Load()), attr='GetP', ctx=Load()), args=[], keywords=[]), op=Div(), right=Name(id='mass', ctx=Load())))"
In [140]: ast.dump(ast.parse("track[1:10,x:y].GetP() / mass", mode="eval"),True,False)
Out[140]: "Expression(body=BinOp(left=Call(func=Attribute(value=Subscript(value=Name(id='track', ctx=Load()), slice=ExtSlice(dims=[Slice(lower=Constant(value=1, kind=None), upper=Constant(value=10, kind=None), step=None), Slice(lower=Name(id='x', ctx=Load()), upper=Name(id='y', ctx=Load()), step=None)]), ctx=Load()), attr='GetP', ctx=Load()), args=[], keywords=[]), op=Div(), right=Name(id='mass', ctx=Load())))"
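
A minimal sketch of the idea, assuming only the standard `ast` module: walk the parsed expression and emit a string for the target language, replacing function names through a lookup table. The table `FUNCTION_MAP`, the class `ExprRenderer` and the helper `render` are hypothetical names used for illustration, not the RootInteractive implementation.

```python
import ast

# Hypothetical mapping from Python-side function names to target-language names.
FUNCTION_MAP = {"sqrt": "Math.sqrt", "cos": "Math.cos"}


class ExprRenderer(ast.NodeVisitor):
    """Render a small subset of Python expressions as a target-language string."""

    BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        op = self.BINOPS[type(node.op)]
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Call(self, node):
        args = ", ".join(self.visit(arg) for arg in node.args)
        return f"{self.visit(node.func)}({args})"

    def visit_Attribute(self, node):
        return f"{self.visit(node.value)}.{node.attr}"

    def visit_Name(self, node):
        # Names found in the table are replaced, all others are kept as-is.
        return FUNCTION_MAP.get(node.id, node.id)

    def visit_Constant(self, node):
        return repr(node.value)

    def generic_visit(self, node):
        # Anything outside the supported subset is reported, not silently dropped.
        raise NotImplementedError(type(node).__name__)


def render(expression):
    return ExprRenderer().visit(ast.parse(expression, mode="eval"))
```

For example, `render("track.GetP() / mass")` walks the same tree as the first dump above. Subscripts and slices are deliberately left out of this sketch and raise `NotImplementedError`.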
miranov25 commented 1 year ago

Test code updated. First test passing:

In [10]:     rdf2 = makeDefine("arrayD","array1D0[1:10]-array1D2[:20:2]", rdf,3, True)
====================================
arrayD
 array1D0[1:10]-array1D2[:20:2]
====================================

Implementation:

auto arrayD(){
    RVec<double> result(9);
    for(size_t i=0; i<9; i++){
        result[i] = (array1D0[1+i*1]) - (array1D2[0+i*2]);
    }
    return result;
}

Dependencies
 ['array1D0', 'array1D2']
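
The loop above can be reproduced from the slice bounds alone. The following is a sketch under assumptions: the helpers `slice_length`, `index_expr` and `make_loop` are hypothetical, the `RVec<double>` result type is copied from the example, and taking the shortest operand as the loop length is an assumption (the output above does not say how 9 was chosen).

```python
def slice_length(start, stop, step):
    """Number of elements selected by start:stop:step (non-negative bounds)."""
    return len(range(start, stop, step))


def index_expr(column, start, step, var="i"):
    """C++ index expression for the i-th element of a slice."""
    return f"{column}[{start}+{var}*{step}]"


def make_loop(operands, op):
    """operands: list of (column, start, stop, step); op: binary operator string."""
    # Assumption: the result is as long as the shortest slice.
    n = min(slice_length(start, stop, step) for _, start, stop, step in operands)
    terms = f" {op} ".join(
        f"({index_expr(column, start, step)})" for column, start, _, step in operands
    )
    return (
        f"RVec<double> result({n});\n"
        f"for(size_t i=0; i<{n}; i++){{\n"
        f"    result[i] = {terms};\n"
        "}"
    )


# array1D0[1:10] - array1D2[:20:2], with the open lower bound written as 0
code = make_loop([("array1D0", 1, 10, 1), ("array1D2", 0, 20, 2)], "-")
```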
miranov25 commented 1 year ago

Failing tests:

    # rdf2 = makeDefine("arrayD","cos(array1D0[1:10])", rdf,3, True)               # TODO Failing - cos is found as an object - not a function
    # rdf2 = makeDefine("arrayD","array1DTrack[1:10].Px()", rdf,3, True)           # TODO Failing - member function not found
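
One way to keep function names apart from column names is to look at the position of each name in the tree: a name in call position is a function, an attribute in call position is a member function, and every other name is data. This is a sketch with the hypothetical helper `split_names`. It is not the fix that was eventually committed.

```python
import ast


def split_names(expression):
    """Return (free functions, member functions, column names) used in expression."""
    tree = ast.parse(expression, mode="eval")
    functions, methods, columns = set(), set(), set()
    in_call_position = set()  # ids of Name nodes that are being called

    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                functions.add(node.func.id)
                in_call_position.add(id(node.func))
            elif isinstance(node.func, ast.Attribute):
                methods.add(node.func.attr)

    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and id(node) not in in_call_position:
            columns.add(node.id)

    return functions, methods, columns
```

With this split, only the third set would be looked up among the RDataFrame columns.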
miranov25 commented 1 year ago

Range checks 1D:

miranov25 commented 1 year ago

https://github.com/miranov25/RootInteractive/commit/fd829c5872e6d2d53be2c7b1ed29fd8117ea7b63

New test 1D and 2D + class methods

https://github.com/miranov25/RootInteractive/blob/fd829c5872e6d2d53be2c7b1ed29fd8117ea7b63/RootInteractive/Tools/RDataFrame/test_RDataFrame_Array.py#L87-L117

Test output:

--------                -----
Columns in total           14
Columns from defines       14
Event loops run             8
Processing slots            1

Column          Type                                            Origin
------          ----                                            ------
array1D0        ROOT::VecOps::RVec<float>                       Define
array1D2        ROOT::VecOps::RVec<float>                       Define
array1DTrack    ROOT::VecOps::RVec<TParticle>                   Define
array2D0        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
array2D1        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayCos0       ROOT::VecOps::RVec<double>                      Define
arrayCosAll     ROOT::VecOps::RVec<double>                      Define
arrayD0         ROOT::VecOps::RVec<float>                       Define
arrayD1D2D      ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayD2D        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayDAall      ROOT::VecOps::RVec<float>                       Define
arrayPx         ROOT::VecOps::RVec<double>                      Define
nPoints         int                                             Define
nPoints2        int                                             Define
miranov25 commented 1 year ago

Todo:

  1. Add C++ interface
  2. Invariance test
  3. Protection against invalid inputs
miranov25 commented 1 year ago

Failing test - diagnostic

In [8]:     rdf=makeDefine("arrayD2D","array2D0[:][:]>0", rdf,3, True);
ROOT::VecOps::RVec<float>  ROOT::VecOps::RVec<float> 
float ('f', 32)
====================================
arrayD2D
 array2D0[:][:]>0
====================================

Implementation:
 ROOT::VecOps::RVec<char> arrayD2D(ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> > &array2D0){
    ROOT::VecOps::RVec<char> result(array2D0[0+i*1].size() - 0);
    for(size_t i=0; i<array2D0[0+i*1].size() - 0; i++){
        result[i] = ((array2D0[0+i*1][0+i*1]) > (0));
    }

    return result;
} 
Dependencies
 ['array2D0']
miranov25 commented 1 year ago

Test 7 fails if no dictionary has been generated for the 2D boolean array:

    # test 7   - 2D boolean test
    parsed=makeDefine("arrayD20Bool","array2D0[:,:]>0", rdf,3, True);
    rdf = makeDefineRDF("arrayD20Bool", parsed["name"], parsed,  rdf, verbose=1)
    rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameArrayD1D2D.root");
miranov25 commented 11 months ago

Problem for templated classes

After the fix for access to class methods in the commit above, some problems are still observed:

In [9]: getClassMethod("o2::tpc::TrackTPC","getAlpha")
Out[9]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')

==>

File ~/github/RootInteractive/RootInteractive/Tools/RDataFrame/RDataFrame_Array.py:74, in scalar_type_str(dtype)
     63 def scalar_type_str(dtype):
     64     dtypes = {
     65         ('f', 32): "float",
     66         ('f', 64): "double",
   (...)
     72         ('i', 8): "char"
     73     }
---> 74     return dtypes[dtype]

KeyError: 'float o2::track::TrackParametrization<float>::getAlpha()'
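
The traceback shows the full C++ signature being used as a dictionary key. A possible workaround, sketched here under assumptions, is to accept either a `(kind, bits)` tuple or a signature string and, for a string, take the text before the qualified function name as the return type. The function name `scalar_type_or_return_type` is hypothetical, and only the table entries visible in the traceback are reproduced.

```python
SCALAR_TYPES = {
    ("f", 32): "float",
    ("f", 64): "double",
    ("i", 8): "char",
    # the remaining entries are elided in the traceback and are not guessed here
}


def scalar_type_or_return_type(dtype):
    """Look up a (kind, bits) tuple, or extract the return type from a signature."""
    if isinstance(dtype, tuple):
        return SCALAR_TYPES[dtype]
    # Assumption: the string looks like "<return type> <qualified name>(<args>)"
    # and the qualified name contains no spaces.
    head = dtype.split("(", 1)[0]
    return_type = head.rpartition(" ")[0].strip()
    if not return_type:
        raise ValueError(f"cannot extract a return type from {dtype!r}")
    return return_type
```

This simple split breaks for template arguments that contain spaces, so it is only a starting point.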
miranov25 commented 11 months ago

Problem finding out whether a method exists

In [24]: getClassMethod("""o2.track.TrackParametrization""","getAlpha")
Non supported o2.track.TrackParametrization.getAlpha
Out[24]: ('', '')

In [25]: getClassMethod("""o2.track.TrackParametrization<float>""","getAlpha")
Non supported o2.track.TrackParametrization<float>.getAlpha
Out[25]: ('', '')
In [28]: getClassMethod("""o2.track.TrackParametrization("float")""","getAlpha")
Out[28]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')
miranov25 commented 11 months ago

Template arguments to be replaced

-> ("xxx")

```
In [29]: ROOT.o2.track.TrackParametrization("float").getAlpha
Out[29]:
```

after patch:

```
className2=className.replace("::",".")
className2 =className2.replace("<", '("')
className2 =className2.replace(">", '")')
```

```
In [2]: getClassMethod("""o2::track::TrackParametrization""","getAlpha")
Out[2]:
('float o2::track::TrackParametrization::getAlpha()',
 'float o2::track::TrackParametrization::getAlpha()')
```
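
The three replacements from the patch can be wrapped in a small helper to see what they produce for a templated class name. The function name `cpp_to_pyroot_name` is hypothetical. The body only restates the patch.

```python
def cpp_to_pyroot_name(class_name):
    """Rewrite a C++ class name into the attribute/call form used from Python."""
    name = class_name.replace("::", ".")   # namespaces become attribute access
    name = name.replace("<", '("')         # opening template bracket
    name = name.replace(">", '")')         # closing template bracket
    return name
```

For `o2::track::TrackParametrization<float>` this yields the same form as the working call in `In [28]`. Names with several template arguments are not split by this simple replacement: `A<int,float>` becomes `A("int,float")`, a single string argument, so such cases would need separate handling.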
miranov25 commented 11 months ago

getClassMethod was not finished; arguments are ignored for the moment

The code crashes if more than one function is returned. Example:

miranov25 commented 11 months ago

Remaining problems in the ROOT dictionary

miranov25 commented 10 months ago

Automatic template function generation: error handling to be defined

Once a function has been generated, generating it a second time fails because the function is already in scope.

What should the error handling be?
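
One option, sketched here as an assumption and not as a decision, is to remember what has already been declared: declaring identical code again becomes a no-op, while an attempt to redefine a name with different code is reported explicitly. The helper `declare_once` and the registry `_declared` are hypothetical. The `declare` argument is any callable that hands the code to the interpreter.

```python
_declared = {}  # function name -> source text already handed to the interpreter


def declare_once(name, code, declare):
    """Declare code under name unless the same code was declared before."""
    previous = _declared.get(name)
    if previous is None:
        declare(code)
        _declared[name] = code
        return "declared"
    if previous == code:
        # Same function generated twice: nothing to do.
        return "skipped"
    raise ValueError(f"{name} is already declared with different code")


# Usage with a stand-in for the interpreter call
sent = []
first = declare_once("arrayD", "auto arrayD(){ /* ... */ }", sent.append)
second = declare_once("arrayD", "auto arrayD(){ /* ... */ }", sent.append)
```

In a ROOT session the callable could be `ROOT.gInterpreter.Declare`. Whether a conflicting redefinition should raise, or instead generate a new unique name, remains the open question.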

miranov25 commented 10 months ago

Problem with ROOT.EnableImplicitMT(nCores) in ROOT + automatic C++ code generation

The problem appears to be random.

To simplify debugging and to make the code faster, it would be preferable to have an option to save the generated C++ code and build a shared library from it.

miranov25 commented 10 months ago

With a precompiled C++ macro, the problem with EnableImplicitMT(nCores) disappeared

miranov25 commented 10 months ago

Problem in the function generation to be checked

miranov25 commented 10 months ago

To parse the function, the Python func_doc can be used

This is similar to what is already done for classes

In the example above:

In [12]: ROOT.sqrt.func_doc
Out[12]: 'long double ::sqrt(long double __x)\nfloat ::sqrt(float __x)\ndouble ::sqrt(double __x)'

In [13]: ROOT.TMath.Sqrt.func_doc
Out[13]: 'double TMath::Sqrt(double x)'
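
The `func_doc` strings above contain one overload per line, so they can be split into return type, qualified name and argument types. The following is a sketch with a deliberately simple regular expression. The names `SIGNATURE`, `parse_overloads` and `select_overload` are hypothetical, and default arguments, function pointers and template arguments containing spaces or commas are not handled.

```python
import re

# "<return type> <qualified name>(<args>)"; the name may contain "::" and "<...>"
SIGNATURE = re.compile(r"^(?P<ret>.+?)\s+(?P<name>[\w:<>]+)\((?P<args>.*)\)$")


def parse_overloads(func_doc):
    """Return a list of (return type, qualified name, [argument types])."""
    overloads = []
    for line in func_doc.splitlines():
        match = SIGNATURE.match(line.strip())
        if match is None:
            continue  # line does not look like a signature
        args = [arg.strip() for arg in match["args"].split(",") if arg.strip()]
        # Drop the parameter name and keep the type: "long double __x" -> "long double"
        arg_types = [arg.rpartition(" ")[0] or arg for arg in args]
        overloads.append((match["ret"], match["name"], arg_types))
    return overloads


def select_overload(overloads, arg_types):
    """Return the first overload whose argument types match exactly, else None."""
    for overload in overloads:
        if overload[2] == list(arg_types):
            return overload
    return None


doc = "long double ::sqrt(long double __x)\nfloat ::sqrt(float __x)\ndouble ::sqrt(double __x)"
overloads = parse_overloads(doc)
```

Selecting by exact argument type, as in `select_overload(overloads, ["float"])`, is one way to avoid the crash when more than one function is returned.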
miranov25 commented 10 months ago

C++ namespace function support, e.g. TMath::<>

    parsed = makeDefine("array2D0_cos0", "cos(array2D0[0:0,:])", rdf, None, 3); # this is working
    parsed = makeDefine("array2D0_cos1", "TMath::Cos(array2D0[0:0,:])", rdf, None, 3); # this is failing
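
`::` is not valid Python syntax, so `ast.parse` rejects the second expression as written. Whether this is the actual cause of the failure in makeDefine is not stated above. A possible approach, sketched with the hypothetical helpers `parse_cpp_style` and `qualified_name`, is to rewrite `::` as attribute access before parsing and to rebuild the qualified C++ name from the attribute chain afterwards.

```python
import ast


def parse_cpp_style(expression):
    """Parse an expression that may use C++ scope resolution ("::")."""
    # Python has no "::" operator, so it is rewritten as attribute access first.
    return ast.parse(expression.replace("::", "."), mode="eval")


def qualified_name(node):
    """Rebuild 'TMath::Cos' from Attribute(value=Name('TMath'), attr='Cos')."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{qualified_name(node.value)}::{node.attr}"
    raise TypeError(f"unsupported node in qualified name: {type(node).__name__}")


call = parse_cpp_style("TMath::Cos(array2D0[0:0,:])").body
function_name = qualified_name(call.func)
```

After the rewrite, `TMath.Cos(...)` and a method call such as `track.GetP()` have the same tree shape. The code generator would therefore still need to check whether the leftmost name is a column or a namespace.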
miranov25 commented 10 months ago

AST Support for the slice with dimensionality reduction
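
In the AST, a dimension indexed by a plain expression is reduced, while a dimension indexed by a `Slice` is kept. The sketch below classifies each dimension of a subscript accordingly. The helper `subscript_dims` is hypothetical. It handles both the `ExtSlice`/`Index` layout shown in the dump at the top of this issue and the `Tuple` layout used by Python 3.9 and later.

```python
import ast


def subscript_dims(expression):
    """Classify each dimension of a subscript as 'slice' (kept) or 'index' (reduced)."""
    node = ast.parse(expression, mode="eval").body
    if not isinstance(node, ast.Subscript):
        raise ValueError("expected a subscript expression")

    index = node.slice
    if type(index).__name__ == "Index":   # wrapper node before Python 3.9
        index = index.value

    if isinstance(index, ast.Tuple):      # a[0, :] on Python 3.9+
        items = index.elts
    elif hasattr(index, "dims"):          # ast.ExtSlice before Python 3.9
        items = index.dims
    else:                                 # single dimension
        items = [index]

    return ["slice" if isinstance(item, ast.Slice) else "index" for item in items]


def result_rank(expression):
    """Number of dimensions left after applying the subscript."""
    return subscript_dims(expression).count("slice")
```

For example, `array2D0[0,:]` keeps one dimension, while `array2D0[0:0,:]` keeps two, the first of which is empty.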