gordonwatts / BDTTrainingAnalysisLanguage

Pull from ATLAS EXOT 15 Derivation, columnar data, and flat ROOT tuples with RDF to scikit-learn in one nice fast swoop

Enable more return types #55

Closed gordonwatts closed 5 years ago

gordonwatts commented 5 years ago

While working on #52 it became obvious that restricting ourselves to one return type (a pandas DataFrame in the xAOD backend) is not going to cut it. Specifically, when lists of jet pT are allowed in and they have different lengths from event to event, the conversion to a DataFrame fails. As it should.
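To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use the project's code; the jet pT values and the helper names `is_rectangular` and `to_long_form` are made up for illustration. It shows that per-event lists of different lengths do not form a rectangular table, and that one possible workaround (an assumption, not what the issue settled on) is to flatten them into one row per jet.

```python
# Hypothetical jet pT values in GeV, one list per event (not real data).
jet_pts = [[55.2, 31.0], [80.1], [42.7, 38.9, 21.3]]


def is_rectangular(rows):
    """True when every event holds the same number of entries."""
    return len({len(row) for row in rows}) <= 1


def to_long_form(rows):
    """Flatten jagged per-event lists into (event, jet_index, pt) rows."""
    return [
        (event, index, pt)
        for event, row in enumerate(rows)
        for index, pt in enumerate(row)
    ]


# The jagged input cannot be laid out as one fixed-width column per jet.
print(is_rectangular(jet_pts))

# The flattened form has a fixed number of fields per row, so it fits a table.
print(to_long_form(jet_pts))
```

A jagged-array return type keeps the per-event grouping intact instead of flattening it, which is one reason to support more than a single flat table as the result.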

So we need to implement some sort of converter that properly hands the data back to the local user and properly "interprets" the data that is handed back. This is going to touch the front end, the back end, and the middle end.

gordonwatts commented 5 years ago

@masonproffitt - I see you've added awkward array as a return type (and numpy array). This infrastructure I want to put in is going to affect that code. Perhaps we should chat about how best to make it work.

gordonwatts commented 5 years ago

Ok - I've made the changes. It wasn't as bad as I thought it would be. And I think there is a nice "break" here that will enable us to separate the front and back ends by the network when we need to.

@masonproffitt - this will probably hit your code the hardest. Can you take a look? The basic thing I've done is change how the front end indicates the different return types. It now generates an "ast" node for the requested return type, and it is up to the backend to do the translation. If you look at my executor you'll see I have a plug-in architecture for that now (for example, at some point we'll probably want histograms, and perhaps even counts. Who knows?).

When I do the merge I think it will break your code. :-) It will likely break your code too, @etorro, but since you aren't as far along I don't think the break will be as bad.

masonproffitt commented 5 years ago

Are you sure you haven't broken your own code as well? I get:

BDTTrainingAnalysisLanguage/clientlib/ObjectStream.py in <module>
      2 import clientlib.query_ast as query_ast
      3 from clientlib.query_result_asts import resultTTree, resultPandasDF, resultAwkwardArray
----> 4 import clientlib.pandas_df_ast as pandas_df_ast
      5 from clientlib.find_LINQ_operators import parse_ast
      6 import ast

ModuleNotFoundError: No module named 'clientlib.pandas_df_ast'

which should not let anything run as far as I can tell...

masonproffitt commented 5 years ago

Yeah, the code that you removed was definitely not dead. If I reset that last commit, my code still runs fine.

masonproffitt commented 5 years ago

Okay, it was literally just that one import line. Fixed: 2b892041412e9d2ef54b3c9c4d6478c43f2ed373. Everything here is good from my side now.

gordonwatts commented 5 years ago

Ah, thanks! This is the problem with running without tests. Something that desperately needs to get fixed. :-)
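As an illustration of the kind of cheap test that catches a stale import like the one above, here is a minimal sketch of an import smoke check built only on the standard library. The function name `find_import_failures` is made up, and pointing it at a package named `clientlib` is an assumption based on the traceback; it is demonstrated here on the standard `logging` package so that it runs anywhere.

```python
import importlib
import pkgutil


def find_import_failures(package_name):
    """Import a package and each of its submodules; return {name: error}."""
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:
        return {package_name: exc}

    failures = {}
    search_path = getattr(package, "__path__", [])
    prefix = package_name + "."
    # onerror is a no-op so that failures are recorded once, by the loop body.
    for info in pkgutil.walk_packages(search_path, prefix, onerror=lambda name: None):
        try:
            importlib.import_module(info.name)
        except Exception as exc:
            failures[info.name] = exc
    return failures


# Demonstrated on a standard-library package; for this repository the
# argument would presumably be "clientlib" (an assumption, not verified).
print(find_import_failures("logging"))
```

A check like this only proves that modules load, not that they behave correctly, but it would flag a `ModuleNotFoundError` raised at import time before anyone else pulls the commit.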

gordonwatts commented 5 years ago

Ok, going to create a pull request then.