Send the rest of the args to Snapshot using spark backend

vepadulano / PyRDF

Python Library for doing ROOT RDataFrame analysis

https://pyrdf.readthedocs.io/en/latest/

9 stars 7 forks source link

Send the rest of the args to Snapshot using spark backend #88

Closed watson-ij closed 4 years ago

watson-ij commented 4 years ago

Thanks for the interesting project, I have a small issue that I fixed:

Currently, the spark backend doesn't have the ability to store only select branches since it drops the branch vector arguments from Snapshot. This is not so nice when trying to run a big skim, since we end up writing more data than we should. This is a minimal fix for that, by passing the rest of the arguments through to the operation.

etejedor commented 4 years ago

Hi @watson-ij thanks for the fix! I will try to have a look asap, unless @vepadulano or @JavierCVilla can have a quick look.

On the other hand, could you tell us what are you using PyRDF for? Are you using it from SWAN?

watson-ij commented 4 years ago

Hi @etejedor. I have used PyRDF for some histogramming on SWAN, and really found it useful. In this case though, we have a full set of CMS NanoAOD (flat tuples which still need selections and scale factors and so on) on our local cluster which we recently set up with spark. I was seeing how much of our existing slim/skim packages could be replaced and run with RDataFrame and if we could also use spark. I quickly managed to reproduce most of it in a much more straightforward manner, except for the per event slimming part of this issue when running with spark.

vepadulano commented 4 years ago

Hi @watson-ij , thanks for your interest in PyRDF! I will take a look at your fix and get back at you!

vepadulano commented 4 years ago

Hi @watson-ij , I am now at CERN and could take a look at the problem with Travis. It seems like there was a failing test that didn't have to do with your PR. I have corrected the test and merged into master. I have also checked that testing your PR on Travis with the updated master gives no errors. If you still have time, may I ask you to rebase on master and then I could merge your PR? Best, Vincenzo

watson-ij commented 4 years ago

Hi @vepadulano , I rebased to master, and it looks like everything went through this time. Thanks for checking it out!

vepadulano commented 4 years ago

Thank you for your time and collaboration! Best, Vincenzo