Closed watson-ij closed 4 years ago
Hi @watson-ij thanks for the fix! I will try to have a look asap, unless @vepadulano or @JavierCVilla can have a quick look.
On the other hand, could you tell us what are you using PyRDF for? Are you using it from SWAN?
Hi @etejedor. I have used PyRDF for some histogramming on SWAN, and really found it useful. In this case though, we have a full set of CMS NanoAOD (flat tuples which still need selections and scale factors and so on) on our local cluster which we recently set up with spark. I was seeing how much of our existing slim/skim packages could be replaced and run with RDataFrame and if we could also use spark. I quickly managed to reproduce most of it in a much more straightforward manner, except for the per event slimming part of this issue when running with spark.
Hi @watson-ij , thanks for your interest in PyRDF! I will take a look at your fix and get back at you!
Hi @watson-ij , I am now at CERN and could take a look at the problem with Travis. It seems like there was a failing test that didn't have to do with your PR. I have corrected the test and merged into master. I have also checked that testing your PR on Travis with the updated master gives no errors. If you still have time, may I ask you to rebase on master and then I could merge your PR? Best, Vincenzo
Hi @vepadulano , I rebased to master, and it looks like everything went through this time. Thanks for checking it out!
Thank you for your time and collaboration! Best, Vincenzo
Thanks for the interesting project, I have a small issue that I fixed:
Currently, the spark backend doesn't have the ability to store only select branches since it drops the branch vector arguments from Snapshot. This is not so nice when trying to run a big skim, since we end up writing more data than we should. This is a minimal fix for that, by passing the rest of the arguments through to the operation.