alexander-held opened this issue 1 year ago.
I just tested this: the difference is simply the number of baskets in those files. The NanoAOD file has 251 baskets per branch, while the ntuple has 10. You therefore hit the XRootD limit of 1024 ranges per request very quickly with the NanoAOD file, but not with the ntuple.
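The following back-of-the-envelope sketch puts numbers on the basket argument. It rests on several assumptions. Each basket of each branch becomes one byte range. The server accepts at most 1024 ranges in one multi-range request. If a request exceeds that limit, the client falls back to one request per range, which is the fallback behavior described in this issue. The branch count of 20 is hypothetical, and the helper functions are illustrative only. They are not part of uproot or XRootD.

```python
import math

# Assumed server-side limit on byte ranges in a single multi-range request.
MAX_RANGES_PER_REQUEST = 1024


def n_ranges(baskets_per_branch: int, n_branches: int) -> int:
    """One byte range per basket per branch (assumes no ranges get merged)."""
    return baskets_per_branch * n_branches


def requests_with_fallback(ranges: int, limit: int = MAX_RANGES_PER_REQUEST) -> int:
    """Simplified model: a single multi-range request if it fits the limit,
    otherwise one request per range (the fallback described in this issue)."""
    return 1 if ranges <= limit else ranges


def requests_if_batched(ranges: int, limit: int = MAX_RANGES_PER_REQUEST) -> int:
    """Hypothetical alternative: split the ranges into batches that each fit."""
    return math.ceil(ranges / limit)


def max_branches_in_one_request(baskets_per_branch: int,
                                limit: int = MAX_RANGES_PER_REQUEST) -> int:
    """Largest number of branches whose baskets still fit into one request."""
    return limit // baskets_per_branch


if __name__ == "__main__":
    n_branches = 20  # hypothetical number of branches read at once
    for name, baskets in [("NanoAOD", 251), ("ntuple", 10)]:
        r = n_ranges(baskets, n_branches)
        print(f"{name}: {r} ranges, "
              f"{requests_with_fallback(r)} request(s) with fallback, "
              f"{requests_if_batched(r)} if batched, "
              f"at most {max_branches_in_one_request(baskets)} branches per request")
```

With 20 branches, the NanoAOD file needs 5020 ranges and the ntuple needs only 200. Only the NanoAOD file exceeds the limit. In this simplified model, a single NanoAOD request stays within the limit only when it reads at most 4 branches.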
ServiceX transforms of NanoAOD files, as well as direct uproot-based access via HTTP, appear to be slower than for ntuples: https://gist.github.com/alexander-held/4e58811522ed9990afb2d4b73ef9471e.
@masonproffitt pointed out an XRootD issue related to this: https://github.com/xrootd/xrootd/issues/1976. Reading too much data in one request causes a 500 error. uproot then falls back to individual requests, which makes everything slower. A similar issue is https://github.com/xrootd/xrootd/issues/2003. That issue is about requesting too many ranges at once, while the former is about requesting too many bytes in a range.
Related
uproot issue opened during these investigations: https://github.com/scikit-hep/uproot5/issues/881.
Impact on ServiceX
More details about the behavior of ServiceX from @masonproffitt:
Impact on coffea
It is currently unclear whether this affects coffea differently when it ingests the input dataset directly. Are there any tricks that may matter here, @nsmith- @lgray? We are currently still using "old" coffea, though we are preparing to switch to coffea 2023.