gordonwatts closed this issue 6 months ago.
Hoping @alexander-held does this for the uproot reading version so we can copy the approach.
You can track the progress of that in #42 as well.
See #50 for the solution implemented for coffea.
Another example is in https://github.com/iris-hep/idap-200gbps/pull/7/files; a similar strategy should work with `uproot.dask` for the ServiceX use case.
When we run our small test locally, Dask reports 2230 scheduled tasks (this is before making any of these modifications).
Hmm, with this new method there are 2228 tasks instead of 2230. That is a very small reduction!
See https://github.com/dask-contrib/dask-awkward/issues/499#issuecomment-2063241077 for more information. In short: use `axis=1` for each file/sample. This should substantially reduce the number of tasks.
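To see why combining work per file could matter, here is a toy model of graph size. It is a hypothetical sketch, not how dask or dask-awkward actually builds graphs. It assumes that reducing each branch separately costs one task per (file, branch) pair, whereas combining all branches along `axis=1` inside a file costs one task per file. It also assumes the partial results are merged by a simple binary tree. I/O tasks, layer fusion, and dask's real `split_every` behaviour are ignored, so the numbers only illustrate the scaling, not what Dask will report.

```python
# Toy model of task-graph size (hypothetical; NOT dask's real graph builder).
# Assumptions: one task per partial reduction, then a binary tree that
# merges all partial results into a single total.


def tree_combine_tasks(n_partials: int) -> int:
    """Combine tasks needed to merge n partial results pairwise into one."""
    # A binary tree over n leaves has n - 1 internal (combine) nodes.
    return max(n_partials - 1, 0)


def per_branch_tasks(n_files: int, n_branches: int) -> int:
    """Each branch reduced separately in each file, then all partials merged."""
    leaves = n_files * n_branches
    return leaves + tree_combine_tasks(leaves)


def per_file_tasks(n_files: int) -> int:
    """Branches combined along axis=1 within each file, one reduction per file."""
    leaves = n_files
    return leaves + tree_combine_tasks(leaves)


if __name__ == "__main__":
    # Hypothetical sizes: 10 files, 100 branches each.
    print(per_branch_tasks(10, 100))  # 1999
    print(per_file_tasks(10))         # 19
```

Under these assumptions the per-branch graph grows with files × branches while the per-file graph grows only with the number of files. A tiny change like 2230 → 2228 would then suggest that per-branch reductions were not what dominated the task count in our test.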