iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE
6 stars 1 forks source link

feat: simple distributed reading with plain uproot.open and worker tracking utility #58

Closed alexander-held closed 6 months ago

alexander-held commented 6 months ago

In order to narrow in on possible sources for memory issues, here is a simple approach for reading all input that strips out coffea, uproot.dask, and dask-awkward. This is just Dask distributing Python functions that read the data through uproot.open. It seems to run rather well and is stable in the tests I have done so far.

This PR adds Dask distributing uproot.open, Dask distributing xrdcp, plus new functionality to track worker counts in the background to determine scheduling efficiencies / overhead.