iris-hep / idap-200gbps

feat: simple distributed reading with plain uproot.open and worker count tracking #11

Closed: alexander-held closed this 5 months ago

alexander-held commented 5 months ago

To narrow down possible sources of the memory issues, this adds a simple approach for reading all input. It strips out coffea, uproot.dask, and dask-awkward, and instead uses Dask only to distribute plain Python functions that read the data through uproot.open.
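For reviewers, a minimal sketch of the pattern described above (not the code in this PR). The names `read_branches` and `read_all`, the returned summary dict, and the `(path, treename)` task layout are assumptions made for illustration; only `uproot.open`, `TTree.arrays`, `TTree.num_entries`, `Client.submit`, and `Client.gather` are existing library calls.

```python
def read_branches(path, treename, branches):
    """Read the requested branches from one file and return a small summary."""
    # Imported inside the function so the import happens on the worker.
    import uproot

    with uproot.open(path) as f:
        tree = f[treename]
        # Materialize the branches, then drop them: only the summary is
        # returned, so no event data is shipped back to the client.
        tree.arrays(branches, library="np")
        num_entries = tree.num_entries
    return {"path": path, "entries": num_entries, "branches": list(branches)}


def read_all(client, tasks, branches, reader=read_branches):
    """Submit one read per (path, treename) pair and gather the summaries.

    `client` is expected to behave like dask.distributed.Client, i.e. to
    provide submit() and gather(). `reader` is injectable so the
    distribution logic can be exercised without real files.
    """
    futures = [
        client.submit(reader, path, treename, branches)
        for path, treename in tasks
    ]
    return client.gather(futures)
```

With a running cluster this could be called as `read_all(Client(scheduler_address), tasks, ["some_branch"])`, where `scheduler_address` and the branch name are placeholders. Returning only per-file summaries keeps client memory flat, which helps when the goal is to isolate worker-side memory growth.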

This also adds new functionality to track worker count over time and calculate scheduling efficiencies from that.
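As a rough illustration of the worker count tracking, here is one way such an efficiency could be computed. The definition used (summed task runtime divided by the time integral of the worker count) and all function names are assumptions, not necessarily what this PR implements. It also assumes each worker runs one task at a time.

```python
import time


def record_sample(samples, get_worker_count, clock=time.monotonic):
    """Append a (timestamp, worker_count) pair to `samples`.

    `get_worker_count` is a hypothetical callable that returns the current
    number of workers, e.g. a small wrapper around a scheduler query.
    """
    samples.append((clock(), get_worker_count()))


def worker_seconds(samples):
    """Integrate worker count over time, holding each count until the next sample."""
    total = 0.0
    for (t0, n0), (t1, _) in zip(samples, samples[1:]):
        total += n0 * (t1 - t0)
    return total


def scheduling_efficiency(task_durations, samples):
    """Fraction of the available worker time that was spent inside tasks."""
    available = worker_seconds(samples)
    if available <= 0:
        raise ValueError("need at least two samples with a nonzero worker count")
    return sum(task_durations) / available
```

Calling `record_sample` periodically (for example from a background thread) while the reads run yields the `samples` list. The per-task durations would need to be measured inside the distributed function and returned with its result.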

For convenience, this also adds fileset filtering.
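A sketch of what fileset filtering could look like. It assumes a coffea-style layout, `{dataset_name: {"files": {path: treename}, ...}}`; the function name, the glob-pattern interface, and the `max_files` cap are hypothetical and may differ from what the PR does.

```python
from fnmatch import fnmatch


def filter_fileset(fileset, patterns=("*",), max_files=None):
    """Return a new fileset restricted to matching datasets.

    Datasets are kept if their name matches any glob in `patterns`;
    `max_files`, if given, caps the number of files kept per dataset.
    The input fileset is left unmodified.
    """
    filtered = {}
    for name, entry in fileset.items():
        if not any(fnmatch(name, pattern) for pattern in patterns):
            continue
        files = list(entry["files"].items())
        if max_files is not None:
            files = files[:max_files]
        # Copy the other metadata as-is and swap in the reduced file mapping.
        filtered[name] = {**entry, "files": dict(files)}
    return filtered
```

For a quick test run, something like `filter_fileset(fileset, patterns=("ttbar*",), max_files=1)` would read a single file from each matching dataset.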

This is mostly an untested straight port of https://github.com/iris-hep/idap-200gbps-atlas/pull/58. Please test before merging!