To narrow down possible sources of the memory issues, this adds a simple approach for reading all input. It strips out coffea, uproot.dask, and dask-awkward, and instead uses Dask to distribute plain Python functions that read the data through `uproot.open`.
It also adds functionality to track the worker count over time and to calculate scheduling efficiency from it.
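As a rough illustration of how a scheduling efficiency could be derived from worker-count samples, here is a minimal sketch. It is not the code in this PR: the function names `worker_seconds` and `scheduling_efficiency` are hypothetical, the efficiency is assumed to be the summed task runtime divided by the integrated worker count, and each worker is assumed to run one task at a time.

```python
def worker_seconds(samples, end_time):
    """Integrate worker count over time, treating the samples as a step function.

    samples: list of (timestamp_in_seconds, n_workers), sorted by timestamp.
    end_time: timestamp at which the last sample stops being valid.
    """
    total = 0.0
    # Pair each sample with the start of the next one (or end_time for the last).
    for (t0, n_workers), (t1, _) in zip(samples, samples[1:] + [(end_time, 0)]):
        total += n_workers * (t1 - t0)
    return total


def scheduling_efficiency(task_durations, samples, end_time):
    """Fraction of available worker time spent inside tasks (assumed definition)."""
    available = worker_seconds(samples, end_time)
    return sum(task_durations) / available if available > 0 else 0.0


# Hypothetical numbers: 2 workers for 10 s, then 4 workers for 20 s.
samples = [(0.0, 2), (10.0, 4), (20.0, 4)]
available = worker_seconds(samples, end_time=30.0)  # 100 worker-seconds
efficiency = scheduling_efficiency([20.0, 20.0, 20.0, 20.0], samples, end_time=30.0)
```

With these made-up numbers, 80 s of task runtime against 100 worker-seconds gives an efficiency of 0.8. A step-function integral is the simplest choice here; it assumes the worker count stays constant between samples.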
For convenience, this also adds fileset filtering.
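For readers unfamiliar with the idea, fileset filtering might look roughly like the sketch below. The function `filter_fileset`, its parameters, and the assumed layout (dataset name mapped to a dict with a `"files"` mapping of path to tree name) are illustrative assumptions, not the implementation in this PR.

```python
def filter_fileset(fileset, name_contains=None, max_files=None):
    """Return a reduced copy of a fileset (hypothetical helper).

    fileset: dict mapping dataset name -> {"files": {path: treename}, ...}
             (assumed layout).
    name_contains: keep only datasets whose name contains this substring.
    max_files: keep at most this many files per dataset (None keeps all).
    """
    filtered = {}
    for name, entry in fileset.items():
        if name_contains is not None and name_contains not in name:
            continue
        # Slicing with None as the upper bound keeps every file.
        files = dict(list(entry["files"].items())[:max_files])
        # Copy the entry so the input fileset is left untouched.
        filtered[name] = {**entry, "files": files}
    return filtered


# Made-up dataset names and paths, for illustration only.
example_fileset = {
    "ttbar": {"files": {"a.root": "tree", "b.root": "tree", "c.root": "tree"}},
    "data_2018": {"files": {"d.root": "tree"}},
}
small = filter_fileset(example_fileset, name_contains="ttbar", max_files=2)
```

Here `small` contains only the `ttbar` dataset with its first two files, which is handy for quick test runs on a subset of the input.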
This is mostly an untested straight port of https://github.com/iris-hep/idap-200gbps-atlas/pull/58. Please test before merging!