iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Save preprocessing output #40

Closed alexander-held closed 4 months ago

alexander-held commented 5 months ago

Serialize the preprocessing output to JSON. Also look into the dataset tools in coffea for slicing files and slicing chunks: https://coffeateam.github.io/coffea/modules/coffea.dataset_tools.html
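
A minimal sketch of the idea, using only the standard library: write a preprocessed fileset to gzipped JSON once, then reload it in later runs instead of repeating the preprocessing. The dictionary layout (dataset name, then `files`, then per-file `object_path` and `steps`), the file names, and the helper names `save_fileset`, `load_fileset` and `keep_first_files` are all assumptions made for illustration. They are not taken from this repository, and `keep_first_files` is only a hand-written stand-in for the file-slicing helpers in `coffea.dataset_tools`.

```python
import gzip
import json
import tempfile
from pathlib import Path


def save_fileset(fileset, path):
    """Write a JSON-serializable fileset dict to a gzipped JSON file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(fileset, f, indent=2)


def load_fileset(path):
    """Read a fileset dict back from a gzipped JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def keep_first_files(fileset, n):
    """Hypothetical helper: keep only the first n files of each dataset."""
    return {
        name: {**info, "files": dict(list(info["files"].items())[:n])}
        for name, info in fileset.items()
    }


# Assumed layout of a preprocessed fileset; all values are made up.
example = {
    "ttbar": {
        "files": {
            "file_a.root": {"object_path": "CollectionTree", "steps": [[0, 500], [500, 1000]]},
            "file_b.root": {"object_path": "CollectionTree", "steps": [[0, 750]]},
        },
        "metadata": {"process": "ttbar"},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocessed.json.gz"
    save_fileset(example, path)      # done once, after preprocessing
    restored = load_fileset(path)    # done in every later run

subset = keep_first_files(restored, 1)  # small slice for a quick test run
```

Chunk boundaries are stored as lists rather than tuples so that the structure survives the JSON round trip unchanged.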

alexander-held commented 5 months ago

Making this a target for week 4. This will matter more and more as we scale beyond a few thousand files, since it will save us a large amount of time. Reasonably high priority at that stage.

gordonwatts commented 4 months ago

I know a huge amount of work was done here. What is the status? Should we move this to week 5?

alexander-held commented 4 months ago

Not done, as priorities have shifted a bit and the approach in #58 does not include a preprocessing step by design. When we move back to more Dask and coffea, this still needs to be done.

alexander-held commented 4 months ago

This hasn't been a priority, as more work was done with uproot.open. Shifting to week 6.