iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Run on multiple datasets #90

Closed by gordonwatts 6 months ago

gordonwatts commented 6 months ago

We should be able to run on more than one dataset.

Possible way forward:

alexander-held commented 6 months ago

To group files together, I have also added a rough physics-based categorization in https://github.com/iris-hep/idap-200gbps-atlas/blob/main/input_files/find_containers.py (which the dataset utility uses). The https://github.com/iris-hep/idap-200gbps-atlas/blob/main/materialize_branches.ipynb notebook takes categories as configuration and also displays the number of files, the number of events, and the sizes for reference. This grouping might also be a convenient way to test different scales.
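
As a rough illustration of the category-based grouping described above, here is a minimal sketch of selecting files by category and summarizing file counts, event counts, and sizes. It is not the code in `find_containers.py` or the notebook: the record layout, the `summarize_by_category` helper, the category names, and all numbers are hypothetical and only show the idea.

```python
from collections import defaultdict

# Hypothetical file records. The container names, categories, event counts,
# and sizes are invented for illustration, not taken from the repository.
FILES = [
    {"container": "sample_a", "category": "ttbar", "n_events": 1000, "size_gb": 1.5},
    {"container": "sample_b", "category": "ttbar", "n_events": 2000, "size_gb": 2.5},
    {"container": "sample_c", "category": "diboson", "n_events": 500, "size_gb": 0.5},
]


def summarize_by_category(files, categories=None):
    """Aggregate file count, event count, and size per category.

    If `categories` is given, only files in those categories are included,
    which mimics passing categories as configuration.
    """
    summary = defaultdict(lambda: {"n_files": 0, "n_events": 0, "size_gb": 0.0})
    for record in files:
        if categories is not None and record["category"] not in categories:
            continue  # skip files outside the requested categories
        entry = summary[record["category"]]
        entry["n_files"] += 1
        entry["n_events"] += record["n_events"]
        entry["size_gb"] += record["size_gb"]
    return dict(summary)


# Print a per-category overview for all files.
for name, stats in summarize_by_category(FILES).items():
    print(f"{name}: {stats['n_files']} files, "
          f"{stats['n_events']} events, {stats['size_gb']:.1f} GB")
```

Restricting `categories` to a subset (for example a single small category) would be one way to run the same workflow at a smaller scale before moving to the full set of datasets.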