Closed gordonwatts closed 6 months ago
For grouping files together, I have also added a rough physics-based categorization in https://github.com/iris-hep/idap-200gbps-atlas/blob/main/input_files/find_containers.py (which the dataset utility uses). The https://github.com/iris-hep/idap-200gbps-atlas/blob/main/materialize_branches.ipynb notebook takes categories as configuration and also displays the number of files, the number of events, and the sizes for reference. This grouping might also be a convenient way to test different scales.
We should be able to run on more than one dataset
Possible way forward:
- `--dataset` flag like `all`, and some smaller ones, like 10 data samples of less than 1 TB. Something that will allow us to see what SX does at a smaller scale.
- ... `dask` cluster at once.
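As a rough illustration of the `--dataset` flag idea, here is a minimal sketch using `argparse`. Apart from `all`, the group names (`small`, `medium`, `large`) and the sample names are hypothetical placeholders, not the repository's actual datasets; a real version would presumably take them from the dataset utility.

```python
import argparse

# Hypothetical groups for illustration only; real names and sizes would
# come from the repository's dataset utility.
DATASET_GROUPS = {
    "small": ["sample_a"],
    "medium": ["sample_a", "sample_b"],
    "large": ["sample_a", "sample_b", "sample_c"],
}


def build_parser():
    """Build a parser with a --dataset flag that accepts 'all' or a group name."""
    parser = argparse.ArgumentParser(description="Run over one or more datasets.")
    parser.add_argument(
        "--dataset",
        choices=["all", *DATASET_GROUPS],
        default="all",
        help="Which group of samples to run on (default: all).",
    )
    return parser


def resolve_samples(choice):
    """Return the list of sample names for the selected --dataset value."""
    if choice == "all":
        # Union of every group, keeping first-seen order and dropping repeats.
        ordered = {}
        for samples in DATASET_GROUPS.values():
            for name in samples:
                ordered.setdefault(name, None)
        return list(ordered)
    return list(DATASET_GROUPS[choice])


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_samples(args.dataset))
```

Restricting the flag to a fixed set of `choices` means a mistyped group name fails at argument parsing rather than partway through a long run.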