Are we writing all the output files to S3 on big dataset runs?

iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE


Open gordonwatts opened 3 months ago

gordonwatts commented 3 months ago

Attempt to run the same dataset multiple times and make sure the number of files saved is the same each time.

Problem: we often see only 95% or so of the events that Alex thinks should be there, which makes me worry that something is getting lost here.
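
A minimal sketch of the repeat-run check described above, in Python. It assumes each run writes its outputs under its own S3 prefix and that output file names are the same from run to run; neither is confirmed in this issue. The bucket, prefixes, endpoint, and the helper names `list_keys`, `strip_prefix`, and `compare_runs` are hypothetical. The only library calls used are boto3's `client` and the `list_objects_v2` paginator.

```python
def list_keys(bucket, prefix, endpoint_url=None):
    """List every object key under a prefix (needs boto3 and credentials)."""
    import boto3  # imported lazily so compare_runs works without boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def strip_prefix(keys, prefix):
    """Drop the per-run prefix so names are comparable across runs."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


def compare_runs(runs):
    """Compare output file names across runs.

    runs maps a run label to a list of file names (prefix already stripped).
    Returns (counts, missing): the number of distinct files per run, and for
    each run the sorted names seen in some other run but absent from it.
    """
    all_names = set()
    for names in runs.values():
        all_names.update(names)
    counts = {label: len(set(names)) for label, names in runs.items()}
    missing = {
        label: sorted(all_names - set(names)) for label, names in runs.items()
    }
    return counts, missing


if __name__ == "__main__":
    # Made-up file names standing in for two runs over the same dataset.
    runs = {
        "run1": ["a.root", "b.root", "c.root"],
        "run2": ["a.root", "c.root"],
    }
    counts, missing = compare_runs(runs)
    print(counts)
    print(missing)
```

With real data, each entry of `runs` would come from something like `strip_prefix(list_keys(bucket, prefix), prefix)` for that run. If the counts differ, the `missing` lists show which files to chase. If output names are not stable between runs, only the counts are meaningful.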

gordonwatts commented 3 months ago

In very large datasets, like the 50 TB sample, we sometimes see many fewer output files (like 5000 or so), which does not make sense.