iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Does S3 output size affect I/O rates? #110

Closed gordonwatts closed 3 months ago

gordonwatts commented 3 months ago

There is some evidence that it does. Can we measure this?

gordonwatts commented 3 months ago

Here is the biggest run:

python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-2e1782e2-0.af-jupyter:8786' --dask-profile --dataset data_50TB --query xaod_all --num-files 0

image

gordonwatts commented 3 months ago

Here is the medium run:

python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-27a6a9bd-9.af-jupyter:8786' --dask-profile --dataset data_50TB --query xaod_medium --num-files 0

image

gordonwatts commented 3 months ago

And the small run:

python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-27a6a9bd-9.af-jupyter:8786' --dask-profile --dataset data_50TB --query xaod_small --num-files 0

image

gordonwatts commented 3 months ago

So, to summarize:

| Run | Peak Rate (Gbps) | Data Written to S3 |
|---|---|---|
| small | 130 | ~500 GB |
| medium | 115 | 2 TB |
| large | 70 | 7 TB |

In short - there is clear evidence: the peak rate drops as more data is written to S3. However, the funny thing is there does not seem to be any reason for it: the total bandwidth available in the switches is much larger than the combined input and S3 output traffic.
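
To put a rough number on the trend in the summary, here is a minimal sketch that fits a straight line to the three runs. It assumes "~500 GB" can be treated as 0.5 TB, and the names `runs` and `fit_line` are made up for this illustration. With only three points, and peak rather than sustained rates, the result is indicative at best.

```python
# Peak rate vs. data written to S3, taken from the summary table.
# Assumption: "~500 GB" is treated as exactly 0.5 TB.
runs = {
    "small": (0.5, 130.0),   # (TB written to S3, peak rate in Gbps)
    "medium": (2.0, 115.0),
    "large": (7.0, 70.0),
}


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(list(runs.values()))
print(f"slope: {slope:.2f} Gbps per TB written")
print(f"intercept: {intercept:.1f} Gbps at zero output")

# Residuals show how far each run sits from the straight line.
for name, (tb, gbps) in runs.items():
    predicted = slope * tb + intercept
    print(f"{name:>6}: measured {gbps:.0f}, fitted {predicted:.1f}")
```

The slope comes out negative, i.e. each additional TB written to S3 goes with a lower peak rate. That agrees with the table but says nothing about the cause.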