iris-hep / idap-200gbps-atlas

benchmarking throughput with PHYSLITE

Compare performance of different-size DASK clusters #117

Closed (gordonwatts closed this issue 3 months ago)

gordonwatts commented 3 months ago

Try running a 1 TB test to see how DASK clusters of different sizes affect performance.

(venv) [bash][gwatts]:idap-200gbps-atlas > python servicex/servicex_materialize_branches.py -v --distributed-client scheduler --dask-scheduler 'tcp://dask-gwatts-2e1782e2-0.af-jupyter:8786' --dask-profile --dataset mc_1TB --query xaod_all --num-files 0

gordonwatts commented 3 months ago

Here is the run with 100 nodes initially allocated:

0571.2980 - INFO - root - Dataset speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r has 1165 files
0580.0327 - INFO - root - Number of skimmed events: 130668000 (skim percent: 95.7569%)
0580.1648 - INFO - root - Using `uproot.dask` to open files (splitting files 2 ways).
0580.1649 - INFO - root - Starting build of DASK graphs
0580.6522 - INFO - root - Computing the total count
0635.8339 - INFO - root - Event rate for DASK Calculation: 00:00:55 time, 2472.90 kHz, Data rate: 177.16 Gbits/s
0635.8340 - INFO - root - DASK event rate over actual events: 2367.98 kHz
0635.8341 - INFO - root - speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r: result = 130,668,000

image
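
As a sanity check, the rates in the log above can be roughly reproduced from the log's own numbers. This is only a sketch under two assumptions that the log does not state: that the first rate counts all events before the skim (recovered here from the skim percentage), and that the elapsed time is the gap between the "Computing the total count" and "Event rate" timestamps.

```python
# Values copied from the 100-node log above.
skimmed_events = 130_668_000
skim_fraction = 0.957569              # "skim percent: 95.7569%"
t_start, t_end = 580.6522, 635.8339   # log timestamps, in seconds
data_rate_gbits = 177.16              # "Data rate: 177.16 Gbits/s"

elapsed = t_end - t_start

# Assumption: total events = skimmed events / skim fraction.
total_events = skimmed_events / skim_fraction

rate_total_khz = total_events / elapsed / 1e3
rate_skimmed_khz = skimmed_events / elapsed / 1e3

# Implied data volume in gigabytes (8 bits per byte).
volume_gb = data_rate_gbits * elapsed / 8

print(f"elapsed: {elapsed:.2f} s")
print(f"rate over all events: {rate_total_khz:.1f} kHz")
print(f"rate over skimmed events: {rate_skimmed_khz:.1f} kHz")
print(f"implied data volume: {volume_gb:.0f} GB")
```

Under these assumptions the two rates land within a fraction of a kHz of the logged 2472.90 kHz and 2367.98 kHz, and the implied volume is a bit over 1.2 TB, which is consistent with the nominal 1 TB test.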

gordonwatts commented 3 months ago

Here is the run with 500 pre-allocated:

0000.7676 - INFO - root - Dataset speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r has 1165 files
0010.1967 - INFO - root - Number of skimmed events: 130668000 (skim percent: 95.7569%)
0010.3275 - INFO - root - Using `uproot.dask` to open files (splitting files 2 ways).
0010.3277 - INFO - root - Starting build of DASK graphs
0010.8360 - INFO - root - Computing the total count
0079.4957 - INFO - root - Event rate for DASK Calculation: 00:01:08 time, 1987.47 kHz, Data rate: 142.38 Gbits/s
0079.4960 - INFO - root - DASK event rate over actual events: 1903.14 kHz
0079.4961 - INFO - root - speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r: result = 130,668,000
Duration: 65.57 s
Tasks Information
number of tasks: 7660
compute time: 5hr 10m
transfer time: 117.95 s

image

I do not know how to explain that very odd gap in there!

gordonwatts commented 3 months ago

On closer inspection, all of those points way out there are the reports! We need to turn those off!

gordonwatts commented 3 months ago

Here is a better run:

0000.8537 - INFO - root - Dataset speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r has 1165 files
0010.5905 - INFO - root - Number of skimmed events: 130668000 (skim percent: 95.7569%)
0010.7230 - INFO - root - Using `uproot.dask` to open files (splitting files 2 ways).
0010.7230 - INFO - root - Starting build of DASK graphs
0011.2111 - INFO - root - Computing the total count
0060.8917 - INFO - root - Event rate for DASK Calculation: 00:00:49 time, 2746.73 kHz, Data rate: 196.78 Gbits/s
0060.8919 - INFO - root - DASK event rate over actual events: 2630.18 kHz
0060.8919 - INFO - root - speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r: result = 130,668,000
Duration: 46.72 s
Tasks Information
number of tasks: 7660
compute time: 3hr 46m
disk-write time: 11.06 ms
transfer time: 116.81 s

image
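
One rough way to read the two task summaries above is to divide the total compute time by the wall-clock duration, which estimates how many tasks were running concurrently on average. This is a sketch, not something the thread itself computes: it assumes "compute time" is summed over all tasks, and the run labels and the `to_seconds` helper are names introduced here for illustration.

```python
import re


def to_seconds(text):
    """Convert a string such as '5hr 10m' to seconds."""
    match = re.fullmatch(r"(\d+)hr (\d+)m", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours * 3600 + minutes * 60


# (compute time, wall-clock duration in seconds) from the two summaries above.
runs = {
    "500 pre-allocated": ("5hr 10m", 65.57),
    "better run": ("3hr 46m", 46.72),
}
N_TASKS = 7660  # "number of tasks: 7660" in both summaries

estimates = {}
for label, (compute, duration) in runs.items():
    compute_s = to_seconds(compute)
    estimates[label] = {
        # Assumption: compute time is the sum over all tasks.
        "avg_concurrency": compute_s / duration,
        "avg_task_s": compute_s / N_TASKS,
    }
    print(
        f"{label}: ~{estimates[label]['avg_concurrency']:.0f} tasks in flight, "
        f"~{estimates[label]['avg_task_s']:.2f} s per task"
    )
```

If the assumption holds, both runs kept a similar number of tasks in flight, and the better run mainly spent less compute time per task.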

gordonwatts commented 3 months ago

And with zero pre-allocated:

0000.7298 - INFO - root - Dataset speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r has 1165 files
0020.1129 - INFO - root - Number of skimmed events: 130668000 (skim percent: 95.7569%)
0020.2425 - INFO - root - Using `uproot.dask` to open files (splitting files 2 ways).
0020.2425 - INFO - root - Starting build of DASK graphs
0020.8817 - INFO - root - Computing the total count
0101.6934 - INFO - root - Event rate for DASK Calculation: 00:01:20 time, 1688.60 kHz, Data rate: 120.97 Gbits/s
0101.6935 - INFO - root - DASK event rate over actual events: 1616.95 kHz
0101.6936 - INFO - root - speed_test_mc20_13TeV:mc20_13TeV.364157.Sherpa_221_NNPDF30NNLO_Wmunu_MAXHTPTV0_70_CFilterBVeto.deriv.DAOD_PHYSLITE.e5340_s3681_r: result = 130,668,000

Duration: 79.35 s
Tasks Information
number of tasks: 7660
compute time: 4hr 2m
transfer time: 102.84 s

image
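
To compare the runs side by side, the "Event rate" lines from the logs can be parsed with a small script. This is a minimal sketch: the regular expression is based only on the log format shown in this thread, and the dictionary keys are labels added here, not names used by the benchmark script.

```python
import re

# "Event rate" lines copied verbatim from the four runs in this thread.
EVENT_RATE_LINES = {
    "100 initial": "0635.8339 - INFO - root - Event rate for DASK Calculation: 00:00:55 time, 2472.90 kHz, Data rate: 177.16 Gbits/s",
    "500 pre-allocated": "0079.4957 - INFO - root - Event rate for DASK Calculation: 00:01:08 time, 1987.47 kHz, Data rate: 142.38 Gbits/s",
    "better run": "0060.8917 - INFO - root - Event rate for DASK Calculation: 00:00:49 time, 2746.73 kHz, Data rate: 196.78 Gbits/s",
    "zero pre-allocated": "0101.6934 - INFO - root - Event rate for DASK Calculation: 00:01:20 time, 1688.60 kHz, Data rate: 120.97 Gbits/s",
}

RATE_PATTERN = re.compile(
    r"(\d+):(\d+):(\d+) time, ([\d.]+) kHz, Data rate: ([\d.]+) Gbits/s"
)


def parse_event_rate(line):
    """Extract elapsed seconds, event rate and data rate from one log line."""
    match = RATE_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not an event-rate line: {line!r}")
    hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
    return {
        "seconds": hours * 3600 + minutes * 60 + seconds,
        "khz": float(match.group(4)),
        "gbits_per_s": float(match.group(5)),
    }


summary = {label: parse_event_rate(line) for label, line in EVENT_RATE_LINES.items()}

# Print the runs from fastest to slowest data rate.
ranked = sorted(summary.items(), key=lambda item: item[1]["gbits_per_s"], reverse=True)
for label, run in ranked:
    print(
        f"{label:20s} {run['seconds']:4d} s  "
        f"{run['khz']:8.2f} kHz  {run['gbits_per_s']:7.2f} Gbits/s"
    )
```

Ranked this way, the better run comes out on top at 196.78 Gbits/s and the zero pre-allocated run at the bottom at 120.97 Gbits/s.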