quasiben / dask-scheduler-performance

BSD 3-Clause "New" or "Revised" License

DGX Nightly Benchmark run 20210203 #100

Open quasiben opened 3 years ago

quasiben commented 3 years ago

Benchmark history

<img width="641" alt="Benchmark Image" src="https://raw.githubusercontent.com/quasiben/dask-scheduler-performance/benchmark-images/assets/dgx-20210203-benchmark-history.png">

Raw Data

`<Client: 'tcp://127.0.0.1:35623' processes=10 threads=10, memory=540.94 GB>`

Distributed Version: 2021.01.1+11.g98570fbc

| Benchmark   | Mean      | Std. dev. |
| ----------- | --------- | --------- |
| simple      | 4.974e-01 | 3.977e-02 |
| shuffle     | 2.216e+01 | 1.352e+00 |
| rand_access | 9.311e-03 | 2.619e-03 |
| anom_mean   | 9.924e+01 | 1.014e+00 |

Raw Values

```
simple      [0.53260398  0.46902156  0.4675889   0.48769474  0.53614354
             0.48331046  0.41404414  0.50106645  0.52936745  0.55315757]
shuffle     [20.53589797 21.31255317 21.23172545 21.40167165 21.4829843
             25.38235354 21.39798117 22.7743032  22.86513638 23.25193882]
rand_access [0.0081172   0.01104164  0.01037145  0.01042962  0.00955176
             0.01108932  0.01066732  0.0126369   0.00378251  0.00542545]
anom_mean   [97.08908796  98.83965445  98.77929091  98.8152504   99.15340686
             100.42557287 98.87805986 100.9750371   99.94019437  99.47358441]
```
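The summary statistics above are just the mean and population standard deviation of these raw per-run times. A quick sketch reproducing the `simple` row (assuming the benchmark aggregates this way):

```python
import statistics

# Raw per-run times for the "simple" benchmark, copied from the data above
simple = [0.53260398, 0.46902156, 0.4675889, 0.48769474, 0.53614354,
          0.48331046, 0.41404414, 0.50106645, 0.52936745, 0.55315757]

mean = statistics.fmean(simple)   # arithmetic mean of the 10 runs
std = statistics.pstdev(simple)   # population standard deviation

print(f"simple {mean:.3e} +/- {std:.3e}")  # matches the reported 4.974e-01 +/- 3.977e-02
```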

Dask Profiles

Scheduler Execution Graph

<img width="641" alt="Sched Graph Image" src="https://raw.githubusercontent.com/quasiben/dask-scheduler-performance/benchmark-images/assets/20210203-sched-graph.png">

jakirkham commented 3 years ago

What was `transition` is now effectively `_transition`; only a small amount of code remains in `transition`. That function has seen a ~14% improvement in runtime due to batching communication ( https://github.com/dask/distributed/pull/4451 ) at the end of all proposed transitions in the Scheduler graph.

Also notice that we previously called `worker_send`. We now call a function named `send_all`, which sends messages to all clients and workers in one go. Note that unlike `worker_send`, which showed up in the call graph, `send_all` does not appear at all, meaning it takes an insignificant amount of time (less than 0.5% of the overall runtime).
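As a rough illustration of the batching pattern described here (a toy sketch, not the actual `distributed` code; the `MiniScheduler` class and its message-routing rule are invented for this example), the idea is to accumulate outgoing messages per recipient while processing transitions, then flush them all in one pass:

```python
from collections import defaultdict

class MiniScheduler:
    """Toy sketch of batched transition messaging (not the real Scheduler)."""

    def __init__(self, comms):
        # recipient name -> callable that "delivers" a list of messages
        self.comms = comms

    def _transition(self, key, finish, messages):
        # Instead of sending immediately (the old worker_send pattern),
        # record the message for later batched delivery.
        # The routing rule (ord of first char) is arbitrary, for illustration only.
        recipient = f"worker-{ord(key[0]) % 2}"
        messages[recipient].append({"op": "task", "key": key, "state": finish})

    def transitions(self, updates):
        # Process all proposed transitions, accumulating messages...
        messages = defaultdict(list)
        for key, finish in updates.items():
            self._transition(key, finish, messages)
        # ...then send everything to each worker/client in one go.
        self.send_all(messages)

    def send_all(self, messages):
        for recipient, msgs in messages.items():
            self.comms[recipient](msgs)

sent = defaultdict(list)
sched = MiniScheduler({"worker-0": sent["worker-0"].extend,
                       "worker-1": sent["worker-1"].extend})
sched.transitions({"x": "memory", "y": "memory", "z": "erred"})
print({k: len(v) for k, v in sent.items()})
```

Each recipient receives its messages in a single batch per `transitions` call, rather than one send per transition, which is the essence of the change.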

Should add that we see a modest drop in time spent in `write` as well, though that is still dominated by time spent serializing messages, which we are working to improve independently.

cc @quasiben @mrocklin