quasiben / dask-scheduler-performance

BSD 3-Clause "New" or "Revised" License

20 iterations with 4 workers on dev with environment level profiling (worker stealing off) #19

Open jakirkham opened 4 years ago

jakirkham commented 4 years ago

Used the same strategy described in issue (https://github.com/quasiben/dask-scheduler-performance/issues/5) with 4 workers, dask commit (https://github.com/dask/dask/commit/ff3ea6c74e8736a5823d648defcf9164f083d266), and distributed commit (https://github.com/dask/distributed/commit/7e2fb2ff6a8b19487be5d8b9ba086e1560a75ed5). Also used the change in PR (https://github.com/quasiben/dask-scheduler-performance/pull/14) and disabled work stealing. This generated the following profiles (included in this archive). Below are PNGs generated from the scheduler, the client, and a representative worker. The archive also contains SVGs, which are easier to explore (GitHub wouldn't embed them here).

Scheduler: ![prof_36743 pstat]( https://user-images.githubusercontent.com/3019665/99738855-4fec5180-2a80-11eb-8447-581ae64ecae1.png )
Client: ![prof_36834 pstat]( https://user-images.githubusercontent.com/3019665/99738903-6c888980-2a80-11eb-939c-1db2adf8f3e2.png )
Representative Worker: ![prof_36759 pstat]( https://user-images.githubusercontent.com/3019665/99738950-8a55ee80-2a80-11eb-8842-6c08cc0414e4.png )
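For anyone reproducing profiles like these, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. It is not necessarily how the environment-level profiling from PR 14 works; the workload `busy_work`, the helpers `profile_to_file` and `report`, and the filename `example.pstat` are all hypothetical.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    # Hypothetical stand-in for a scheduler workload.
    total = 0
    for i in range(n):
        total += hash(str(i))
    return total


def profile_to_file(func, *args, path):
    # Run func under cProfile and dump the raw stats to a .pstat file.
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    profiler.dump_stats(path)
    return result


def report(path, limit=10):
    # Load the dump and render the `limit` entries with the largest cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "example.pstat")
    profile_to_file(busy_work, 100_000, path=path)
    print(report(path))
```

A third-party tool such as gprof2dot (with Graphviz) can render a `.pstat` file as a call graph; the issue doesn't say which tool produced the images above.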

jakirkham commented 4 years ago

Looks like a good chunk of time is being spent calling `write` to send messages, which is a little surprising since this is just running on my laptop. Found some inefficiencies in how messages were being built and submitted PR (https://github.com/dask/distributed/pull/4257) to address them.
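The general cost of many small `write` calls can be illustrated with a sketch: build all of a message's frames first and hand them to the transport in one call, rather than issuing one call per header and frame. This is a generic illustration, not the change made in PR 4257; `CountingSink`, the 8-byte length-prefix layout, and both `send_*` helpers are hypothetical.

```python
import struct


class CountingSink:
    """Hypothetical transport that records how many times write() is called."""

    def __init__(self):
        self.calls = 0
        self.data = bytearray()

    def write(self, payload):
        self.calls += 1
        self.data += payload


def send_unbatched(sink, frames):
    # One write for the frame count, then a length header and a body per frame.
    sink.write(struct.pack("!Q", len(frames)))
    for frame in frames:
        sink.write(struct.pack("!Q", len(frame)))
        sink.write(frame)


def send_batched(sink, frames):
    # Build the same bytes up front and issue a single write.
    parts = [struct.pack("!Q", len(frames))]
    for frame in frames:
        parts.append(struct.pack("!Q", len(frame)))
        parts.append(frame)
    sink.write(b"".join(parts))


frames = [b"header", b"payload" * 10]
unbatched, batched = CountingSink(), CountingSink()
send_unbatched(unbatched, frames)
send_batched(batched, frames)
print(unbatched.calls, batched.calls)  # 5 1
```

Joining copies every frame once, so for large frames a vectored write (for example `socket.sendmsg` or `os.writev` on Unix) avoids that copy while still reducing the number of calls.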

jakirkham commented 4 years ago

Here's a close-up of the `transitions` call graph. It omits every node that is not called by `transitions`, whether directly or indirectly.

[Image: prof_36743_transitions pstat]
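The pruning described for this close-up amounts to keeping only the nodes reachable from `transitions` in the caller-to-callee graph. A minimal breadth-first sketch follows; the issue doesn't say how the close-up was produced (gprof2dot, for instance, has a `--root` option for this kind of pruning), and the example edges below are hypothetical rather than taken from the real profile.

```python
from collections import deque


def reachable_from(call_graph, root):
    """Return every function reachable from root (including root) in a caller -> callees map."""
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prune(call_graph, root):
    # Keep only callers reachable from root, and only edges to reachable callees.
    keep = reachable_from(call_graph, root)
    return {
        caller: [callee for callee in callees if callee in keep]
        for caller, callees in call_graph.items()
        if caller in keep
    }


# Hypothetical edges for illustration only.
graph = {
    "handle_stream": ["transitions", "write"],
    "transitions": ["transition", "send_all"],
    "transition": ["decide_worker"],
    "decide_worker": ["__hash__"],
    "send_all": ["write"],
    "heartbeat": ["write"],
}
print(sorted(prune(graph, "transitions")))
```

Nodes such as `handle_stream` and `heartbeat` drop out because nothing under `transitions` calls them, even though they share callees like `write`.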

jakirkham commented 3 years ago

The `__hash__` call in `decide_worker` comes from `WorkerState`, which internally computes `hash(...)` of a `str`. There's no need to keep recomputing this: the `str` is simply the address of the corresponding worker, so it shouldn't change. Submitted PR (https://github.com/dask/distributed/pull/4271) to compute the hash once when a `WorkerState` is constructed and reuse that value whenever `__hash__(...)` is called.
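A minimal sketch of the caching idea, using a hypothetical, simplified class `WorkerStateSketch` rather than distributed's actual `WorkerState`. It assumes the address is never reassigned after construction, and it compares by address so that `__eq__` stays consistent with `__hash__`; the real class's equality semantics may differ.

```python
class WorkerStateSketch:
    """Hypothetical, simplified stand-in for distributed's WorkerState."""

    __slots__ = ("address", "_hash")

    def __init__(self, address):
        self.address = address
        # Assumption: the address never changes, so hash it once here.
        self._hash = hash(address)

    def __hash__(self):
        # Reuse the precomputed value instead of calling hash(self.address) again.
        return self._hash

    def __eq__(self, other):
        # Equal addresses imply equal hashes, keeping set/dict behavior correct.
        if not isinstance(other, WorkerStateSketch):
            return NotImplemented
        return self.address == other.address


ws = WorkerStateSketch("tcp://127.0.0.1:40000")
workers = {ws}
print(WorkerStateSketch("tcp://127.0.0.1:40000") in workers)  # True
```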