mzouink closed this issue 8 months ago
change blockwise processing to use dask
From Rob: The read pattern against NRS is also a bit interesting, as it is very distinct and may be worth checking out. There is a drop-off at a regular interval that seems odd if you just have a bunch of tasks you are farming out to the workers. I would expect that, after some amount of time, the normal variation in task runtimes (differences in the data plus differences in CPU and clock speeds across workers) would smooth that out. You would expect that drop in reads to come up and level out with the rest of the time. A likely explanation would be that each of those plateaus is a collection of tasks that all have to finish before the next collection can be started.
But there is also still a lag on a per-worker basis following each computational spike. The idle time is similar in duration to the computational time. That leads to asking: is the worker waiting on the next work unit, and how is that work unit being dispatched to the worker? Is there a main process on the lead task that handles dispatching work units to the workers? Is that lead task also waiting for the completion of those tasks, and how is it informed of that? Is it getting a message passed via an API, or is it somehow monitoring the spawned workers via the output of the bsub commands used to submit them? Or is it monitoring the log files generated by the workers? Also, is the lead task multithreaded, such that it can receive completion notifications from many tasks at once and also dispatch new work units to the workers without serializing at some point? Given the few-second duration of the computation on the workers, there will be a large amount of task turnover, so any serialization or locking could strand the workers with nothing to do while waiting on their next work unit. Also, how is the work unit being passed to workers: is the lead task connecting to the worker directly to hand it the next work, or is it putting it somewhere and waiting on the worker to check for its next work?
I think I had previously asked whether there was a database involved in this workflow, but I don't see that there was an answer about that. A while back, with one of Caroline's daisy-based workflows, she was turning over jobs so quickly and generating new database connections with each operation that she was running nodes out of TCP sockets. I am not seeing that behavior here, but that kind of high turnover bottlenecking down to a single database server can be problematic. In her case she was only running a few dozen workers, so I could see it getting even worse at a few thousand if they are waiting for database operations to complete.
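To ground the dispatch questions above: as far as I understand daisy, the lead process runs a single scheduler, and each worker connects back to it over TCP and pulls blocks one at a time. Below is a minimal sketch of that pull loop, assuming the daisy >= 1.0 client API; `process_block` is a placeholder, not code from this workflow.

```python
# Minimal sketch of daisy's pull-based worker loop, assuming the daisy >= 1.0
# client API; process_block is a placeholder, not code from this workflow.
import daisy


def process_block(block):
    # Placeholder: read from block.read_roi, compute, write to block.write_roi.
    pass


def worker_loop():
    # Client() picks up the scheduler's address from the environment the
    # scheduler sets when it spawns the worker (e.g. via bsub) and connects
    # back over TCP.
    client = daisy.Client()
    while True:
        # Blocks until the scheduler hands out the next block; the context
        # manager reports success/failure back to the scheduler on exit.
        with client.acquire_block() as block:
            if block is None:
                break
            process_block(block)


if __name__ == "__main__":
    worker_loop()
```

If blocks are indeed handed out this way here, every worker has to round-trip through that one scheduler between blocks, which could account for the per-worker idle gaps Rob describes.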
Switch to Dask. @d-v-b @yuriyzubov
what code would we need to change or look at to achieve this?
and, even better than switching to dask, would be to make the workflow generic with respect to task executors, so that someone could use dask or daisy
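One hypothetical shape for that abstraction is sketched below. None of these class or function names exist in the codebase; the dask backend uses only `Client.map`/`Client.gather`, and the daisy backend is left as a stub.

```python
# Hypothetical sketch of an executor-agnostic blockwise API; the names are
# illustrative only and do not exist in this repo.
from typing import Any, Callable, Iterable, Protocol


class BlockwiseExecutor(Protocol):
    def run_blockwise(
        self, blocks: Iterable[Any], process_block: Callable[[Any], Any]
    ) -> None:
        """Apply process_block to every block, however the backend sees fit."""


class DaskExecutor:
    """Farms blocks out to a dask.distributed cluster."""

    def __init__(self, client):
        self.client = client  # a dask.distributed.Client

    def run_blockwise(self, blocks, process_block):
        futures = self.client.map(process_block, list(blocks))
        self.client.gather(futures)


class DaisyExecutor:
    """Would wrap daisy.Task / daisy.run_blockwise; details omitted here."""

    def run_blockwise(self, blocks, process_block):
        raise NotImplementedError
```

The workflow code would then depend only on `BlockwiseExecutor`, and the choice of daisy vs dask becomes a constructor argument rather than a rewrite.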
This is the daisy repo we are using: https://github.com/funkelab/daisy/tree/master/daisy. One of the ideas is having daisy able to work on top of dask. There are a lot of helpful tools in daisy.
This seems to be due to read_write_conflict. Set it to False when possible for improved performance.
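For reference, this is roughly where that flag is passed, assuming the daisy >= 1.0 Task API; the ROIs, block shape, and `process_block` below are made-up placeholders, not values from this workflow.

```python
# Sketch of passing read_write_conflict=False, assuming the daisy >= 1.0 Task
# API; ROIs, block shape, and process_block are placeholders.
import daisy

total_roi = daisy.Roi((0, 0, 0), (1024, 1024, 1024))
block_roi = daisy.Roi((0, 0, 0), (64, 64, 64))


def process_block(block):
    # Placeholder computation on block.read_roi / block.write_roi.
    pass


task = daisy.Task(
    "blockwise-task",
    total_roi=total_roi,
    read_roi=block_roi,
    write_roi=block_roi,
    process_function=process_block,
    num_workers=4,
    # When a block never reads from a neighbouring block's write region,
    # conflict tracking only serializes the scheduler for no benefit.
    read_write_conflict=False,
)
daisy.run_blockwise([task])
```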
@pattonw @funkey
I tested 4 different runs with exactly the same data and the same process:
1- 2260 workers x 1 core: performance 108 blocks/s
2- 1000 workers x 1 core: performance 127 blocks/s
3- 500 workers x 1 core: performance 140 blocks/s
4- 565 workers x 4 cores: performance 138 blocks/s
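For what it's worth, dividing the reported throughput by the worker count makes the pattern clearer: aggregate throughput is roughly flat while per-worker throughput collapses as workers are added, which is what you would expect if every worker has to round-trip through a single scheduler (or database) between blocks. A quick calculation from the numbers above:

```python
# Per-worker throughput computed from the four runs reported above.
runs = [(2260, 108), (1000, 127), (500, 140), (565, 138)]  # (workers, blocks/s)
for workers, blocks_per_s in runs:
    print(f"{workers:>4} workers: {blocks_per_s / workers:.3f} blocks/s per worker")
# -> ~0.048, 0.127, 0.280, 0.244 blocks/s per worker respectively
```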