uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

[Request] Adjust pool size at runtime #243

Open showkeyjar opened 2 years ago

showkeyjar commented 2 years ago

I'm glad that pathos is still active, and I have a suggestion:

Would you please provide a new feature that lets users adjust the pool size at runtime?

For example, in Jupyter Notebook or JupyterLab, when users process data they must estimate the number of worker processes up front, but the estimate is usually smaller than what the machine could handle, so it wastes time to try again and again. A runtime adjustment capability would be a great help for those jobs.

mmckerns commented 2 years ago

@showkeyjar: Thanks for making a request. I'm not sure what you mean; can you explain a bit more? At what point in the workflow do you want to change the size of the Pool? Generally the workflow goes like this:

  1. p = Pool(4) Build a pool with a fixed number of workers
  2. res = p.uimap(lambda x:x*x, range(4)) Execute the jobs on the workers
  3. list(res) Get the results from the workers
  4. p.close(); p.join(); p.clear() Shut down the pool
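
For concreteness, here is a minimal runnable version of the four steps above (a sketch; the lambda works because pathos serializes with dill, and uimap returns an unordered iterator):

    # minimal runnable version of the four-step workflow above
    from pathos.pools import ProcessPool

    p = ProcessPool(4)                        # 1. build a pool with 4 workers
    res = p.uimap(lambda x: x * x, range(4))  # 2. submit jobs; unordered iterator
    print(list(res))                          # 3. collect results (order may vary)
    p.close(); p.join(); p.clear()            # 4. shut down and uncache the pool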

At what stage would you want to change the size of the pool, and how exactly?

showkeyjar commented 2 years ago

Here is a simple example of a possible solution:

  1. p = AutoScalePool(scale_step=1, max_memory=0.8, max_cpu_load=45) Build a pool that estimates its own number of workers
  2. res = p.uimap(lambda x:x*x, range(4)) Execute the first batch with the minimum number of workers and record the memory and CPU usage. On the next batch, set the worker count to prev + scale_step; if memory is nearly full, stop increasing it, and vice versa.

That way, even if the server has other tasks running, it can never crash, and the pool can still execute jobs while making maximal use of the server's resources.
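
To make the proposal concrete, here is a hypothetical sketch of what such a wrapper might look like. None of this is pathos API: the AutoScalePool name and its parameters come from the example above, the use of psutil for load sampling is an assumption, and the sketch rebuilds a ProcessPool between batches rather than resizing a live pool.

    # hypothetical sketch of the proposed AutoScalePool; not part of pathos
    # assumes psutil for load sampling; all names/parameters are illustrative
    import psutil
    from pathos.pools import ProcessPool

    class AutoScalePool:
        def __init__(self, scale_step=1, max_memory=0.8, max_cpu_load=45,
                     min_workers=1, batch_size=64):
            self.scale_step = scale_step
            self.max_memory = max_memory      # fraction of total RAM (0.8 == 80%)
            self.max_cpu_load = max_cpu_load  # percent CPU utilization
            self.nodes = min_workers          # start small, grow batch by batch
            self.batch_size = batch_size

        def _adjust(self):
            # sample current load, then grow or shrink the worker count
            # by at most scale_step for the next batch
            mem = psutil.virtual_memory().percent / 100.0
            cpu = psutil.cpu_percent(interval=0.1)
            if mem < self.max_memory and cpu < self.max_cpu_load:
                self.nodes += self.scale_step
            elif self.nodes > self.scale_step:
                self.nodes -= self.scale_step

        def uimap(self, func, iterable):
            # run the work in batches, rebuilding the pool at the adjusted
            # size between batches (the pool is not resized in place)
            items = list(iterable)
            for start in range(0, len(items), self.batch_size):
                pool = ProcessPool(self.nodes)
                try:
                    for res in pool.uimap(func, items[start:start + self.batch_size]):
                        yield res
                finally:
                    pool.close(); pool.join(); pool.clear()
                self._adjust()

Usage would then mirror a standard pool, e.g. list(AutoScalePool().uimap(lambda x: x*x, range(100))); whether batching fits uimap's streaming semantics is an open design question.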

mmckerns commented 2 years ago

It's not clear how you envision scale_step=1, max_memory=0.8, max_cpu_load=45 working. Is max_memory checked after the prior batch of workers has run, and then used to predict what will keep memory under 80% and "adjust" the size of the next batch? If so, I'd expect max_cpu_load to be similar, but here you have 45. I also don't understand the point of scale_step... is it that you are allowing the pool size to change by at most scale_step with each new batch?

showkeyjar commented 2 years ago

Yes, you are right. It is just a rough sketch; you can design it according to the actual conditions.

max_cpu_load is usually of little use on a server that has swap: if many processes are running, it only causes the server to run slowly.

But max_memory will run into trouble when memory runs out on Linux: the OOM killer will kill processes seemingly at random.

And if you check the free memory before every batch, then a fixed max_memory is also unnecessary.
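
A minimal sketch of that idea: size the next batch from the memory that is actually free, rather than from a fixed max_memory fraction (psutil and the per-worker footprint here are assumptions):

    # hypothetical per-batch sizing from free memory; not a pathos API
    # assumes psutil and a rough per-worker footprint, here 512 MiB
    import psutil

    def workers_for_free_memory(per_worker_bytes=512 * 1024**2, cap=None):
        # size the next batch from memory actually available right now
        free = psutil.virtual_memory().available
        n = max(1, free // per_worker_bytes)
        return n if cap is None else min(n, cap)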

I'm very excited about this feature; it's like an automatic transmission in a car.