Open showkeyjar opened 2 years ago
@showkeyjar: Thanks for making a request. I'm not sure what you mean, can you explain a bit more? At what point in the workflow do you want to change the size of the `Pool`? Generally the workflow goes like this:

```python
p = Pool(4)                             # build a pool with a fixed number of workers
res = p.uimap(lambda x: x*x, range(4))  # execute the jobs on the workers
list(res)                               # get the results from the workers
p.close(); p.join(); p.clear()          # shut down the pool
```

At what stage would you want to change the size of the pool, and how exactly?
Here is a simple example of the proposed solution:

```python
p = AutoScalePool(scale_step=1, max_memory=0.8, max_cpu_load=45)  # build a pool that estimates the number of workers
res = p.uimap(lambda x: x*x, range(4))
```

Execute the first batch with the minimum number of workers, and record the memory and CPU usage. Then, for the second batch, set the number of workers to `prev + scale_step`; if memory is full, stop increasing, and vice versa. That way, even if the server has other tasks, it can never crash, and the pool can still run jobs while maximizing use of server resources.
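A minimal sketch of what such a pool might look like, built on stdlib `multiprocessing` with a stubbed memory probe. `AutoScalePool`, `_get_memory_fraction`, and `min_workers` are hypothetical names for illustration, not pathos API, and `max_cpu_load` is left out to keep the sketch small:

```python
import multiprocessing as mp


def _get_memory_fraction():
    # Hypothetical probe: fraction of physical memory currently in use.
    # A real implementation could read /proc/meminfo or use psutil;
    # stubbed here so the sketch stays self-contained.
    return 0.5


class AutoScalePool:
    """Sketch of the proposed pool: grows by scale_step each batch
    while memory usage stays under max_memory, shrinks otherwise."""

    def __init__(self, scale_step=1, max_memory=0.8, min_workers=1):
        self.scale_step = scale_step
        self.max_memory = max_memory
        self.min_workers = min_workers
        self.workers = min_workers

    def _rescale(self):
        # Grow while under the memory budget, shrink when over it,
        # never dropping below min_workers.
        if _get_memory_fraction() < self.max_memory:
            self.workers += self.scale_step
        else:
            self.workers = max(self.min_workers,
                               self.workers - self.scale_step)

    def map_batch(self, func, batch):
        # Each batch runs on a freshly sized pool; the size is
        # re-estimated from current resource usage afterwards,
        # so the next batch picks up the adjusted worker count.
        with mp.Pool(self.workers) as p:
            result = p.map(func, batch)
        self._rescale()
        return result
```

With the stub always reporting 50% memory use, the pool grows by `scale_step` after every batch; swapping in a real probe gives the feedback loop described above.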
It's not clear how you envision `scale_step=1, max_memory=0.8, max_cpu_load=45` working. Is `max_memory` checked after the prior batch of workers has run, and then used to predict what will keep memory under 80% and "adjust" the size of the next batch? If so, I'd expect `max_cpu_load` to be similar, but here you have `45`. I also don't understand what the point of `scale_step` is... is it that you are allowing the pool size to change by at most `scale_step` each new batch?
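Under that last reading, the sizing rule would be a one-liner per batch. A hypothetical helper (not pathos API) showing the "at most `scale_step` per batch" interpretation:

```python
def next_pool_size(prev, scale_step, mem_fraction, max_memory, min_workers=1):
    # Grow by scale_step while measured memory use is under the budget,
    # otherwise shrink by scale_step, never going below min_workers.
    if mem_fraction < max_memory:
        return prev + scale_step
    return max(min_workers, prev - scale_step)
```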
Yes, you are right; it's just a rough example plan, and you can design it according to actual conditions. `max_cpu_load` is usually of no use on a server that has swap: if many processes are running, it only makes the server run slowly. But `max_memory` matters, because when memory runs out on Linux, the kernel will kill processes seemingly at random. And if you can check free memory at every batch, there's also no need for `max_memory`.
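Checking free memory at every batch is cheap. A minimal sketch, assuming Linux (it parses `/proc/meminfo`; a portable implementation would use something like psutil instead):

```python
def available_memory_fraction():
    # Parse /proc/meminfo (Linux-only) and return the fraction of
    # total memory that is still available to new allocations.
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are in kB
    return info["MemAvailable"] / info["MemTotal"]
```

A pool could call this before each batch and skip growing whenever the available fraction drops below its threshold.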
I'm very excited about this; the feature would be like an automatic transmission in a car.
I'm glad that pathos is still active, and I have a new suggestion: would you please provide a new feature where users can adjust their pool size at runtime? For example, in Jupyter Notebook or JupyterLab, when users process data they must estimate the number of processes, but the estimate is always smaller than it could be, so it wastes time to try again and again. A runtime-adjust capability would be greatly helpful for those jobs.
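For the manual-resize part of the request, a small wrapper over stdlib `multiprocessing` illustrates the idea: `resize()` takes effect on the next `map()` call, so a notebook user can tune the worker count between runs without restarting the kernel. `ResizablePool` is a hypothetical name for illustration, not pathos API (pathos caches its pools, so the real mechanism would differ):

```python
import multiprocessing as mp


class ResizablePool:
    # Hypothetical sketch: the worker count is re-read each time a
    # batch is mapped, so resize() between calls changes the next run.
    def __init__(self, nodes):
        self.nodes = nodes

    def resize(self, nodes):
        self.nodes = nodes

    def map(self, func, iterable):
        # Build a pool of the current size for this batch only.
        with mp.Pool(self.nodes) as p:
            return p.map(func, iterable)
```

Usage in a notebook would look like `p = ResizablePool(2)`, run a batch, then `p.resize(8)` and re-run without rebuilding anything by hand.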