uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

[Request] expose identifier for workers #200

Open holymonson opened 3 years ago

holymonson commented 3 years ago

In multiprocessing, we can use multiprocessing.current_process()._identity to identify a worker, and allocate an isolated SharedMemory block for each worker via the pool's initializer.

I wonder if pathos could provide a unified identifier for workers in all pools. The same goes for the initializer parameter (ThreadPool and ProcessPool may have it, but ParallelPool and SerialPool seem not to).
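For context, the multiprocessing pattern described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not pathos. The names `init_worker`, `task`, `run`, and `_worker_slot` are hypothetical, and `_identity` is a private attribute whose behavior may change between Python versions.

```python
import multiprocessing as mp

_worker_slot = None  # set once per worker process by the initializer


def init_worker():
    """Runs once in each worker; records a per-worker slot number."""
    global _worker_slot
    # _identity is private: a tuple such as (1,) or (2,) for pool workers.
    # The numbers come from a process-wide counter, so they are unique but
    # not guaranteed to start at 1 or to be contiguous.
    ident = mp.current_process()._identity
    _worker_slot = ident[0] if ident else 0


def task(x):
    # The slot could be used to pick a per-worker resource, such as a
    # shared-memory block allocated for that worker.
    return (_worker_slot, x * x)


def run(n_workers=2, n_items=8, ctx=None):
    ctx = ctx or mp.get_context()
    with ctx.Pool(n_workers, initializer=init_worker) as pool:
        return pool.map(task, range(n_items))


if __name__ == "__main__":
    print(run())
```

Each result carries the slot of the worker that produced it, so per-worker state set up in the initializer can be looked up without extra arguments.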

mmckerns commented 3 years ago

ThreadPool and ProcessPool do have this option, due to multiprocess -- and you are correct that the other two pools don't. ParallelPool is challenging, as the workers may be on distributed resources. What I tend to do is to pass an additional argument "id" with a unique identifier to the map. It's a workaround, but it enables one to uniquely identify the worker in any pool... and is probably the best approach until there's a mechanism like you mention for ProcessPool.
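A minimal sketch of that workaround, assuming the id is generated by the caller and passed alongside each input. The stdlib `multiprocessing.pool.ThreadPool` stands in for a pathos pool here, and `work` and `run_with_ids` are hypothetical names. With a pathos pool, the equivalent call would be along the lines of `pool.map(work, values, ids)`, since pathos `map` accepts multiple iterables. Note that this tags each task rather than the worker process itself.

```python
from multiprocessing.pool import ThreadPool  # stdlib stand-in for a pathos pool


def work(x, task_id):
    # task_id is supplied by the caller, so it is available in any pool type.
    return {"id": task_id, "result": x * x}


def run_with_ids(values, n_workers=2):
    ids = range(len(values))  # one unique id per submitted item
    with ThreadPool(n_workers) as pool:
        # starmap unpacks each (value, id) pair into work(x, task_id)
        return pool.starmap(work, zip(values, ids))
```

Because the id travels with the data, the same function works unchanged whether the pool is serial, threaded, process-based, or distributed.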

mmckerns commented 3 years ago

So, this is what exists, if you want to work with a ParallelPool and don't want to pass an id to the map:

>>> pp = ParallelPool()
>>> s = list(pp.__state__.values())[0]
>>> s._Server__rworkers # 'remote' workers
[]
>>> w = s._Server__workers[0] # local workers
>>> w.pid  # this is the same as multiprocess.current_process().ident
47242

So multiprocess.current_process().ident works as an id for all local workers on any of the pathos pools. I still have to look at how to handle remote workers.
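For local workers, one hedged way to build an identifier that distinguishes both processes and threads is to pair the OS process id with the thread ident. The sketch below uses only the standard library, with `multiprocessing.pool.ThreadPool` standing in for a pathos pool. The names `local_worker_id`, `tagged`, and `demo` are hypothetical and not part of pathos, and remote workers are out of scope.

```python
import os
import threading
from multiprocessing.pool import ThreadPool  # stdlib stand-in for a pathos pool


def local_worker_id():
    """Return (pid, thread ident) for whichever local worker runs this call."""
    return (os.getpid(), threading.get_ident())


def tagged(x):
    # Attach the id of the worker that handled this item.
    return (local_worker_id(), x + 1)


def demo(n_workers=2, n_items=6):
    with ThreadPool(n_workers) as pool:
        return pool.map(tagged, range(n_items))
```

In a thread pool every worker shares one pid and differs only in thread ident. In a process pool the pid alone would normally distinguish workers.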