Closed mattiasmar closed 1 year ago
What version of python, dill, multiprocess, pathos, pytorch, and any other relevant dependencies are you using? This may potentially be resolved by updating to the master version of dill... or by using a serialization variant (i.e. changing dill.settings). You could also find out what the serialization method used in torch.multiprocessing is (or extract the relevant function from the pickle registry), and then register it to dill. It'd also be useful to see the entire traceback. It would also be useful if you posted some minimal example code that reproduced your error.
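To make the "register a reducer" idea concrete, here is a minimal sketch that uses only the standard library's copyreg registry. The Activation class and both helper functions are hypothetical stand-ins, not part of torch, dill, or pathos. dill has its own registration mechanism, and whether a given dill version honors a copyreg entry is an assumption to verify.

```python
import copyreg
import math
import pickle


class Activation:
    """Hypothetical stand-in for an object holding an unpicklable callable."""

    def __init__(self, name):
        self.name = name
        # A lambda cannot be pickled by the standard pickle module.
        self.fn = lambda x: getattr(math, name)(x)

    def __call__(self, x):
        return self.fn(x)


def rebuild_activation(name):
    # Recreate the object from its picklable description.
    return Activation(name)


def reduce_activation(act):
    # Serialize only the name; the callable is rebuilt on load.
    return rebuild_activation, (act.name,)


# Register the reducer so pickle uses it for every Activation instance.
copyreg.pickle(Activation, reduce_activation)

restored = pickle.loads(pickle.dumps(Activation("tanh")))
print(restored.name, restored(0.0))  # tanh 0.0
```

The design choice is to serialize a description of the object (here, a name) rather than the problematic callable itself.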
Versions:
Python 3.9.7
dill 0.3.4
pathos 0.2.8
multiprocess 0.70.12.2
pytorch 1.12.0 py3.9_cpu_0
I don't have a minimal example, however I can tell that when I use
pool = multiprocessing.get_context('spawn').Pool(args.num_workers)
my code runs smoothly, but when I use
pool = pathos.pools.ProcessPool(args.num_workers)
I get a pickle error:
<very long stack trace> .... _pickle.PicklingError: Can't pickle <built-in method tanh of type object at 0x7f401db9fda0>: it's not found as torch._VariableFunctionsClass.tanh
Also, if I remove the PyTorch model that the subprocesses would have access to, pathos.pools.ProcessPool does not fail.
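Without a minimal example, one way to narrow down which member of the object sent to the workers triggers the PicklingError is to try serializing each attribute separately. The sketch below is only an illustration: find_unpicklable and Payload are hypothetical names, and the check is shallow, so nested members (such as submodules of a model) would need the same treatment applied recursively.

```python
import pickle


def find_unpicklable(obj, dumps=pickle.dumps):
    """Map attribute name -> error message for attributes that fail to serialize."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            dumps(value)
        except Exception as exc:  # pickle raises several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class Payload:
    """Hypothetical object passed to the pool's map."""

    def __init__(self):
        self.weights = [0.1, 0.2]
        self.activation = lambda x: x  # not picklable by stdlib pickle


print(sorted(find_unpicklable(Payload())))  # ['activation']
```

Passing dumps=dill.dumps instead of the default would run the same check against dill, which is the serializer pathos relies on.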
On Unix systems, the default start method for multiprocess is fork (also on macOS, in contrast to the stdlib). Keeping that in mind, torch.multiprocessing does not use multiprocess/dill and documents spawn only. I seem to remember them even firing a warning or error about using fork in the past, but I think that was when my subprocesses were trying to talk to CUDA in parallel, something they explicitly discourage for fork.
You could try multiprocess.set_start_method('spawn'), which will be slower due to pickling but probably more stable for PyTorch. Starting with https://github.com/uqfoundation/pathos/pull/252, you can also explicitly pass it like pathos.ProcessPool(4, context=multiprocess.get_context('spawn')).
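As a rough sketch of selecting a start method explicitly instead of relying on the platform default, the snippet below uses the standard library's multiprocessing. multiprocess is a fork of that module, so the same calls are assumed to carry over. The helper name pick_context is hypothetical.

```python
import multiprocessing as mp


def pick_context(method="spawn"):
    """Return a context for the requested start method, if the platform offers it."""
    available = mp.get_all_start_methods()
    if method not in available:
        raise ValueError(f"{method!r} not available; choose from {available}")
    return mp.get_context(method)


ctx = pick_context("spawn")
print(ctx.get_start_method())  # spawn

# A pool would then be created from the context, for example ctx.Pool(4),
# inside an `if __name__ == "__main__":` guard, which spawn requires.
```

Using a context object keeps the choice local to one pool, whereas set_start_method changes the default for the whole process.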
I'm closing this as answered. Please reopen if you feel there's more to discuss.
Hello,
Does Pathos' ProcessPool support PyTorch tensors on CPU? If not, does any other class of Pathos support it, or can I combine Pathos with torch.multiprocessing? In the meantime I got this error when calling map of ProcessPool while sending a class that contains PyTorch tensors.