Closed gobbedy closed 6 years ago
Possibly a long shot, but could this issue be related to https://github.com/uqfoundation/dill/issues/56?
If I make a small change to my code it fails with the exact same error:
#!/usr/bin/env python3.6
from time import time
from pathos.multiprocessing import Pool as ThreadPool
import numpy as np

def my_top_function():
    #global cls # Comment out this line to make it fail -- DIFFERENT THAN CODE IN ORIGINAL COMMENT
    cls = type('NewCls', (object,), dict()) # -- DIFFERENT THAN CODE IN ORIGINAL COMMENT

    def my_worker_function(number):
        cls.test = 'bla' # some arbitrary work -- DIFFERENT THAN CODE IN ORIGINAL COMMENT

    k_list = np.arange(10)
    ts = time()
    pool = ThreadPool(4)
    pool.map(my_worker_function, k_list)
    pool.close()
    pool.join()
    te = time()
    print('Finished multiprocess: took %2.4f seconds.' % (te - ts))

my_top_function()
Make cls global (uncomment global cls) and it passes again.
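For context on why the nested function needs special serialization at all: the stdlib pickle refuses nested (local) functions outright, which is why pathos relies on dill and its closure-cell handling in the first place. A minimal stdlib-only sketch (names are illustrative, modeled on the testcase above):

```python
import pickle

def make_worker():
    cls = type('NewCls', (object,), dict())  # local object, captured by the closure

    def worker(number):
        cls.test = 'bla'  # references the enclosing local 'cls'
        return number

    return worker

w = make_worker()
try:
    pickle.dumps(w)
    failed = False
except (pickle.PicklingError, AttributeError):
    # stdlib pickle cannot serialize local (nested) functions at all;
    # dill can, but must then serialize the closure cell and everything it holds
    failed = True

print("stdlib pickle failed:", failed)
```

dill gets past this error, but only by serializing the captured cell contents, which is where the cost shows up.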
I'm not familiar enough with the package's code to be sure, but does this point to an issue in dill?
@mmckerns, could you please give your thoughts on whether the two issues are related?
@gobbedy: I haven't had time to check your code yet, but I will. It's an interesting problem. My quick assessment is that I'm unsurprised. The serialization for items in the globals, in __main__, imported from other files, and in locals... is treated very differently. Some variants (based on where the object is referenced) can see the pickle dramatically increase in size and scope of what it has to serialize (due to references).
I suggest trying your code above, but with dill.settings['recurse'] = True (or you can directly set recurse=True on dump/dumps). Let me know if that makes a difference. The biggest thing to watch for is things that reference globals, where it ends up forcing a pickle of all of the items in globals. recurse tries to avoid some of that.
Hi @mmckerns thanks for your quick reply.
I tried adding dill.settings['recurse'] = True and unfortunately that didn't work. To make sure we're on the same page, the imports for the code in both of my comments above are now the following:

#!/usr/bin/env python3.6
from time import time
import dill                      # added this
dill.settings['recurse'] = True  # added this
from pathos.multiprocessing import Pool as ThreadPool
import numpy as np
This didn't change the behaviour of either testcase. That is, the code in https://github.com/uqfoundation/pathos/issues/145#issue-340060352 still runs slow when global my_array is commented out, and the code in https://github.com/uqfoundation/pathos/issues/145#issuecomment-404501230 still errors out in the same way.
Note that my code itself doesn't do any pickling. The pickling happens behind the scenes in the pathos.multiprocessing library.
@mmckerns are you saying that when my_array is global, only a reference is pickled and passed to the workers, whereas when my_array is local the entire array is pickled?
Because I could see that making sense. Pickling the entire array would be very expensive.
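This reference-versus-payload distinction can be seen without dill or pathos: stdlib pickle stores a top-level function by reference (just its qualified name), so a global it uses is never embedded, while pickling the data itself costs megabytes. A rough stdlib-only sketch, run as a script (the list is a stand-in for a large numpy array; dill differs for closures, where it must embed the captured data):

```python
import pickle

big = list(range(1_000_000))  # stand-in for a large numpy array

def worker(i):
    return big[i]  # refers to the module-level (global) 'big'

fn_size = len(pickle.dumps(worker))   # just a reference, e.g. "__main__.worker"
data_size = len(pickle.dumps(big))    # the whole payload, several MB

print(fn_size, data_size)
```

The function pickle is a few dozen bytes; the data pickle is the full array, which is the cost you hit in the local-variable case.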
Sorry if I'm way off the mark. I'm fairly new to pickling/serialization, especially in the context of multiprocessing.
I'm saying something like that, and there are a lot of variants on what dill will do, depending on the situation. But, yes, the worst case is it pickles the entire array and everything it refers to. If you want to see what's pickled in each case, turn on trace.
I believe it's: dill.detect.trace(True)
Thanks @mmckerns that makes sense.
I ran with dill.detect.trace(True). Its output is a little too cryptic for me to fully understand. It's also 250+ lines, so I won't paste it here.
But I do see numpy-related differences.
For example in the global version I see this (ie no numpy stuff in this particular section):
D2: <dict object at 0x2b2dda364948>
# D2
# F1
B3: <built-in function scalar>
Whereas at the same output location in the local version I see this (a reference to numpy object followed by further differences from the global version):
Ce: <cell at 0x2b71501a42b8: numpy.ndarray object at 0x2b7145869e90>
F2: <function _create_cell at 0x2b714e2f6f28>
# F2
B3: <built-in function _reconstruct>
The first of the snippets says:
starting to pickle a type 'D2' dict
finished pickling type 'D2' dict
finished pickling type 'F1' function
starting to pickle a type 'B3' function 'scalar'
and the second:
starting to pickle a cell 'Ce' pointing at a 'numpy.ndarray'
starting to pickle a type 'F2' function '_create_cell'
finished pickling type 'F2' function
starting to pickle a type 'B3' function '_reconstruct'
So, yes, the latter is pickling the numpy array.
Hi @mmckerns thanks for your reply.
Any chance you could dig into your hat to find a magic incantation to speed up the code?
As it stands, this issue means that pathos.multiprocessing supports neither nested worker functions nor use in object-oriented code.
To clarify:
1) A nested worker function cannot use its parent's variables (without incurring a dramatic slowdown), rendering it nested in name only.
2) A worker function also cannot use object variables (again without dramatic slowdown) -- something I didn't show here but can demonstrate with another small testcase. This renders it incompatible with object orientation.
Well, I disagree. I think if you have large numpy arrays, then maybe, yes, there's a slowdown in the cases you mention... but if you aren't passing big arrays, then either of those cases works just fine. There are a lot of cases that demonstrate that it is supported, and works well. I'll give you that if you are using a large numpy array as a local variable, things can get slow if you end up having to serialize and pass the numpy array around. I'm more than happy to take suggestions on what to do about that.
@mmckerns ah ok, fair enough. I'm coming from scientific computing, where passing large arrays to the workers will often be the modus operandi. So my statement was a bit too broad.
And yes, I'd be grateful for suggestions for passing large arrays from parent to worker function. My real code passes large torch tensors rather than numpy arrays, so hopefully I can adapt your suggestions to pytorch.
I totally understand the drivers in scientific computing -- it's my home as well. Anyway, I meant if you have any suggestions on how to not serialize and pass large arrays around... I'd be happy to see what I can do to edit dill and/or pathos to better support that. Right now, it's one of those things that I'd like to have better supported, but haven't had time to research a better solution for.
Also, please feel free to close this issue when you feel I've sufficiently answered your query.
@mmckerns oh I misread your sentence about suggestions, my bad.
Unfortunately I don't have that kind of expertise. My suggestions would be along the lines of "make it work as if the arrays were global", which is not what you're looking for.
As for closing, I'd like others to be warned about this so they can learn from my issue. To make it easier to find, I'll rename the issue to "pathos.multiprocessing does not support nested and hierarchical maps when distributing large arrays."
Sound good to you?
To be clear I'm not trying to throw shade, I still think this is an awesome library and a major improvement over the core multiprocessing library. I'd just like to help other developers in the future who try what I did.
Change the title to whatever you like. Note that large arrays are supported, they are just slow... as you have experienced. Thanks for opening the issue, and inquiring. I'll definitely let you know if a solution to make them faster happens.
Thanks @mmckerns, changed it.
Large arrays are slower than executing the code sequentially, thus defeating the purpose of multiprocessing. I continue to hold that this makes them unsupported, but admittedly that's semantics.
Hello,
This code works just as fast as expected (~0.2 seconds).
However, comment out global my_array and it slows down by 10x (takes ~2 seconds). My actual code is part of a large class, and using a global would cause all sorts of problems.
Is there any way to avoid having to use global?
Also, why does my_array have to be global anyway? Isn't my_array global from the point of view of my_worker_function()?
I've been at this for two days and I'd be very grateful for some help.