Closed www3cam closed 7 years ago
Can you put some simple sample code here that reproduces the error you are seeing? That would help diagnose what is going on.
I'm assuming you have a C compiler... the other usual suspect is a path issue.
The code is like this. The other alternative is that I could just write map for year2_split_xml....splitxml() and pass multiple vectors into map. I think defining another function is making it unpicklable:
def masterrun(file1121):
    return year2_split_xml_embedded_pftaps_allfiles.splitxml(datename, path1m, path2am, file1121, vectorword, tfmodallfile, savedict)

p = Pool(processes=12)
listofstuff = p.map(masterrun,listoffiles)
p.close()
p.join()
I still have trouble after rewriting the code to avoid masterrun(). The entirety of the multithreading is also inside of another function. I didn't have this problem with an earlier edition of the code that didn't have it inside a function, nor did p.map() return anything there. What is inside year2_split_xml_embedded_pftaps_allfiles.splitxml() is pretty much the same as the earlier iteration of the code which worked (although I used base multithreading, not pathos), but it does contain calls to another module, parsexml, multiple times. I am happy to copy what's in splitxml(), although it may be better in private as it's an academic endeavor, and it's about 150 lines long and not well commented or easy to follow.
Also I should say I'm working on a Linux cluster and while I'm not an admin, I have no reason to assume I don't have a C compiler (i.e., that it was deleted or something).
Also, import dill seems to work.
One thing that occurred to me is I might have built (as in used the build command) in the wrong order, as I didn't use setup.py because this cluster is not connected to the internet. I don't know if this would cause a problem.
I can't run your code, and that's not a minimal working example... so I'm hindered in helping you unless I can run the code. The typical thing to do is to post a minimal sample of code that reproduces the error.
What I can say from your snippet above is that masterrun is going to make serialization harder. First, the variables in the function are not encapsulated, so it relies on a lot of global references. That's not a good idea -- it works best if you either pass or import all variables you need. Second, the function is defined above the pool. If you are going to do weird stuff like use global references, then it's often better to import the function.
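A minimal sketch of the "pass everything you need" advice, using the standard library's multiprocessing and functools.partial rather than pathos. The names split_one and run_all, and all of their arguments, are hypothetical stand-ins for the code in this thread, not its actual implementation:

```python
from functools import partial
from multiprocessing import Pool


def split_one(datename, out_dir, filename):
    """Hypothetical stand-in for splitxml(): it uses only its arguments."""
    return "{}/{}-{}".format(out_dir, datename, filename)


def run_all(filenames, datename, out_dir, processes=2):
    """Map split_one over filenames with the shared arguments bound up front."""
    # partial() carries the shared values with the function, so no globals
    # need to exist in the worker processes.
    worker = partial(split_one, datename, out_dir)
    pool = Pool(processes=processes)
    try:
        return pool.map(worker, filenames)
    finally:
        pool.close()
        pool.join()
```

Calling run_all(...) from under an `if __name__ == "__main__":` guard keeps worker processes from re-running the pool setup on platforms that start workers by re-importing the main module.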
Also, what do you mean that import dill seems to work? Work how? Does importing dill resolve your problem?
And, yes, as I was noting in my earlier response... what you are seeing in the error is often caused by not having a C-compiler. When you build, if the install of multiprocess fails in any way, it defaults back to multiprocessing... which doesn't use dill. So, yes, I'd try rebuilding the code.
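A quick way to see the kind of object the standard pickler rejects is to try it directly. This is only a sketch: the helper can_pickle is hypothetical, and it uses Python 3's pickle, where a lambda is a convenient failing case (the thread's Python 2.7 cPickle also rejected instance methods).

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(can_pickle(len))           # a builtin, pickled by reference to its name
print(can_pickle(lambda x: x))   # a lambda has no importable name
```

If a pool raises a pickling error for an object that dill is known to handle, that suggests the pool is going through the standard pickler rather than dill.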
Hi, please check this example:
import random
import dill
dill.detect.trace(True)
# from pathos.pools import _ProcessPool as Pool
from pathos.pools import ProcessPool as Pool

def process(item):
    return item.power()

class A(object):
    def __init__(self, num):
        super(A, self).__init__()
        self.num = num

    def power(self):
        self.num = self.num ** 2
        return self.num

pool = Pool(3)
items = [A(random.uniform(1,10)) for x in range(1,3)]
results = pool.map(process, items)
print(results)
It returns cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Pathos: 0.2.1
@Shpionus: That helps. If you don't mind, I've edited your code slightly to include the line dill.detect.trace(True), and have it iterate less: 1,3 instead of 1,10. Then I can directly discuss results.
So, this is interesting... it succeeds for python 2.6, and also for 3.x. When it fails, it doesn't even hit the dill trace. So, something must have changed in python 2.7 in a recent release that is messing things up for multiprocess. I'll look into it.
The traceback is:
Traceback (most recent call last):
  File "test_pathos_cpickle.py", line 24, in <module>
    results = pool.map(process, items)
  File "/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2.2.dev0-py2.7.egg/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Ok... hmm... if you use from multiprocess import Pool directly for python 2.7, it also works. (see edits to the @Shpionus code above) -- this is the same as using from pathos.pools import _ProcessPool as Pool.
Looking at the travis history, everything worked less than a month ago, with zero changes... so that further supports that it is a recent change to python that's impacting it.
@Shpionus, @www3cam: what minor version of python are you using? I'm seeing the error in 2.7.14 and even now when I rebuild in 2.7.13. The error is present in some of the existing pathos tests, and was not present upon the most recent commit.
I'm using python 2.7.11. The error still occurs with this version.
I can also confirm that regular multiprocess works as that's what I'm using now.
2.7.11 and 2.7.13 definitely did not produce this error previously. Hmm, the culprit may be a change in multiprocess. For example, the following works in python 2.6 and 3.6, but currently does not in 2.7.
>>> import pathos.pools as pp
>>> pool = pp.ProcessPool()
>>> class Foo(object):
...     def bar(self, x):
...         return x*x
...
>>> pool.map(Foo().bar, range(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2.2.dev0-py2.7.egg/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
This also fails similarly in multiprocess...
>>> import multiprocess as mp
>>> _pool = mp.Pool()
>>> _pool.map(Foo().bar, range(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
Both of the above should not be using cPickle unless _multiprocess fails to build. And, again, the above examples work for 2.6 and 3.6, and used to work in 2.7. So, I'm going to look at the impact of changes to multiprocess.
Well, here's the root problem:
>>> import _multiprocess
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named _multiprocess
>>>
The _multiprocess shared object builds, but for some reason doesn't install correctly.
Ok... so I just rebuilt multiprocess with python setup.py build and then ran python setup.py install --prefix=$HOME (where $HOME is on my python path)... and the error I was seeing goes away, and all of the above code works. There must be some incompatibility with different versions of the C _multiprocess library object, and an update doesn't rebuild it... or something like that. I'll try to diagnose what made the error appear before, when it no longer does now.
What I did to get it to work is essentially to go to the source and build and install separately. Possibly it's bypassing some pip caching or something like that... but rebuild your multiprocess library and ensure that you can import _multiprocess. If that works, everything else should work.
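The import check can be scripted without actually importing anything. This is a sketch in Python 3 with a hypothetical helper is_importable. The module names come from this thread, and _multiprocess reflects the 2.7-era build discussed here, so other releases may be laid out differently:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("dill", "multiprocess", "_multiprocess"):
    print(name, "found:", is_importable(name))
```

Finding the module on the path does not guarantee that the import succeeds (a shared object built against a different interpreter can still fail to load), so a real `import _multiprocess` in the target interpreter remains the definitive test.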
Apparently there seems to be a naming convention change for eggs... so if you have an old build of multiprocess in the site-packages directory, you can get a conflict with an older installed version. Cleaning out the eggs and rebuilding seems to solve the problem.
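To spot leftover builds before cleaning up, one option is to list everything on the import path whose name is the package itself or starts with the package name followed by a dash, which covers egg and egg-info entries. This is a sketch; find_installs is a hypothetical helper, not part of pathos or multiprocess:

```python
import os
import sys


def find_installs(package, search_paths=None):
    """List entries named `package` or `package-<version>...` in each directory."""
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for directory in search_paths:
        if not os.path.isdir(directory):
            continue  # skips zip files, '' and missing paths
        for entry in sorted(os.listdir(directory)):
            if entry == package or entry.startswith(package + "-"):
                hits.append(os.path.join(directory, entry))
    return hits


for path in find_installs("multiprocess"):
    print(path)
```

More than one versioned entry for the same package in a single directory is a hint that an old build is still lying around.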
I'm relabeling this... maybe it's not a bug after all.
@www3cam, @Shpionus: Let me know if you experience the same as I do after a clean rebuild. If so, we can close this issue.
@mmckerns Yes, it works.
I removed both multiprocess and multiprocess-0.70.5-py2.7.egg-info from the site-packages directory. Then I ran pip install multiprocess --no-cache-dir. After that all works like a charm.
Thank you.
Ok, so no comments in about a month... and I'm going to assume that this was fixed for everyone. Closing this now. Reopen if it's not fixed for you.
No luck. It keeps giving the following error:
  File "/home/artifact_migration.py", line 227, in start_process
    d1 = r1.get()
  File "/usr/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
My method:
pool = Pool(processes=2 if cpu_count() < 2 else cpu_count())
BaseManager.register('AtomicOperation', AtomicOperation)
manager = BaseManager()
manager.start()
am = manager.AtomicOperation()
FOR LOOP:
    r1 = pool.apply_async(self.aql_query, (self.base_url_hq1, c, self.api_hq1,))
    r2 = pool.apply_async(self.aql_query, (self.base_url_aws, c, self.api_aws_np,))
    d1 = r1.get()
    d2 = r2.get()
Method aql_query() just makes an HTTP POST request and returns the content.
@striveLogic: this might not be the same issue. Please open a new issue, and post some sample code that reproduces your issue.
Hi, I'm running into an error where it says multiprocessing cannot cPickle a function. Here is the traceback:
Traceback (most recent call last):
  File "/home/rnczf01/Desktop/Files/Patent_sim/newpython/newpython2/patent_master_year2_pftaps_allfiles.py", line 165, in <module>
    masterallfilespf(vecw2, filepathloc, path1m11, tfmod123, lsimodel123, ldamodel123, dictinput, filterfiles2, datename)
  File "/home/rnczf01/Desktop/Files/Patent_sim/newpython/newpython2/patent_master_year2_pftaps_allfiles.py", line 93, in masterallfilespf
    listofstuff = p.map(masterrun,listoffiles)
  File "/home/rnczf01/Desktop/Files/pathos-master/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/rnczf01/Desktop/Files/multiprocess-master/py2.7/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/rnczf01/Desktop/Files/multiprocess-master/py2.7/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
For some reason multiprocess is still using cPickle even though I installed dill:
echo $PYTHONPATH
:ct/bin:/share/apps/tesseract/bin:/share/apps/Python2.7/anaconda2/bin:/share/apps/STATA14:/opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/rocks/bin:/opt/rocks/sbin:/home/rnczf01/Desktop/Files/pytesseract-master/pytesseract-master/src:/home/rnczf01/Desktop/Files/gensim-develop:/home/rnczf01/Desktop/Files/smart_open-1.3.5:/home/rnczf01/Desktop/Files/pathos-master:/home/rnczf01/Desktop/Files/pp-1.6.5:/home/rnczf01/Desktop/Files/dill-master:/home/rnczf01/Desktop/Files/multiprocess-master:/home/rnczf01/Desktop/Files/multiprocess-master/py2.7:/home/rnczf01/Desktop/Files/pox-master
Any help would be appreciated. Thanks