uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Still using pickle even though I have dill #121

Closed www3cam closed 7 years ago

www3cam commented 7 years ago

Hi, I'm running into an error where multiprocessing says it cannot cPickle a function. Here is the traceback:

Traceback (most recent call last):
  File "/home/rnczf01/Desktop/Files/Patent_sim/newpython/newpython2/patent_master_year2_pftaps_allfiles.py", line 165, in <module>
    masterallfilespf(vecw2, filepathloc, path1m11, tfmod123, lsimodel123, ldamodel123, dictinput, filterfiles2, datename)
  File "/home/rnczf01/Desktop/Files/Patent_sim/newpython/newpython2/patent_master_year2_pftaps_allfiles.py", line 93, in masterallfilespf
    listofstuff = p.map(masterrun,listoffiles)
  File "/home/rnczf01/Desktop/Files/pathos-master/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/rnczf01/Desktop/Files/multiprocess-master/py2.7/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/rnczf01/Desktop/Files/multiprocess-master/py2.7/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

For some reason multiprocess is still using cPickle even though I installed dill:

echo $PYTHONPATH
:ct/bin:/share/apps/tesseract/bin:/share/apps/Python2.7/anaconda2/bin:/share/apps/STATA14:/opt/openmpi/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ganglia/bin:/opt/ganglia/sbin:/usr/java/latest/bin:/opt/rocks/bin:/opt/rocks/sbin:/home/rnczf01/Desktop/Files/pytesseract-master/pytesseract-master/src:/home/rnczf01/Desktop/Files/gensim-develop:/home/rnczf01/Desktop/Files/smart_open-1.3.5:/home/rnczf01/Desktop/Files/pathos-master:/home/rnczf01/Desktop/Files/pp-1.6.5:/home/rnczf01/Desktop/Files/dill-master:/home/rnczf01/Desktop/Files/multiprocess-master:/home/rnczf01/Desktop/Files/multiprocess-master/py2.7:/home/rnczf01/Desktop/Files/pox-master

Any help would be appreciated. Thanks

mmckerns commented 7 years ago

Can you put some simple sample code here that reproduces the error you are seeing? That would help diagnose what is going on.

I'm assuming you have a C compiler... the other usual suspect is a path issue.

www3cam commented 7 years ago

The code is like this. The alternative is that I could just call map on year2_split_xml....splitxml() directly and pass multiple vectors into map. I think defining another function is making it unpicklable:

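def masterrun(file1121):
    # every argument except file1121 comes from outside the function
    # (an enclosing or global scope), not from masterrun's own parameters
    return year2_split_xml_embedded_pftaps_allfiles.splitxml(datename, path1m, path2am, file1121, vectorword, tfmodallfile, savedict)

p = Pool(processes=12)

listofstuff = p.map(masterrun, listoffiles)

p.close()
p.join()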
www3cam commented 7 years ago

I still have trouble after rewriting the code to avoid masterrun(). The entire multithreading section is also inside another function. I didn't have this problem with an earlier version of the code that neither ran inside a function nor had p.map() return anything. What is inside year2_split_xml_embedded_pftaps_allfiles.splitxml() is pretty much the same as in the earlier iteration of the code, which worked (although there I used base multithreading, not pathos), but it does call another module, parsexml, multiple times. I am happy to share what's in splitxml(), although it may be better to do that privately, as it's an academic project, it's about 150 lines long, and it's not well commented or easy to follow.

Also, I should say I'm working on a Linux cluster, and while I'm not an admin, I have no reason to assume I don't have a C compiler (i.e. that it was deleted or something).

Also, import dill seems to work.

One thing that occurred to me is that I might have built the packages (as in, run the build command) in the wrong order, since I didn't use setup.py because this cluster is not connected to the internet. I don't know if this would cause a problem.

mmckerns commented 7 years ago

I can't run your code, and that's not a minimal working example... so I'm hindered in helping you unless I can run the code. The typical thing to do is to post a minimal sample of code that reproduces the error.

What I can say from your snippet above is that masterrun is going to make serialization harder. First, the variables in the function are not encapsulated, so it has a lot of global references. That's not a good idea; it works best if you either pass or import all the variables you need. Second, the function is defined above the pool. If you are going to do unusual things like use global references, then it's often better to import the function.
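To make the "pass or import all the variables you need" advice concrete, here is a minimal sketch using only the standard library pickle and functools.partial. The names split_one, make_closure, make_partial and is_picklable are hypothetical stand-ins for splitxml() and its inputs, not code from this thread. It shows that a function nested inside another function cannot be pickled by the standard pickler, while a top-level function with its shared arguments bound explicitly through partial round-trips fine.

```python
import pickle
from functools import partial


def split_one(path, threshold, filename):
    # Hypothetical worker: everything it needs arrives as an argument.
    return "%s/%s:%d" % (path, filename, threshold)


def make_closure(path, threshold):
    # Nested function that captures path/threshold from the enclosing scope.
    def inner(filename):
        return "%s/%s:%d" % (path, filename, threshold)
    return inner


def make_partial(path, threshold):
    # Bind the shared arguments explicitly to a top-level function instead.
    return partial(split_one, path, threshold)


def is_picklable(obj):
    # The exact exception type for unpicklable objects varies by Python version.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(is_picklable(make_closure("/data", 3)))  # False
worker = pickle.loads(pickle.dumps(make_partial("/data", 3)))
print(worker("a.xml"))  # /data/a.xml:3
```

dill can usually serialize the nested version too, but explicit arguments remove the dependency on how much of the enclosing scope the serializer manages to capture.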

Also, what do you mean that import dill seems to work? Work how? Does importing dill resolve your problem?

And, yes, as I was noting in my earlier response... what you are seeing in the error is often caused by not having a C compiler. When you build, if the install of multiprocess fails in any way, it falls back to multiprocessing... which doesn't use dill. So, yes, I'd try rebuilding the code.
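A fallback like the one described here is easy to miss because nothing fails at import time. The sketch below is a generic illustration of a try/except ImportError fallback, not multiprocess's actual import code; load_with_fallback is a hypothetical helper and _no_such_extension_module is a deliberately missing placeholder name.

```python
import importlib


def load_with_fallback(preferred, fallback):
    """Import `preferred`; if that fails, quietly import `fallback` instead.

    Returns (module, name_actually_used) so the caller can see which one won.
    """
    try:
        return importlib.import_module(preferred), preferred
    except ImportError:
        # No warning is emitted here, which is why a broken build can go
        # unnoticed until something (like pickling) fails much later.
        return importlib.import_module(fallback), fallback


module, used = load_with_fallback("_no_such_extension_module", "pickle")
print(used)  # pickle
```

Returning the name that was actually used makes the silent switch visible to the caller.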

Shpionus commented 7 years ago

Hi, please check this example:

import random
import dill
dill.detect.trace(True)
# from pathos.pools import _ProcessPool as Pool
from pathos.pools import ProcessPool as Pool

def process(item):
    return item.power()

class A(object):
    def __init__(self, num):
        super(A, self).__init__()
        self.num = num

    def power(self):
        self.num = self.num ** 2
        return self.num

pool = Pool(3)
items = [A(random.uniform(1,10)) for x in range(1,3)]

results = pool.map(process, items)
print(results)

It raises cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Pathos: 0.2.1

mmckerns commented 7 years ago

@Shpionus: That helps. If you don't mind, I've edited your code slightly to include the line dill.detect.trace(True) and to have it iterate over range(1,3) instead of range(1,10). That way I can discuss the results directly.

mmckerns commented 7 years ago

So, this is interesting... it succeeds for Python 2.6, and also for 3.x. When it fails, it doesn't even hit the dill trace. So something must have changed in a recent Python 2.7 release that is breaking multiprocess. I'll look into it.

The traceback is:

Traceback (most recent call last):
  File "test_pathos_cpickle.py", line 24, in <module>
    results = pool.map(process, items)
  File "/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2.2.dev0-py2.7.egg/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
mmckerns commented 7 years ago

Ok... hmm... if you use from multiprocess import Pool directly with Python 2.7, it also works (see my edits to the @Shpionus code above). This is the same as using from pathos.pools import _ProcessPool as Pool.

Looking at the Travis history, everything worked less than a month ago, with zero changes... which further supports that a recent change to Python is what's impacting it.

mmckerns commented 7 years ago

@Shpionus, @www3cam: what minor version of Python are you using? I'm seeing the error in 2.7.14, and now even when I rebuild in 2.7.13. The error shows up in some of the existing pathos tests, and was not present at the most recent commit.

www3cam commented 7 years ago

I'm using Python 2.7.11. The error still occurs with this version.

www3cam commented 7 years ago

I can also confirm that regular multiprocess works as that's what I'm using now.

mmckerns commented 7 years ago

2.7.11 and 2.7.13 definitely did not produce this error previously. Hmm, the conflict may be caused by a change in multiprocess. For example, the following works in Python 2.6 and 3.6, but currently does not in 2.7.

>>> import pathos.pools as pp
>>> pool = pp.ProcessPool()
>>> class Foo(object):
...   def bar(self, x):
...     return x*x
... 
>>> pool.map(Foo().bar, range(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2.2.dev0-py2.7.egg/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

This also fails similarly in multiprocess...

>>> import multiprocess as mp
>>> _pool = mp.Pool()
>>> _pool.map(Foo().bar, range(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Neither of the above should be using cPickle unless _multiprocess fails to build. And, again, the above examples work for 2.6 and 3.6, and used to work in 2.7. So, I'm going to look at the impact of changes to multiprocess.
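For comparison, on Python 3.4 and later the standard library pickler can itself serialize a bound method of a module-level class (the method reduces to the instance plus the method name), which is one reason the Foo().bar example is less fragile there. This is a minimal sketch with plain pickle and no dill; on Python 2.7 the plain pickle and cPickle modules refuse bound methods, which is where the instancemethod error above comes from when the dill-based pickler is not in use.

```python
import pickle


class Foo(object):
    def bar(self, x):
        return x * x


# Python 3.4+: a bound method reduces to getattr(instance, "bar"),
# so it round-trips as long as Foo itself is importable.
payload = pickle.dumps(Foo().bar)
restored = pickle.loads(payload)
print([restored(x) for x in range(4)])  # [0, 1, 4, 9]
```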

mmckerns commented 7 years ago

Well, here's the root problem:

>>> import _multiprocess
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named _multiprocess
>>> 

The _multiprocess shared object builds, but for some reason doesn't install correctly.

mmckerns commented 7 years ago

Ok... so I just rebuilt multiprocess with python setup.py build and then ran python setup.py install --prefix=$HOME (where $HOME is on my Python path)... and the error I was seeing went away, and all of the above code works. There must be some incompatibility between different versions of the C _multiprocess library object, and an update doesn't rebuild it... or something like that. I'll try to work out which cases let me see the error before but not now.

What I did to get it to work is essentially to go to the source and build and install separately. Possibly it's bypassing some pip caching or something like that... but rebuild your multiprocess library and ensure that you can import _multiprocess. If that works, everything else should work.
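A small sketch of the check suggested here: try to import each relevant module and report where it was loaded from. It uses only importlib.import_module, which exists on both Python 2.7 and 3.x; check_import is a hypothetical helper, not part of multiprocess or pathos, and the module names are the ones discussed in this thread.

```python
from __future__ import print_function

import importlib


def check_import(name):
    """Return (ok, detail): the module's file if it imports, else the error text."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    # Built-in modules have no __file__, so fall back to a label.
    return True, getattr(module, "__file__", "built-in")


for name in ("dill", "multiprocess", "_multiprocess"):
    ok, detail = check_import(name)
    print("%-14s %s  %s" % (name, "OK  " if ok else "FAIL", detail))
```

If _multiprocess reports FAIL while multiprocess reports OK, that matches the situation above where the package installed but its compiled _multiprocess piece did not.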

mmckerns commented 7 years ago

Apparently there has been a naming convention change for eggs... so if you have an old build of multiprocess in the site-packages directory, you can get a conflict with that older installed version. Cleaning out the eggs and rebuilding seems to solve the problem.

mmckerns commented 7 years ago

I'm relabeling this... maybe it's not a bug after all.

mmckerns commented 7 years ago

@www3cam, @Shpionus: Let me know if you experience the same as I do after a clean rebuild. If so, we can close this issue.

Shpionus commented 7 years ago

@mmckerns Yes, it works. I removed both multiprocess and multiprocess-0.70.5-py2.7.egg-info from site-packages directory. Then pip install multiprocess --no-cache-dir. After that all works like a charm. Thank you.
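Before deleting anything by hand, it can help to list every copy of the package that Python might see. This is a minimal sketch, assuming the stale copies sit directly inside a directory on sys.path (as site-packages does) and are named either exactly multiprocess or multiprocess followed by "-" or "." (eggs, .egg-info, .dist-info); find_installs is a hypothetical helper.

```python
import os
import sys


def find_installs(package, search_dirs=None):
    """Return paths named `package`, `package-*`, or `package.*` in the given dirs.

    The separator check avoids matching e.g. `multiprocessing` when
    searching for `multiprocess`.
    """
    dirs = sys.path if search_dirs is None else search_dirs
    hits = []
    for d in dirs:
        if not d or not os.path.isdir(d):
            continue  # skip '' entries, zip files, and missing directories
        try:
            names = sorted(os.listdir(d))
        except OSError:
            continue  # unreadable directory
        for name in names:
            if name == package or name.startswith((package + "-", package + ".")):
                path = os.path.join(d, name)
                if path not in hits:
                    hits.append(path)
    return hits


for path in find_installs("multiprocess"):
    print(path)
```

Seeing more than one versioned egg or egg-info entry is a hint that an older build may still be shadowing the new one.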

mmckerns commented 7 years ago

Ok, so no comments in about a month... and I'm going to assume that this was fixed for everyone. Closing this now. Reopen if it's not fixed for you.

striveLogic commented 6 years ago

@mmckerns I followed the steps below.

I removed both multiprocess and multiprocess-0.70.5-py2.7.egg-info from site-packages directory. Then pip install multiprocess --no-cache-dir. After that all works like a charm.

No luck. It keeps giving the following error:

  File "/home/artifact_migration.py", line 227, in start_process
    d1 = r1.get()
  File "/usr/lib/python2.7/site-packages/multiprocess-0.70.6.dev0-py2.7.egg/multiprocess/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

My method:

pool = Pool(processes=2 if cpu_count() < 2 else cpu_count())
BaseManager.register('AtomicOperation', AtomicOperation)
manager = BaseManager()
manager.start()
am = manager.AtomicOperation()
FOR LOOP:
    r1 = pool.apply_async(self.aql_query, (self.base_url_hq1, c, self.api_hq1,))
    r2 = pool.apply_async(self.aql_query, (self.base_url_aws, c, self.api_aws_np,))
    d1 = r1.get()
    d2 = r2.get()

The method aql_query() just makes an HTTP POST request and returns the content.

mmckerns commented 6 years ago

@striveLogic: this might not be the same issue. Please open a new issue, and post some sample code that reproduces it.