uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Bug: 'name … is not defined' error with indirect/embedded calls #129

Closed dsanalytics closed 4 years ago

dsanalytics commented 6 years ago
def testf(x):
    return testf2(x)
def testf2(x):
    return(x)
import dill
import pathos
from pathos.multiprocessing import Pool
pool = Pool()
# out2 = pool.map(testf2, range(10)) # this works - prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
out2 = pool.map(testf, range(10)) # this gives 'NameError: name 'testf2' is not defined'
pool.close()
print(out2)

I'm getting name 'testf2' is not defined - screenshots below. I've searched all over and tried many things to no avail. As you can see, this bug makes pathos unusable for any non-trivial processing. Direct call works (commented line) but, as we know, that's not a real-life scenario. Error shows up in both Atom and VSCode and on two completely different machines. Thank you in advance for your help.

Environment:
Windows 7 Home Premium 64-bit; Windows 10 Home Premium 64-bit
Python 3.6 64-bit
Atom 1.23.2 and VSCode 1.19.2
Anaconda 3 (machine 1), no Anaconda (machine 2)
pathos 0.2.1

[Screenshot: pathos-name-not-defined-error-code-listing]

[Screenshot: pathos-name-not-defined-error]

dsanalytics commented 6 years ago

BTW: calling freeze_support gives the same error - v2 of the code below

def testf(x):
    return testf2(x)
def testf2(x):
    return(x)
def process_all():
    import dill
    import pathos
    from pathos.multiprocessing import Pool
    pool = Pool()
    # out2 = pool.map(testf2, range(10)) # this works - prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    out2 = pool.map(testf, range(10)) # this gives 'NameError: name 'testf2' is not defined'
    pool.close()
    print(out2)
if __name__ == '__main__':
    import pathos
    pathos.helpers.freeze_support()
    process_all()
amarchin commented 5 years ago

Any update? I have the same issue.

stevennic commented 5 years ago

I am experiencing the same issue when the function I supply calls out to a different Python module (search_solr.py).

  File "D:\Programming\Python\3.7.2\lib\site-packages\pathos\multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "D:\Programming\Python\3.7.2\lib\site-packages\multiprocess\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "D:\Programming\Python\3.7.2\lib\site-packages\multiprocess\pool.py", line 657, in get
    raise self._value
NameError: ("name 'search_solr' is not defined", 'occurred at index 0')
jeckjeck commented 5 years ago

It works if you put the functions as methods:

import pathos

class parallel():

    def foo(self, x):
        return x

    def bar(self, x):
        return self.foo(x)

if __name__ == '__main__':
    p = parallel()
    pool = pathos.multiprocessing.Pool(processes=2)
    print(pool.map(p.bar, [0, 1]))

C:\Users\jeckz\PycharmProjects\pin\venv\Scripts\python.exe C:/Users/jeckz/PycharmProjects/pin/venv/pathostest.py
[0, 1]

Process finished with exit code 0

However, if you import a package and use it within one of the methods, for instance:

import pathos
import math

class parallel():
    def foo(self, x):
        return math.asin(x)

    def bar(self, x):
        return self.foo(x)

if __name__ == '__main__':
    p = parallel()
    pool = pathos.multiprocessing.Pool(processes=2)
    print(pool.map(p.bar, [0, 1]))

I receive:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/jeckz/PycharmProjects/pin/venv/pathostest.py", line 14, in <module>
    print(pool.map(p.bar, [0, 1]))
  File "C:\Users\jeckz\PycharmProjects\pin\venv\lib\site-packages\multiprocess\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\jeckz\PycharmProjects\pin\venv\lib\site-packages\multiprocess\pool.py", line 657, in get
    raise self._value
NameError: name 'math' is not defined

The workaround I found is to import math and store it as an instance attribute. But I would be happy if someone found a better way.

import math

class parallel():
    def __init__(self):
        self.math = math

    def foo(self, x):
        return self.math.asin(x)
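
One alternative that is sometimes suggested for this kind of NameError is to do the import inside the method body, so that the name is bound when the method actually runs rather than looked up in the module globals. A minimal sketch, assuming the same class layout as above; the class name Parallel is only illustrative, the method is called directly here instead of through a pool, and this has not been verified on the Windows setups described in this thread:

```python
class Parallel:
    def foo(self, x):
        # Import locally so the name is bound at call time, inside
        # whichever process ends up executing this method.
        import math
        return math.asin(x)

    def bar(self, x):
        return self.foo(x)


if __name__ == '__main__':
    p = Parallel()
    # Direct calls for illustration; p.bar could be handed to pool.map
    # in the same way as in the examples above.
    print([p.bar(x) for x in [0, 1]])
```

The trade-off is a repeated (but cached) import on every call, in exchange for not depending on how the module globals are serialized.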
mmckerns commented 5 years ago

I'm not sure, but this may be a Windows issue (i.e. #65). I tried all the above code on macOS, and it works. Since the errors are similar to those seen in https://github.com/uqfoundation/multiprocess/issues/65, and Windows has a different forking behavior than Mac or Linux, I expect it's related to that. I'll see about doing some follow-up testing on Windows. What about using dill.settings['recurse'] = True? This has been seen to work around related issues on macOS.
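
To make the forking difference concrete: the standard library's multiprocessing reports which start method a platform uses. Windows only offers "spawn", which launches a fresh interpreter for each worker, so functions have to be re-imported or deserialized there. Linux has traditionally defaulted to "fork", where workers inherit the parent's already-defined functions, and macOS has defaulted to "spawn" since Python 3.8. The sketch below uses the stdlib module; that multiprocess mirrors this API is an assumption based on it being a fork of multiprocessing, and the helper name describe_start_methods is made up for illustration:

```python
import multiprocessing
import sys


def describe_start_methods():
    """Return the platform, its default start method, and all available ones."""
    return {
        "platform": sys.platform,
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == '__main__':
    print(describe_start_methods())
```

Including this output in a report makes it easier to tell whether a NameError is specific to spawned workers.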

michaelnowotny commented 4 years ago

I had the same problem on MacOS and dill.settings['recurse'] = True solved it. Thank you!

danvip10 commented 4 years ago

Is there any update on this? I have the same problem on Windows and dill.settings['recurse'] = True did not fix it.

mmckerns commented 4 years ago

I believe this has been solved by https://github.com/uqfoundation/dill/pull/363. Please reopen this issue if that is not the case.

amithadiraju1694 commented 4 years ago

@mmckerns I have the same issue on Windows, and I feel the issue is not solved yet. Do you want me to reopen it, or create a new issue?

mmckerns commented 4 years ago

@amit8121: if you feel the issue isn't solved yet, then please reopen and post the details that you are seeing. If it's determined to be a new issue, then we will move to a new ticket. Please note your versions of dill, multiprocess, pathos, and Python, as well as a snippet of code that produces the issue and the traceback you see. Do note that the patch mentioned above was to dill, and that patch is not in any of the released versions yet (coming very soon).
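
For anyone gathering the requested version information, a small helper along these lines may save some typing. It is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and the function name collect_versions is made up for illustration.

```python
import platform
from importlib import metadata


def collect_versions(packages=("dill", "multiprocess", "pathos")):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == '__main__':
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

The printed lines can be pasted into the issue together with the code snippet and the traceback.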

charey6 commented 2 years ago

Just so that anyone having a similar issue on Windows knows, it can also be solved by dill.settings['recurse'] = True. Specifically, on Windows, install multiprocess and use it instead of multiprocessing, and update dill to the latest version in conda.

import dill
dill.settings['recurse'] = True

Then just as normal: from multiprocess import Pool

dsanalytics commented 2 years ago

@charey6 Thanks. Some questions: 1) Which versions of Python, dill, and multiprocess are you running? 2) What's wrong or different with multiprocessing as opposed to multiprocess? 3) Is there a GitHub link to the multiprocess repo?

Also, can someone test and confirm this? Thank you.

mmckerns commented 2 years ago

@dsanalytics: with regard to (2), multiprocessing uses pickle while multiprocess uses dill and some very slightly modified Pickler classes. This enables better object serialization, which includes the storing of dependencies (as in your case above). With regard to (3), https://github.com/uqfoundation/multiprocess, but you can also install it with pip.
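
The pickle side of that difference can be seen with the standard library alone. pickle stores a module-level function as a reference (module name plus function name), not as code, so the receiving process must be able to look that name up itself, and objects without an importable name are rejected outright. The sketch below only exercises stdlib pickle, with json.dumps as an arbitrary importable function; it does not demonstrate dill, and the helper names are made up for illustration:

```python
import json
import pickle


def pickled_reference(func):
    """Return the raw stdlib-pickle bytes for a function."""
    return pickle.dumps(func)


def can_pickle(obj):
    """Report whether stdlib pickle accepts obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == '__main__':
    payload = pickled_reference(json.dumps)
    # The payload names the module and the function; it carries no bytecode.
    print(b"json" in payload, b"dumps" in payload)
    # A lambda has no importable name, so stdlib pickle refuses it.
    print(can_pickle(lambda x: x))
```

Because only the name travels, anything the function depends on must already exist in the worker, which is the gap that serializing dependencies is meant to close.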