uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

multiprocessing.Pool cannot import from main #118

Closed justinlovinger closed 4 years ago

justinlovinger commented 7 years ago

import pathos

def foo(x):
    return x

def bar(x):
    return foo(x)

if __name__ == '__main__':
    pool = pathos.multiprocessing.Pool(processes=2)
    print pool.map(bar, [0, 1])

Running this script on Windows 10 results in:

NameError: global name 'foo' is not defined

The same occurs with pathos.multiprocessing.ProcessPool. pathos.parallel.ParallelPool does not have this issue.

justinlovinger commented 7 years ago

Further testing reveals the issue is with dill

import functools
import pickle
import dill
import multiprocessing

def foo(x):
    return x

def bar(x):
    return foo(x)

def undill_run(dill_func, arg):
    # Deserialize the dill payload in the worker, then call the function
    return dill.loads(dill_func)(arg)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)
    print pool.map(functools.partial(undill_run, dill.dumps(bar)), [0, 1])

Returns the same error:

NameError: global name 'foo' is not defined

Error does not occur when pickle is used in place of dill (even if dill is imported).
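For context on why the pickle variant behaves differently: the standard library pickle serializes a module-level function by reference (its module name and qualified name) rather than by value, so the receiving process looks the function up again by importing the module, and the function's globals come with it. A minimal sketch, using json.dumps as a stand-in function (my choice, not from the report) so that it runs anywhere. dill, by contrast, can serialize functions defined in __main__ by value, which is presumably where the lookup of foo goes wrong:

```python
import json
import pickle

# Standard pickle stores a module-level function as a reference
# (module name plus qualified name), not as bytecode.
payload = pickle.dumps(json.dumps)

# The payload names both the module and the function.
print(b"json" in payload, b"dumps" in payload)

# Unpickling re-imports the module and returns the very same object,
# so the function keeps its original module globals.
restored = pickle.loads(payload)
print(restored is json.dumps)
```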

DavidLP commented 7 years ago

This is not only the case for defined functions but also for imported modules. Due to this bug I had to switch back to standard library multiprocessing + dill.

mmckerns commented 7 years ago

@JustinLovinger @DavidLP: I'm not seeing this error on MacOS. However, I do see that at least @JustinLovinger is using Windows. On Windows, you are missing freeze_support, which is required on Windows to be able to run from __main__. It's not a bug, it's a requirement inherited from multiprocessing.

Try:

if __name__ == '__main__':
    pathos.helpers.freeze_support()
    pool = pathos.multiprocessing.Pool(processes=2)
    print(pool.map(bar, [0, 1]))

Let me know if this fixes your code (and close the ticket), or if it doesn't, please let me know what you are seeing that's different.
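For reference, the overall shape of a complete script with the main guard and freeze_support might look like the sketch below. It deliberately uses the standard library multiprocessing instead of pathos so that it runs without extra packages; with pathos, the corresponding call is the pathos.helpers.freeze_support() shown above. The main() wrapper is my own addition for illustration:

```python
import multiprocessing


def foo(x):
    return x


def bar(x):
    return foo(x)


def main():
    # Create the pool only inside a function called from the main guard,
    # so child processes that re-import this file do not spawn pools too.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(bar, [0, 1])


if __name__ == "__main__":
    # Call freeze_support() first thing inside the main guard.
    multiprocessing.freeze_support()
    print(main())
```

Note that the standard library documents multiprocessing.freeze_support() as only having an effect in frozen Windows executables, so this sketch shows the recommended structure rather than a guaranteed fix for the NameError.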

DavidLP commented 7 years ago

Adding pathos.helpers.freeze_support() to the example of @JustinLovinger still gives the same exception.

I use Windows, by the way. Using dill to pickle methods and modules, together with the standard library multiprocessing, works for me.

mmckerns commented 6 years ago

@DavidLP: I've tried it on Windows and I am not seeing any problems. So, let's figure out what in your environment is triggering this. Can you provide me your code and traceback, as well as Python and system information?
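One way to collect that information is sketched below. It assumes Python 3.8 or newer (for importlib.metadata), and the environment_report helper and its default package list are hypothetical names chosen for illustration, not part of pathos:

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("pathos", "multiprocess", "dill", "ppft")):
    """Collect interpreter, OS, and package versions for a bug report."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages instead of failing the whole report.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines alongside the traceback makes it easier to compare package versions between environments that do and do not show the error.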

Zebrafish007 commented 6 years ago

pathos.helpers.freeze_support() doesn't do the trick for me either, nor does it help with the pickle issue (#125). I updated to Python 2.7.14 via Anaconda a few days ago, with no improvement. Perhaps these issues are related? I'm not sure how backend debugging works, but I did that for matplotlib using cairo, as suggested elsewhere, and found the bug in three or four clicks. Would that approach work here?

SjurdurS commented 6 years ago

I have the same issue on Windows 10 using the same code as OP. Any suggestions on how to fix it?

#pathostest.py
import pathos

def foo(x):
    return x

def bar(x):
    return foo(x)

if __name__ == '__main__':
    pool = pathos.multiprocessing.Pool(processes=2)
    print pool.map(bar, [0, 1])

Running the above code gives me the same error

Traceback (most recent call last):
  File "pathostest.py", line 12, in <module>
    print pool.map(bar, [0, 1])
  File "C:\Python27\lib\site-packages\multiprocess\pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Python27\lib\site-packages\multiprocess\pool.py", line 572, in get
    raise self._value
NameError: global name 'foo' is not defined

mmckerns commented 5 years ago

@SjurdurS: It's a common error on Windows, and it overwhelmingly comes from one of two things:

  1. not using pathos.helpers.freeze_support() in __main__, and/or
  2. failure to correctly build multiprocess (it needs a C compiler)

Sorry this issue seems to be stale for everyone... I'm at an impasse, as I can produce the error by doing either of the two things enumerated above, but can't reproduce it if both are resolved. Not sure what to do here. If there are no other comments that can elucidate what is different about your environment, and why you are all seeing errors when using freeze_support with a correctly built multiprocess, then I'll assume that the issue is either a won't-fix or void.

jeckjeck commented 5 years ago

It works if you put the functions as methods:

import pathos

class parallel():

    def foo(self, x):
        return x

    def bar(self, x):
        return self.foo(x)

if __name__ == '__main__':
    p = parallel()
    pool = pathos.multiprocessing.Pool(processes=2)
    print(pool.map(p.bar, [0, 1]))

C:\Users\jeckz\PycharmProjects\pin\venv\Scripts\python.exe C:/Users/jeckz/PycharmProjects/pin/venv/pathostest.py
[0, 1]

Process finished with exit code 0

However, if you import a package that you want to use within one of the methods, for instance:

import pathos
import math

class parallel():
    def foo(self, x):
        return math.asin(x)

    def bar(self, x):
        return self.foo(x)

if __name__ == '__main__':
    p = parallel()
    pool = pathos.multiprocessing.Pool(processes=2)
    print(pool.map(p.bar, [0, 1]))

I receive:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/jeckz/PycharmProjects/pin/venv/pathostest.py", line 14, in <module>
    print(pool.map(p.bar, [0, 1]))
  File "C:\Users\jeckz\PycharmProjects\pin\venv\lib\site-packages\multiprocess\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\jeckz\PycharmProjects\pin\venv\lib\site-packages\multiprocess\pool.py", line 657, in get
    raise self._value
NameError: name 'math' is not defined

The solution I found is to import math and store the module as an attribute on the instance. But I would be happy if someone found a better way.

import math

class parallel():
    def __init__(self):
        self.math = math

    def foo(self, x):
        return self.math.asin(x)
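Another workaround that is often used for this kind of NameError is to import the module inside the method that needs it, so the name is resolved when the method runs in the worker instead of being looked up in the (possibly incomplete) restored globals. A minimal sketch, not verified against the affected dill versions; the class name Parallel is just the class from above, capitalized:

```python
class Parallel:
    def foo(self, x):
        # Import at call time, inside the worker process, so the method
        # does not depend on a module-level name surviving serialization.
        import math
        return math.asin(x)

    def bar(self, x):
        return self.foo(x)


if __name__ == "__main__":
    p = Parallel()
    # With a pool this would be pool.map(p.bar, [0, 1]) as in the
    # snippets above; a plain loop is used here to stay self-contained.
    print([p.bar(v) for v in [0, 1]])
```

Repeated imports are cheap after the first one, because Python caches loaded modules in sys.modules.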

molinav commented 4 years ago

I can reproduce this problem when I install only multiprocess and not the pathos framework and I try to use the Process class by myself.

My configuration is:

Test example:

# test_example_multiprocess.py

def func1():
    print("Hello world!")

def func2():
    func1()

if __name__ == "__main__":

    from multiprocess import Process
    proc = Process(target=func2)
    proc.start()
    proc.join()

and this is the traceback:

(py37) D:\vic\Desktop>python test_example_multiprocess.py
Process Process-1:
Traceback (most recent call last):
  File "C:\GNU\Anaconda3\envs\py37\lib\site-packages\multiprocess\process.py", line 297, in _bootstrap
    self.run()
  File "C:\GNU\Anaconda3\envs\py37\lib\site-packages\multiprocess\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "test_example_multiprocess.py", line 9, in func2
    func1()
NameError: name 'func1' is not defined

(py37) D:\vic\Desktop>

The same snippet works when I use the standard multiprocessing library:

# test_example_multiprocessing.py

def func1():
    print("Hello world!")

def func2():
    func1()

if __name__ == "__main__":

    from multiprocessing import Process
    proc = Process(target=func2)
    proc.start()
    proc.join()

(py37) D:\vic\Desktop>python test_example_multiprocessing.py
Hello world!

The same problem occurs with another environment using Python 3.6.10 and multiprocess 0.70.9, and it does not occur with Python 3.5.6 and multiprocess 0.70.5. So it seems something occurred between these two versions that introduced this issue.

Edit: I dug a bit deeper, and the multiprocess version is in fact not the problem; it comes from dill. For Python 3.6.10 and Python 3.7.6, multiprocess 0.70.9 works with dill 0.2.8.2 and shows the bug with dill 0.2.9 or newer. I suspect this is dill issue #323.
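To illustrate the kind of failure involved (this is not dill's actual code, only a sketch of the mechanism under the assumption that the function is recreated by value without its original globals), rebuilding func2-style code with an empty globals dict produces the same NameError:

```python
import types


def foo(x):
    return x


def bar(x):
    return foo(x)


# Rebuild bar from its code object with an empty globals dict, as a
# by-value deserializer would if it failed to restore the globals.
orphan = types.FunctionType(bar.__code__, {}, "bar")

try:
    orphan(1)
except NameError as exc:
    message = str(exc)
    print(message)

# Supplying the missing name in the globals makes the same code work.
repaired = types.FunctionType(bar.__code__, {"foo": foo}, "bar")
print(repaired(1))
```

The function body itself is intact in both cases; only the namespace in which the name foo is looked up differs.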

mmckerns commented 4 years ago

I'm going to label this a bug, even though it was a dill bug. Should be fixed due to https://github.com/uqfoundation/dill/issues/363

simonnier commented 3 years ago

@mmckerns Dear mmckerns, I don't understand why this issue is closed, because it is not solved! I installed the latest pathos on Windows 10. If you run JustinLovinger's code interactively in JupyterLab, you still get "NameError: name 'foo' is not defined". It does work when run as a script. I suggest reopening this issue and fixing this annoying bug completely. Thank you very much.

mmckerns commented 3 years ago

@simonnier: Did you run the code in Jupyter in a single cell or in multiple cells? Jupyter messes with the structure of the global namespace (each cell is in its own local namespace), and also messes with parallelism. This issue is closed because it was for the behavior in python in general. Working across multiple cells in Jupyter, for example, should only work under certain circumstances, but isn't guaranteed for this or several other features of pathos and dill. Feel free to open a new ticket, specifically requesting this feature in a notebook.

simonnier commented 3 years ago

@mmckerns I just opened a new issue, https://github.com/uqfoundation/pathos/issues/219, and hope that multiprocess will solve all the problems in JupyterLab on Windows. :)