uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Exception handling with ParallelPool #228

Open zhuchcn opened 2 years ago

zhuchcn commented 2 years ago

Thanks for making this great library! I have a question about how exception handling can be done with ParallelPool. With ProcessPool, errors can be caught as normal. With ParallelPool, however, the error cannot be caught the same way (see below). It seems the errors from all three worker processes are printed directly to stderr, but none of them are caught.

from pathos.pools import ParallelPool as Pool

def f(i):
    raise ValueError('Error raised!!')

def main():
    pool = Pool(threads=4)
    try:
        pool.map(f, [1,2,3])
    except Exception as e:
        print('Error caught')
        raise e

if __name__ == '__main__':
    main()

An error has occured during the function execution
Traceback (most recent call last):
  File "/home/chenghaozhu/miniconda3/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 An error has occured during the function execution
Traceback (most recent call last):
  File "/home/chenghaozhu/miniconda3/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 An error has occured during the function execution
Traceback (most recent call last):
  File "/home/chenghaozhu/miniconda3/lib/python3.8/site-packages/ppft/__main__.py", line 111, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 %           

We also noticed that some binary characters are printed along with the error messages. There is a "%" symbol at the end of the output above, but in real use we got something like this:

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

I'm wondering whether ParallelPool has its own way of handling errors. If it is already documented, please point me to it!

mmckerns commented 1 year ago

Sorry for the slow reply. I'm seeing the same issue as you...

Results from the interpreter:

Python 3.7.15 (default, Oct 12 2022, 04:11:53) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathos
>>> p = pathos.pools.ParallelPool()
>>> def fgk(x):
...   z = x**2
...   y = '1' + z
...   return y
... 
>>> p.map(fgk, [1,2])
An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 3, in fgk
TypeError: can only concatenate str (not "int") to str
 An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 3, in fgk
TypeError: can only concatenate str (not "int") to str
 [None, None]
>>>
>>> pathos.__version__
'0.3.0.dev0'
>>> import ppft
>>> ppft.__version__
'1.7.6.6.dev0'
>>> import dill
>>> dill.__version__
'0.3.6.dev0'
>>>
>>> try:
...   p.map(fgk, [1,2])
... except Exception:
...   print('Error caught!')
... 
An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 3, in fgk
TypeError: can only concatenate str (not "int") to str
 An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 3, in fgk
TypeError: can only concatenate str (not "int") to str
 [None, None]
>>>
>>> def fgt(x):
...   z = x**2
...   try:
...     y = '1' + z
...   except Exception:
...     y = z
...   return y
... 
>>> p.map(fgt, [1,2])
[1, 4]

Results are no different when running from a file. And when running your code:

An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 An error has occured during the function execution
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!

I also see that the errors are raised, but are not caught. Note that I'm not seeing the same issue you are with the string being printed oddly. Let me know what OS and versions of the code you are using.

Hmm. If I remember correctly, ParallelPool doesn't have the same ability to capture a traceback from the worker process and return it to the calling process. I'll take this as a feature request.
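Until something like that exists, one possible workaround follows the pattern of fgt above: catch the exception inside the function body and return it as data, then re-raise in the calling process. The following is only a sketch. The names f_safe and unwrap are hypothetical, the built-in map stands in for pool.map so the snippet runs without pathos, and it has not been tested against ParallelPool itself. The try/except and the import are kept inside the function body on the assumption (suggested by the `File "<string>"` frames in the tracebacks) that the worker rebuilds the function from its source text.

```python
def f_safe(i):
    # Import inside the body: assumes the worker only sees this function's source.
    import traceback
    try:
        if i == 2:
            raise ValueError('Error raised!!')
        return ('ok', i * 10)
    except Exception:
        # Return the formatted traceback as data instead of letting it escape.
        return ('error', traceback.format_exc())


def unwrap(results):
    """Return the plain values, or raise if any task reported an error."""
    values, errors = [], []
    for status, payload in results:
        (values if status == 'ok' else errors).append(payload)
    if errors:
        raise RuntimeError('%d task(s) failed:\n%s'
                           % (len(errors), '\n'.join(errors)))
    return values


# Built-in map used as a stand-in; with pathos this would be pool.map(f_safe, ...).
results = list(map(f_safe, [1, 2, 3]))
try:
    unwrap(results)
except RuntimeError as e:
    print('Error caught')
    print(e)
```

The trade-off is that every task has to return a (status, payload) pair, so the caller must unwrap results before using them.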

zhuchcn commented 1 year ago

I tried several things.

  1. Launch a Docker container with Python 3.7.11 and install pathos (tried both the main branch and the release from PyPI): there is no trailing "%" symbol.
  2. Run the script in a Docker container directly from the host: the trailing "%" is printed. The command I used is docker run -it --rm -v $(pwd):$(pwd) -w $(pwd) python:3.7.15 /bin/bash -c "pip install pathos==0.2.8; python test.py".
  3. Create a conda env with Python 3.7 and pathos installed via git or pip: the trailing "%" is also printed. I tried this on a CentOS 7 cluster as well as on my local Windows machine running Ubuntu under WSL2 (running inside WSL, not Windows).

I'm not sure what's causing it, but it might be related to the OS.

mmckerns commented 1 year ago

Is there a reason you are using pathos 0.2.8 instead of 0.3.0? Can you try against a fresh release of pathos (and its dependencies) instead of 0.2.8? Are you using that version everywhere? It'd be helpful to know the versions of pathos, ppft, and dill you are using.

zhuchcn commented 1 year ago

0.2.8 is the version I was using when I first saw the issue with the binary symbols. Here is what I just tried:

docker run -it --rm -v $(pwd):$(pwd) -w $(pwd) python:3.7.15 /bin/bash -c "pip install pathos==0.3.0; pip freeze; python test.py" > test.out

Below is what got written into test.out, minus the output for pip.

Installing collected packages: ppft, pox, dill, multiprocess, pathos
Successfully installed dill-0.3.6 multiprocess-0.70.14 pathos-0.3.0 pox-0.3.2 ppft-1.7.6.6
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 22.0.4; however, version 22.3 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
dill==0.3.6
multiprocess==0.70.14
pathos==0.3.0
pox==0.3.2
ppft==1.7.6.6
An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!
 ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@An error has occured during the function execution
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/ppft/__main__.py", line 99, in run
    __result = __f(*__args)
  File "<string>", line 2, in f
ValueError: Error raised!!

mmckerns commented 1 year ago

Hmm. That's helpful... and so weird. Thanks, I'll see if I can reproduce it.
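For anyone comparing outputs across machines: "^@" is how many terminals and pagers display a NUL byte (0x00), so one way to compare runs is to locate the runs of NUL bytes in the captured output. This is only a sketch; nul_runs is a hypothetical helper, and the sample bytes are made up for illustration rather than taken from a real test.out.

```python
def nul_runs(data):
    """Return (offset, length) for each run of consecutive NUL bytes."""
    runs = []
    start = None
    for pos, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = pos          # a new run of NUL bytes begins here
        elif start is not None:
            runs.append((start, pos - start))
            start = None
    if start is not None:            # the data ended inside a run
        runs.append((start, len(data) - start))
    return runs


# Illustrative sample only; for a real capture, read the file in binary mode,
# e.g. data = open('test.out', 'rb').read()
sample = b'ValueError: Error raised!!\n \x00\x00\x00\nAn error has occured'
for offset, length in nul_runs(sample):
    print('NUL run at byte %d, length %d' % (offset, length))
```

Reporting the offsets and lengths alongside the OS and package versions should make it easier to tell whether the padding is the same from run to run.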