pgiri / dispy

Distributed and Parallel Computing Framework with / for Python
https://dispy.org

Node crashes with "Runtime error - StopIteration(None, 'ACK')" on Python 3.7.1 #173

Open max810 opened 5 years ago

max810 commented 5 years ago

I launched the node and everything was OK. Then I launched the host; it started working and the node immediately crashed with an unhandled error. Here are the logs.

NODE:

D:\Work\Projects\DisPy\host_distr_vjengine\node>py dispynode.py -d --dest_path_prefix="prefix" --cpus=1 --clean

    Reading standard input disabled, as multiprocessing
    does not seem to work with reading input under Windows
2019-02-19 18:07:20 dispynode - dispynode version: 4.10.5, PID: 13724
2019-02-19 18:07:20 dispy - IPv6 may not work without "netifaces" package!
2019-02-19 18:07:20 dispynode - Files will be saved under "D:\Work\Projects\DisPy\host_distr_vjengine\node\prefix\dispy\node"

    Apparently previous dispynode (PID 4808) has gone away;
    please check manually and kill process(es) if necessary

2019-02-19 18:07:20 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-19 18:07:21 dispynode - "DESKTOP-3CBMNC6" serving 1 cpus
2019-02-19 18:07:21 dispynode - TCP server at 192.168.1.19:51348
2019-02-19 18:08:06 dispynode - New computation "4f05a01a43bd0b0af0ecb3057e277880219ed1c8" from 192.168.1.19
2019-02-19 18:08:06 pycos - uncaught exception in !tcp_req/2050153298584:
Traceback (most recent call last):
  File "dispynode.py", line 1399, in setup_computation
    raise StopIteration(None, 'ACK')
StopIteration: (None, 'ACK')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _schedule
    retval = task._generator.throw(*exc)
  File "dispynode.py", line 1612, in tcp_req
    client, resp = yield setup_computation(msg, task=task)
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3667, in _schedule
    retval = task._generator.send(task._value)
RuntimeError: generator raised StopIteration

2019-02-19 18:08:07 dispynode - Busy (1/1); ignoring ping message from 192.168.1.19
2019-02-19 18:08:07 pycos - uncaught exception in !send_pong_msg/2050154615560:
Traceback (most recent call last):
  File "dispynode.py", line 866, in send_pong_msg
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3667, in _schedule
    retval = task._generator.send(task._value)
RuntimeError: generator raised StopIteration

HOST:

D:\Work\Projects\DisPy\host_distr_vjengine\host>py vjdist_server.py
2019-02-19 18:08:06 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-19 18:08:06 dispy - dispy client version: 4.10.5
2019-02-19 18:08:06 dispy - IPv6 may not work without "netifaces" package!
2019-02-19 18:08:06 dispy - Storing fault recovery information in "_dispy_20190219180806"

What I've tried: Python 3.7.1 with dispy 4.10.5, and Python 3.6.8 with dispy 4.10.5 (conda env).

Everything worked just fine on another computer with approximately the same configuration (Windows 10, Python 3.6.3, the system Python rather than a venv). So I basically moved the files to another computer and everything stopped working.

max810 commented 5 years ago

UPDATE - I've just tried it with Python 3.6.3 and everything worked just fine. So the question is: what is so different between Python 3.6.8 and 3.6.3? Why does the job on the node stop with StopIteration on Python 3.6.8 when everything works perfectly with 3.6.3?

pgiri commented 5 years ago

Python 3.7 changed the semantics of raise StopIteration in generators (see PEP 479). dispy and pycos installed with pip will automatically translate such statements so that they work with Python 3.7+. I don't know if Python 3.6.8 has changed this as well (I doubt it). I have tested with 3.6.7 but not with 3.6.8. Can you post a sample test program that exhibits this issue with 3.6.8?
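For readers unfamiliar with PEP 479, here is a minimal sketch of the behavior change. The generator and helper names (`old_style`, `new_style`, `drive`) are made up for illustration and are not dispy or pycos code; the sketch only assumes it is run on Python 3.7 or later.

```python
import sys


def old_style():
    """Pre-PEP 479 idiom: finish a generator by raising StopIteration."""
    yield 1
    raise StopIteration(None, 'ACK')


def new_style():
    """PEP 479-compatible equivalent: finish with a plain return."""
    yield 1
    return (None, 'ACK')


def drive(gen_func):
    """Run a generator past its first yield and report how it ended."""
    gen = gen_func()
    next(gen)  # advance to the first yield
    try:
        next(gen)
    except StopIteration as exc:
        return ('StopIteration', exc.value)
    except RuntimeError as exc:
        return ('RuntimeError', str(exc))


if __name__ == '__main__':
    print(sys.version_info[:2])
    print(drive(old_style))  # RuntimeError on Python 3.7+
    print(drive(new_style))  # normal StopIteration carrying the value
```

On Python 3.7+ the first generator ends with the same "generator raised StopIteration" RuntimeError that appears in the node log above, while the second one ends normally.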

max810 commented 5 years ago

I wasn't able to put together a sample program for Python 3.6.8 (and I can't share mine, because it's really complicated), because I ran into a strange issue that I will describe below.

But I was able to replicate it on Python 3.7.1 with just the "canonical example" from the dispy manual page.

Here's the full dispynode log:

    Reading standard input disabled, as multiprocessing
    does not seem to work with reading input under Windows
2019-02-21 10:55:01 dispynode - dispynode version: 4.10.5, PID: 1888
2019-02-21 10:55:01 dispy - IPv6 may not work without "netifaces" package!
2019-02-21 10:55:01 dispynode - Files will be saved under "C:\Users\Maksym\PycharmProjects\dispy\prefix"

    Apparently previous dispynode (PID None) has gone away;
    please check manually and kill process(es) if necessary

2019-02-21 10:55:01 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-21 10:55:02 dispynode - "DESKTOP-3CBMNC6" serving 1 cpus
2019-02-21 10:55:02 dispynode - TCP server at 192.168.1.19:51348
2019-02-21 10:55:13 dispynode - New computation "a8ba112dc9b5a9b4c5704f38139f2de3096a2d8a" from 192.168.1.19
2019-02-21 10:55:13 pycos - uncaught exception in !tcp_req/2020695914808:
Traceback (most recent call last):
  File "dispy/dispynode.py", line 1382, in setup_computation
    localvars['_dispy_setup_status'])
StopIteration: (None, 'ACK')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _s
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _schedule
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", lin  File "dispy/dispynode.py", line 1
595, in The above exception was the direct cause of the following e  File "C:\Users\Maksym\Anaconda3\lib\site-
packages\pycos\_xceptionThe above exception was the direct cause of the following exc
eption:
The above exception was the direct cause of the following exceptioion:
n:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py"  File "C:\Users\Maksym\Anaconda3\lib\si
te-packages\pycos\__init__.p
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, inn _schedule
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _sschedule
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _schedu    retval = task
._generator.throw(*exc)
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3665, in _schedule
    retval = task._generator.throw(*exc)
  File "dispy/dispynode.py", line 1595, in tcp_req
    conn.close()
  File "C:\Users\Maksym\Anaconda3\lib\site-packages\pycos\__init__.py", line 3667, in _schedule
    retval = task._generator.send(task._value)
RuntimeError: generator raised StopIteration

Here are the installed modules:

Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import dispy
>>> dispy.__version__
'4.10.5'
>>> import pycos
>>> pycos.__version__
'4.8.10'
>>> import psutil
>>> psutil.__version__
'5.4.8'
>>>

max810 commented 5 years ago

About the weird problem with Python 3.6.8: I tried creating a fresh Python 3.6.8 virtual environment, and when I launched the host I encountered this error:

(test368) C:\Users\Maksym\PycharmProjects\dispy>python host_test.py
2019-02-21 11:25:16 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-21 11:25:16 dispy - dispy client version: 4.10.5
2019-02-21 11:25:16 dispy - IPv6 may not work without "netifaces" package!
2019-02-21 11:25:16 dispy - Storing fault recovery information in "_dispy_20190221112516"
2019-02-21 11:25:16 dispy - Transfer of computation "compute" to 192.168.1.19 failed
2019-02-21 11:25:16 dispy - Failed to setup 192.168.1.19 for compute "compute": -1

The node didn't even react to this - looks like the host just didn't communicate with it.

(test368) C:\Users\Maksym\PycharmProjects\dispy\dispy>python dispynode.py

    Reading standard input disabled, as multiprocessing
    does not seem to work with reading input under Windows
2019-02-21 11:24:35 dispynode - dispynode version: 4.10.5, PID: 17864
2019-02-21 11:24:35 dispy - IPv6 may not work without "netifaces" package!
2019-02-21 11:24:35 dispynode - Files will be saved under "C:\Users\Maksym\AppData\Local\Temp"
2019-02-21 11:24:35 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-21 11:24:36 dispynode - "DESKTOP-3CBMNC6" serving 8 cpus

max810 commented 5 years ago

I've tried debugging, and apparently this is the part of the code that leads to the error (it's in the dispy module):

    def setup(self, compute, exclusive=True, task=None):
        # generator
        compute.scheduler_ip_addr = self.scheduler_ip_addr
        compute.node_ip_addr = self.ip_addr
        compute.exclusive = exclusive
        reply = yield self.send(b'COMPUTE:' + serialize(compute), task=task)
        try:
            cpus = deserialize(reply)
            assert isinstance(cpus, int) and cpus > 0
        except Exception:
            logger.warning('Transfer of computation "%s" to %s failed', compute.name, self.ip_addr)
            raise StopIteration(-1)

The reply is just b'', it's empty.

max810 commented 5 years ago

The code I was using is exactly this: http://dispy.sourceforge.net/examples.html#canonical-program

max810 commented 5 years ago

This looks like my issue rather than dispy's - the same happens with Python 3.6.3.

I'll fix this and try again. It probably has something to do with virtual environments.

max810 commented 5 years ago

UPDATE: A simple reboot solved all the problems with Python 3.6.8. BUT the problem with Python 3.7.1 is still here, and everything I stated above about Python 3.7.1 still holds true. Again, the latest version of dispy is installed. Is it supposed to work with 3.7? Am I missing a patch/update?

So I think the initial problem with 3.6.8 was caused by something else, and I should rename the title.

max810 commented 5 years ago

I specifically created a clean 3.7.1 environment and installed dispy, psutil and pywin32 via pip - the issue still happens, again with the canonical program http://dispy.sourceforge.net/examples.html#canonical-program.

Here are all the logs again. Node:

C:\Users\Maksym>conda create -n test371 python=3.7.1
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.5.12
  latest version: 4.6.4

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: C:\Users\Maksym\Anaconda3\envs\test371

  added / updated specs:
    - python=3.7.1

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-19.0.1                 |           py37_0         1.8 MB
    setuptools-40.8.0          |           py37_0         663 KB
    ca-certificates-2019.1.23  |                0         158 KB
    ------------------------------------------------------------
                                           Total:         2.6 MB

The following NEW packages will be INSTALLED:

    ca-certificates: 2019.1.23-0
    certifi:         2018.11.29-py37_0
    openssl:         1.1.1a-he774522_0
    pip:             19.0.1-py37_0
    python:          3.7.1-h8c8aaf0_6
    setuptools:      40.8.0-py37_0
    sqlite:          3.26.0-he774522_0
    vc:              14.1-h0510ff6_4
    vs2015_runtime:  14.15.26706-h3a45250_0
    wheel:           0.32.3-py37_0
    wincertstore:    0.2-py37_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
pip-19.0.1           | 1.8 MB    | ############################################################################ | 100%
setuptools-40.8.0    | 663 KB    | ############################################################################ | 100%
ca-certificates-2019 | 158 KB    | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > activate test371
#
# To deactivate an active environment, use:
# > deactivate
#
# * for power-users using bash, you must source
#

C:\Users\Maksym>activate test371

(test371) C:\Users\Maksym>pip install dispy psutil pywin32
Collecting dispy
Collecting psutil
  Downloading https://files.pythonhosted.org/packages/3d/22/ed4fa46c5bfd95b4dc57d6544c3fe6568abe398aef3990f6011777f1a3f3/psutil-5.5.1-cp37-cp37m-win_amd64.whl (228kB)
    100% |████████████████████████████████| 235kB 1.6MB/s
Collecting pywin32
  Downloading https://files.pythonhosted.org/packages/a3/8a/eada1e7990202cd27e58eca2a278c344fef190759bbdc8f8f0eb6abeca9c/pywin32-224-cp37-cp37m-win_amd64.whl (9.0MB)
    100% |████████████████████████████████| 9.1MB 3.3MB/s
Collecting pycos>=4.8.10 (from dispy)
Installing collected packages: pycos, dispy, psutil, pywin32
Successfully installed dispy-4.10.5 psutil-5.5.1 pycos-4.8.10 pywin32-224

(test371) C:\Users\Maksym>cd PycharmProjects\dispy\dispy

(test371) C:\Users\Maksym\PycharmProjects\dispy\dispy>python dispynode.py --clean -d --dest_path_prefix="prefix" --cpus=1

    Reading standard input disabled, as multiprocessing
    does not seem to work with reading input under Windows
2019-02-21 13:01:49 dispynode - dispynode version: 4.10.5, PID: 9036
2019-02-21 13:01:49 dispy - IPv6 may not work without "netifaces" package!
2019-02-21 13:01:49 dispynode - Files will be saved under "C:\Users\Maksym\PycharmProjects\dispy\dispy\prefix"

    Apparently previous dispynode (PID 11916) has gone away;
    please check manually and kill process(es) if necessary

2019-02-21 13:01:49 dispynode - Killing process with ID 11916
2019-02-21 13:01:49 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-21 13:01:49 dispynode - "DESKTOP-3CBMNC6" serving 1 cpus
2019-02-21 13:01:49 dispynode - TCP server at 192.168.1.19:51348
2019-02-21 13:02:00 dispynode - New computation "fb33ef688c7418285eccc873b397ef2f599cb4b9" from 192.168.1.19
2019-02-21 13:02:00 pycos - uncaught exception in !tcp_req/1998651787400:
Traceback (most recent call last):
  File "dispynode.py", line 1382, in setup_computation
    raise StopIteration(None, 'ACK')
StopIteration: (None, 'ACK')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Maksym\Anaconda3\envs\test371\lib\site-packages\pycos\__init__.py", line 3665, in _schedule
    retval = task._generator.throw(*exc)
  File "dispynode.py", line 1595, in tcp_req
    client, resp = yield setup_computation(msg, task=task)
  File "C:\Users\Maksym\Anaconda3\envs\test371\lib\site-packages\pycos\__init__.py", line 3667, in _schedule
    retval = task._generator.send(task._value)
RuntimeError: generator raised StopIteration

And host:

C:\Users\Maksym>activate test371

(test371) C:\Users\Maksym>cd PycharmProjects\dispy

(test371) C:\Users\Maksym\PycharmProjects\dispy>python host_test.py
2019-02-21 13:02:00 pycos - version 4.8.10 with IOCP I/O notifier
2019-02-21 13:02:00 dispy - dispy client version: 4.10.5
2019-02-21 13:02:00 dispy - IPv6 may not work without "netifaces" package!
2019-02-21 13:02:00 dispy - Storing fault recovery information in "_dispy_20190221130200"

pgiri commented 5 years ago

It looks like dispy and pycos were not installed with the pip of Python 3.7. As mentioned above, when installed with python -m pip (if there are multiple versions of Python, give the path to the python that you use to start dispynode, for example), the installation translates dispy and pycos to work with Python 3.7 as per PEP 479. The same dispy/pycos installation can't be used with both Python 3.6 and Python 3.7; if you need both versions, install dispy and pycos in different paths and use the appropriate one with each version of Python.
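One way to check which installation a given interpreter actually picks up is to ask that interpreter directly. This is only a sketch: `describe_install` is a hypothetical helper, not part of dispy or pycos, and it uses nothing beyond the standard library.

```python
import importlib.util
import sys


def describe_install(package_name):
    """Return (interpreter path, package location or None) for this interpreter."""
    spec = importlib.util.find_spec(package_name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == '__main__':
    for name in ('dispy', 'pycos'):
        exe, location = describe_install(name)
        print(name, '->', location or 'not installed', '(interpreter: %s)' % exe)
```

Running this with the same python that starts dispynode shows whether dispy and pycos resolve to that interpreter's own site-packages or to a copy somewhere else.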

max810 commented 5 years ago

I have just tried reinstalling dispy via python -m pip install dispy --upgrade --force-reinstall, and I double-checked that I am launching exactly Python 3.7. I still get the same issue. Is there a manual fix for this? Like a script of some kind?

max810 commented 5 years ago

Maybe the problem is with the dispynode.py file? I downloaded it from GitHub.
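A rough way to test that suspicion is to scan a given copy of dispynode.py for raise StopIteration statements, since those are the statements that fail inside generators on Python 3.7+. The sketch below uses only the standard ast module; `find_raise_stopiteration` is a hypothetical helper, and it does not check whether a statement sits inside a generator, so a hit in an ordinary function or a `__next__` method would be a false positive.

```python
import ast


def find_raise_stopiteration(source):
    """Return line numbers of 'raise StopIteration' and 'raise StopIteration(...)'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Raise) or node.exc is None:
            continue
        # 'raise StopIteration(...)' is a Call; a bare 'raise StopIteration' is a Name
        target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
        if isinstance(target, ast.Name) and target.id == 'StopIteration':
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == '__main__':
    # Small inline sample; in practice, pass the text of the file being checked.
    sample = "def g():\n    yield 1\n    raise StopIteration(None, 'ACK')\n"
    print('raise StopIteration found on lines:', find_raise_stopiteration(sample))
```

To check a real file, read it with `open(path, encoding='utf-8').read()` and pass the text to the function. A copy that still reports hits inside generator functions has presumably not been translated for Python 3.7.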

max810 commented 5 years ago

I fixed it! I went into ~\Anaconda3\envs\<my Python 3.7 environment>\Lib\site-packages\dispy and copied the dispynode.py file from there to where I needed it.

Is this the way one is supposed to use dispynode? If so, I would like to help by adding an entry about launching nodes to README.md via a pull request.

mzy2240 commented 4 years ago

I'm having the same problem with 3.7, even after I copied the dispynode.py file to the working path. Does anyone have an idea how to fix it?