soravux / scoop

SCOOP (Scalable COncurrent Operations in Python)
https://github.com/soravux/scoop
GNU Lesser General Public License v3.0

socket.gaierror: [Errno -2] Name or service not known #49

Closed fx-kirin closed 7 years ago

fx-kirin commented 7 years ago

I can't use an SSH host alias as the host name.

I checked the arguments that are passed to getaddrinfo.

scoop.BROKER.externalHostname was thx and scoop.BROKER.task_port was a randomly assigned port number.

How can I reach another computer over SSH using an alias defined in ~/.ssh/config? As far as I can tell from tracing the launch process, this is not supported.
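
The lookup can be checked independently of SCOOP. An alias that exists only in ~/.ssh/config is read by the ssh client, not by the system resolver that socket.getaddrinfo uses, so the ssh launch can succeed while the worker's own lookup fails. A minimal sketch, assuming that is what happens here (the helper name `resolves` is made up for illustration and is not part of SCOOP):

```python
import socket


def resolves(hostname, port=0):
    """Return True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same exception class as in the traceback of this report.
        return False


if __name__ == "__main__":
    # "thx" is the SSH alias from this report; replace it with your own alias.
    for name in ("127.0.0.1", "thx"):
        print(name, "resolves" if resolves(name) else "does NOT resolve")
```

Run this on the machine where the worker starts. If the alias does not resolve there, the worker probably needs an IP address or a name known to DNS or /etc/hosts instead.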

Environment

thx@thx-Prime:~/workspace/scoop$ python -m scoop --host thx -vv scoop_test.py
[2016-12-21 12:11:44,449] launcher  INFO    SCOOP 0.7 2.0 on linux2 using Python 2.7.12 (default, Oct 21 2016, 22:26:43) [GCC 5.4.0 20160609], API: 1013
[2016-12-21 12:11:44,449] launcher  INFO    Deploying 1 worker(s) over 1 host(s).
[2016-12-21 12:11:44,449] launcher  DEBUG   Using hostname/ip: "thx" as external broker reference.
[2016-12-21 12:11:44,449] launcher  DEBUG   The python executable to execute the program with is: /home/thx/.pyenv/versions/2.7.12/bin/python.
[2016-12-21 12:11:44,449] launcher  INFO    Worker d--istribution:
[2016-12-21 12:11:44,449] launcher  INFO       thx:     0 + origin
[2016-12-21 12:11:44,449] brokerLaunch DEBUG   Launching remote broker: ssh -tt -x -oStrictHostKeyChecking=no -oBatchMode=yes -oUserKnownHostsFile=/dev/null -oServerAliveInterval=300 thx /home/thx/.pyenv/versions/2.7.12/bin/python -m scoop.broker.__main__ --echoGroup --echoPorts --backend ZMQ
[2016-12-21 12:11:44,750] brokerLaunch DEBUG   Foreign broker launched on ports 46544, 45126 of host thx.
[2016-12-21 12:11:44,750] launcher  DEBUG   Initialising remote origin worker 1 [thx].
[2016-12-21 12:11:44,751] launcher  DEBUG   thx: Launching '/home/thx/.pyenv/versions/2.7.12/bin/python -m scoop.launch.__main__ 1 3 --size 1 --workingDirectory "/home/thx/workspace/scoop" --brokerHostname 127.0.0.1 --externalBrokerHostname thx --taskPort 46544 --metaPort 45126 --origin --backend=ZMQ -vvv scoop_test.py'
Warning: Permanently added '192.168.21.10' (ECDSA) to the list of known hosts.
Launching 1 worker(s) using /bin/bash.
Executing '['/home/thx/.pyenv/versions/2.7.12/bin/python', '-m', 'scoop.bootstrap.__main__', '--size', '1', '--workingDirectory', '/home/thx/workspace/scoop', '--brokerHostname', '127.0.0.1', '--externalBrokerHostname', 'thx', '--taskPort', '46544', '--metaPort', '45126', '--origin', '--backend=ZMQ', '-vvv', 'scoop_test.py']'...
Traceback (most recent call last):
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 298, in <module>
    b.main()
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 92, in main
    self.run()
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 285, in run
    futures_startup()
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/bootstrap/__main__.py", line 266, in futures_startup
    run_name="__main__"
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/futures.py", line 65, in _startup
    result = _controller.switch(rootFuture, *args, **kargs)
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/_control.py", line 199, in runController
    execQueue = FutureQueue()
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/_types.py", line 264, in __init__
    self.socket = Communicator()
  File "/home/thx/.pyenv/versions/2.7.12/lib/python2.7/site-packages/scoop/_comm/scoopzmq.py", line 70, in __init__
    info = socket.getaddrinfo(scoop.BROKER.externalHostname, scoop.BROKER.task_port)[0]
socket.gaierror: [Errno -2] Name or service not known
Exception AttributeError: "'FutureQueue' object has no attribute 'socket'" in <bound method FutureQueue.__del__ of <scoop._types.FutureQueue object at 0x7f5abe703a90>> ignored
Connection to 192.168.21.10 closed.
[2016-12-21 12:11:45,344] launcher  INFO    Root process is done.
[2016-12-21 12:11:45,345] workerLaunch DEBUG   Closing workers on thx (1 workers).
[2016-12-21 12:11:45,345] brokerLaunch DEBUG   Closing broker on host thx.
Warning: Permanently added '192.168.21.10' (ECDSA) to the list of known hosts.
russelljjarvis commented 7 years ago

I am also getting this error. Environment: Docker, Debian, Python 3.

git clone https://github.com/soravux/scoop
cd scoop
sudo apt-get install python3-setuptools
sudo /opt/conda/bin/easy_install gevent
sudo /opt/conda/bin/easy_install greenlet
sudo /opt/conda/bin/python3 setup.py install
jovyan@650b8092666a:~/git/scoop/test$ python -m scoop tests.py 
[2016-12-22 18:10:55,135] launcher  INFO    SCOOP 0.7 2.0 on linux using Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)], API: 1013
[2016-12-22 18:10:55,135] launcher  INFO    Deploying 8 worker(s) over 1 host(s).
[2016-12-22 18:10:55,135] launcher  INFO    Worker d--istribution: 
[2016-12-22 18:10:55,135] launcher  INFO       127.0.0.1:   7 + origin
Launching 8 worker(s) using /bin/bash.
............[Errno 2] No such file or directory: 'tests.py'
File: tests.py
In path: /home/jovyan
......Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_comm/scoopzmq.py", line 196, in _reportFutures
    pickle.dumps(fids),
  File "/opt/conda/lib/python3.5/site-packages/zmq/sugar/socket.py", line 366, in send_multipart
    self.send(msg, SNDMORE|flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 636, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:7305)
  File "zmq/backend/cython/socket.pyx", line 673, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6956)
  File "zmq/backend/cython/socket.pyx", line 94, in zmq.backend.cython.socket._check_closed (zmq/backend/cython/socket.c:1742)
zmq.error.ZMQError: Socket operation on non-socket

.Exception in thread Thread-3:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_comm/scoopzmq.py", line 196, in _reportFutures
    pickle.dumps(fids),
  File "/opt/conda/lib/python3.5/site-packages/zmq/sugar/socket.py", line 366, in send_multipart
    self.send(msg, SNDMORE|flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 636, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:7305)
  File "zmq/backend/cython/socket.pyx", line 673, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6956)
  File "zmq/backend/cython/socket.pyx", line 94, in zmq.backend.cython.socket._check_closed (zmq/backend/cython/socket.c:1742)
zmq.error.ZMQError: Socket operation on non-socket

............Exception in thread Thread-16:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_comm/scoopzmq.py", line 196, in _reportFutures
    pickle.dumps(fids),
  File "/opt/conda/lib/python3.5/site-packages/zmq/sugar/socket.py", line 366, in send_multipart
    self.send(msg, SNDMORE|flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 636, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:7305)
  File "zmq/backend/cython/socket.pyx", line 673, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6956)
  File "zmq/backend/cython/socket.pyx", line 94, in zmq.backend.cython.socket._check_closed (zmq/backend/cython/socket.c:1742)
zmq.error.ZMQError: Socket operation on non-socket

python -m scoop tests.py
[2016-12-22 17:50:40,860] launcher  INFO    SCOOP 0.7 2.0 on linux using Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)], API: 1013
[2016-12-22 17:50:40,860] launcher  INFO    Deploying 8 worker(s) over 1 host(s).
[2016-12-22 17:50:40,860] launcher  INFO    Worker d--istribution: 
[2016-12-22 17:50:40,860] launcher  INFO       127.0.0.1:   7 + origin
Launching 8 worker(s) using /bin/bash.
............[Errno 2] No such file or directory: 'tests.py'
File: tests.py
In path: /home/jovyan
........Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_comm/scoopzmq.py", line 196, in _reportFutures
    pickle.dumps(fids),
  File "/opt/conda/lib/python3.5/site-packages/zmq/sugar/socket.py", line 366, in send_multipart
    self.send(msg, SNDMORE|flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 636, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:7305)
  File "zmq/backend/cython/socket.pyx", line 673, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6956)
  File "zmq/backend/cython/socket.pyx", line 94, in zmq.backend.cython.socket._check_closed (zmq/backend/cython/socket.c:1742)
zmq.error.ZMQError: Socket operation on non-socket

.............Exception in thread Thread-19:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_comm/scoopzmq.py", line 196, in _reportFutures
    pickle.dumps(fids),
  File "/opt/conda/lib/python3.5/site-packages/zmq/sugar/socket.py", line 366, in send_multipart
    self.send(msg, SNDMORE|flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 636, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:7305)
  File "zmq/backend/cython/socket.pyx", line 673, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6956)
  File "zmq/backend/cython/socket.pyx", line 94, in zmq.backend.cython.socket._check_closed (zmq/backend/cython/socket.c:1742)
zmq.error.ZMQError: Socket operation on non-socket

.................................................EEEEE
======================================================================
ERROR: test_parseSLURM_dashOneDecimal (tests_parser.TestUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jovyan/git/scoop/test/tests_parser.py", line 78, in test_parseSLURM_dashOneDecimal
    hosts = utils.parseSLURM("n[1-4]")
  File "/opt/conda/lib/python3.5/site-packages/scoop/utils.py", line 208, in parseSLURM
    hostsstr = subprocess.check_output(["scontrol", "show", "hostnames", string])
  File "/opt/conda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/opt/conda/lib/python3.5/subprocess.py", line 693, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'scontrol'

======================================================================
ERROR: test_parseSLURM_dashTwoDecimals (tests_parser.TestUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jovyan/git/scoop/test/tests_parser.py", line 83, in test_parseSLURM_dashTwoDecimals
    hosts = utils.parseSLURM("n[5-10]")
  File "/opt/conda/lib/python3.5/site-packages/scoop/utils.py", line 208, in parseSLURM
    hostsstr = subprocess.check_output(["scontrol", "show", "hostnames", string])
  File "/opt/conda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/opt/conda/lib/python3.5/subprocess.py", line 693, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'scontrol'

======================================================================
ERROR: test_parseSLURM_dashTwonames (tests_parser.TestUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jovyan/git/scoop/test/tests_parser.py", line 88, in test_parseSLURM_dashTwonames
    hosts = utils.parseSLURM("x[1-2]y[1-2]")
  File "/opt/conda/lib/python3.5/site-packages/scoop/utils.py", line 208, in parseSLURM
    hostsstr = subprocess.check_output(["scontrol", "show", "hostnames", string])
  File "/opt/conda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/opt/conda/lib/python3.5/subprocess.py", line 693, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'scontrol'

======================================================================
ERROR: test_parseSLURM_nondashOneDecimal (tests_parser.TestUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jovyan/git/scoop/test/tests_parser.py", line 96, in test_parseSLURM_nondashOneDecimal
    hosts = utils.parseSLURM("n[1,4]")
  File "/opt/conda/lib/python3.5/site-packages/scoop/utils.py", line 208, in parseSLURM
    hostsstr = subprocess.check_output(["scontrol", "show", "hostnames", string])
  File "/opt/conda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/opt/conda/lib/python3.5/subprocess.py", line 693, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'scontrol'

======================================================================
ERROR: test_parseSLURM_nondash_and_dashOneDecimal (tests_parser.TestUtils)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jovyan/git/scoop/test/tests_parser.py", line 102, in test_parseSLURM_nondash_and_dashOneDecimal
    hosts = utils.parseSLURM("n[1,5-9]")
  File "/opt/conda/lib/python3.5/site-packages/scoop/utils.py", line 208, in parseSLURM
    hostsstr = subprocess.check_output(["scontrol", "show", "hostnames", string])
  File "/opt/conda/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/opt/conda/lib/python3.5/subprocess.py", line 693, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/opt/conda/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'scontrol'

----------------------------------------------------------------------
Ran 87 tests in 85.123s

FAILED (errors=5)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/site-packages/scoop/bootstrap/__main__.py", line 285, in run
    futures_startup()
  File "/opt/conda/lib/python3.5/site-packages/scoop/bootstrap/__main__.py", line 266, in futures_startup
    run_name="__main__"
  File "/opt/conda/lib/python3.5/site-packages/scoop/futures.py", line 65, in _startup
    result = _controller.switch(rootFuture, *args, **kargs)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_control.py", line 237, in runController
    future = future._switch(future)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_types.py", line 132, in _switch
    return self.greenlet.switch(future)
  File "/opt/conda/lib/python3.5/site-packages/scoop/_control.py", line 185, in runFuture
    future._delete()
  File "/opt/conda/lib/python3.5/site-packages/scoop/_types.py", line 249, in _delete
    scoop._control.execQueue.inprogress.discard(self.id)
AttributeError: module 'scoop._control' has no attribute 'execQueue'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.5/site-packages/scoop/bootstrap/__main__.py", line 298, in <module>
    b.main()
  File "/opt/conda/lib/python3.5/site-packages/scoop/bootstrap/__main__.py", line 92, in main
    self.run()
  File "/opt/conda/lib/python3.5/site-packages/scoop/bootstrap/__main__.py", line 292, in run
    if scoop._control.execQueue:
AttributeError: module 'scoop._control' has no attribute 'execQueue'
[2016-12-22 17:52:06,434] launcher  (127.0.0.1:46028) INFO    Root process is done.
[2016-12-22 17:52:06,434] launcher  (127.0.0.1:46028) INFO    Finished cleaning spawned subprocesses.
fx-kirin commented 7 years ago

All I needed to do was set `--external-hostname` to an IP address or to a machine name that the workers can resolve.

If you want to use SCOOP inside a Docker container, you need to set the `--tunnel` option.
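
The workaround can be sketched as a small helper that assembles the launcher command line. `--host`, `--external-hostname` and `--tunnel` are the launcher options named in this thread. The function name `build_scoop_command` and its `python` parameter are hypothetical, and 192.168.21.10 is only the address that shows up in the ssh output of the original report, so substitute your own values.

```python
import shlex


def build_scoop_command(host, external_hostname, script,
                        tunnel=False, python="python"):
    """Assemble an argument list for launching SCOOP with an explicit
    external hostname (hypothetical helper, not part of SCOOP)."""
    cmd = [python, "-m", "scoop",
           "--host", host,
           "--external-hostname", external_hostname]
    if tunnel:
        # Reported in this thread as necessary when running inside Docker.
        cmd.append("--tunnel")
    # The script goes last so it is not mistaken for another host name.
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    cmd = build_scoop_command("thx", "192.168.21.10", "scoop_test.py")
    # Print a copy-pastable shell command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.call as-is, or the printed line can be pasted into a shell.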