What steps will reproduce the problem?
1. Run a multi-node scoop run using full domain names in the --hosts line
e.g.,
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py
What is the expected output?
I would expect this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1.default.domain node2.default.domain -n 32 scoopCode.py
to do the same thing as this command
python -m scoop.__main__ --backend ZMQ -vv --hosts node1 node2 -n 32 scoopCode.py
What do you see instead?
Using long host names, I get the following error:
ERROR:root:Error while launching SCOOP subprocesses:
ERROR:root:Traceback (most recent call last):
File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 469, in main
rootTaskExitCode = thisScoopApp.run()
File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launcher.py", line 258, in run
backend=self.backend,
File "/pkg/suse11/python/scoop/0.7.2/lib/python2.7/site-packages/scoop-0.7.2.dev-py2.7.egg/scoop/launch/brokerLaunch.py", line 148, in __init__
"SSH process stderr:\n{stderr}".format(**locals()))
Exception: Could not successfully launch the remote broker.
Requested remote broker ports, received:
Port number decoding error:
need more than 1 value to unpack
SSH process stderr:
Connection to cl2n091.default.domain closed.
But it runs perfectly fine with only the short host names.
What version of the product are you using?
Python 2.7.5
Scoop version 0.7.2
On what operating system?
SUSE Linux 11
Please provide any additional information below.
I am actually trying to run this on our SGI cluster (SGI-customized SUSE 11),
which uses PBS Pro as the scheduler. If I submit a job without the hosts line,
scoop correctly detects the hosts PBS has given the job, but it uses the full
hostnames. If I submit a multi-node interactive job and manually provide the
short names, it works fine, but this is really not ideal, as it should be able
to go through the batch system properly.
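In the meantime, one possible workaround inside a batch job is to strip the domain part from the PBS host list and pass the short names to --hosts explicitly. The following is a minimal sketch, not part of scoop: the helper name short_hostnames is hypothetical, and it assumes the node file pointed to by the PBS_NODEFILE environment variable contains one host name per line and that every host is also reachable by its short name. It would mangle entries that are IP addresses, and it collapses repeated entries, so any per-node slot counts in the node file are lost.

```python
import os


def short_hostnames(lines):
    """Return host names with the domain part removed, in first-seen order.

    Hypothetical helper for illustration; blank lines and repeats are skipped.
    """
    seen = []
    for line in lines:
        # Keep only the text before the first dot: "node1.default.domain" -> "node1"
        name = line.strip().split(".")[0]
        if name and name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # PBS sets PBS_NODEFILE inside a job; outside a job nothing is printed.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(nodefile) as handle:
            print(" ".join(short_hostnames(handle)))
```

The printed, space-separated names could then be supplied after --hosts in the job script, in the same way as the manually typed short names in the working command above.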
Original issue reported on code.google.com by david.wa...@qut.edu.au on 27 Aug 2014 at 10:22