Closed andre-merzky closed 10 years ago
Hi Andre,
Thanks for reporting. I haven't tested local execution for a while; I will look into it. I have some refactoring of the bootstrapper going on and will combine the fix with that.
Gr,
Mark
On 17 Jun 2014, at 7:38 , Andre Merzky notifications@github.com wrote:
The agent log (and ERR) contains:
2014-06-17 13:31:59,513 - radical.pilot.agent - INFO - Launching task 53a0271f74df921aa4e29b46 via cd /home/merzky/radical.pilot.sandbox/pilot-53a0271e74df921aa4e29b44/unit-53a0271f74df921aa4e29b46 && /bin/echo I am an $CU_SET CU with id $CU_NO (env: {u'CU_SET': u'A', u'CU_NO': 0}) in /home/merzky/radical.pilot.sandbox/pilot-53a0271e74df921aa4e29b44/unit-53a0271f74df921aa4e29b46
2014-06-17 13:31:59,517 - radical.pilot.agent - ERROR - Error in ExecWorker loop:
Traceback (most recent call last):
  File "radical-pilot-agent.py", line 382, in run
    logger=self._log)
  File "radical-pilot-agent.py", line 783, in __init__
    creationflags=0)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
TypeError: execve() arg 3 contains a non-string value
2014-06-17 13:31:59,773 - radical.pilot.agent - ERROR - Execution worker <ExecWorker(ExecWorker-1, stopped[1] daemon)> died.
A pilot state callback is not getting invoked, so my application simply hangs forever. The CU seems to be marked as Failed, though.
The pilot and CU descriptions are:
HOSTNAME = "localhost" # Remote Resource URL
USERNAME = "merzky" # Username on the remote resource
QUEUE = None # Add queue you want to use
PROJECT = None # Add project / allocation / account to charge
WALLTIME = 10 # Add pilot walltime in minutes
PILOT_SIZE = 1 # Number of cores required for the Pilot-Job
NUMBER_JOBS = 10 # The TOTAL number of CUs to run
...
pdesc = radical.pilot.ComputePilotDescription()
pdesc.resource = HOSTNAME
pdesc.runtime = WALLTIME
pdesc.queue = QUEUE
pdesc.project = PROJECT
pdesc.cores = PILOT_SIZE
pdesc.cleanup = True
...
cudesc = radical.pilot.ComputeUnitDescription()
cudesc.environment = {"CU_SET": "A", "CU_NO": i}
cudesc.executable = "/bin/echo"
cudesc.arguments = ['I am an $CU_SET CU with id $CU_NO', ]
cudesc.cores = 1
The complete example is under examples/tutorial/ensemble_chained.py.
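A note for anyone hitting the same TypeError: the agent log above prints the environment as {u'CU_SET': u'A', u'CU_NO': 0}, and the third argument of execve() is the environment, so the integer CU_NO looks like a plausible trigger. Below is a minimal sketch of a workaround on the application or agent side, assuming plain Python 3 and the standard subprocess module. The helpers stringify_env and run_with_env are hypothetical names for illustration and are not part of radical.pilot.

```python
import os
import subprocess
import sys


def stringify_env(env):
    """Return a copy of env with every key and value converted to str."""
    return {str(key): str(value) for key, value in env.items()}


def run_with_env(command, extra_env):
    """Run command with os.environ plus extra_env and return its stdout."""
    env = dict(os.environ)
    # Converting first avoids handing non-string values to the exec call.
    env.update(stringify_env(extra_env))
    return subprocess.check_output(command, env=env).decode()


# Same shape as the CU environment in the report: CU_NO is an int.
cu_env = {"CU_SET": "A", "CU_NO": 0}
safe_env = stringify_env(cu_env)

# Hypothetical child command that just prints the variable back.
output = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['CU_NO'])"],
    cu_env,
)
```

On the user side, the equivalent change would presumably be to write "CU_NO": str(i) in the CU description, although whether the agent should accept non-string values is a separate design question.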
Hi, I have run all 'local' examples, as well as the unit tests (which all use 'localhost'), and they all pass without any problems.
The chaining example also works fine with 'localhost' and the 'master' branch:
Finished processing dependencies for radical.pilot==0.15
2014:06:18 00:11:58 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.15
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '53a0bd2e0941de3b1c491fd5', 'created': datetime.datetime(2014, 6, 17, 22, 11, 58, 126010)}.
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Added credential {'user_id': 'oweidner', 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 53a0bd2e0941de3b1c491fd5.
create pilot manager
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/archer.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/das4.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/futuregrid.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/localhost.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/ncar.json
2014:06:18 00:12:02 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/xsede.json
2014:06:18 00:12:02 radical.pilot.MainProcess: [DEBUG ] Worker thread (ID: Thread-1[4538777600]) for PilotManager 53a0bd320941de3b1c491fd6 started.
submit pilot
create unit manager
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Connected to MongoDB. Serving requests for PilotManager 53a0bd320941de3b1c491fd6.
2014:06:18 00:12:03 radical.pilot.MainProcess: [DEBUG ] Worker thread (ID: Thread-2[4544036864]) for UnitManager 53a0bd330941de3b1c491fd8 started.
2014:06:18 00:12:03 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler.
add pilot
submit units A
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-1: [DEBUG ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-2: [DEBUG ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-1: [DEBUG ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-2: [DEBUG ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': None, u'resource': u'localhost', u'queue': None, u'sandbox': None, u'cleanup': True, u'pilot_agent_priv': None, u'cores': 1, u'runtime': 10}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'sandbox': u'file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/', u'submitted': datetime.datetime(2014, 6, 17, 22, 12, 2, 914000), u'unitmanager': u'53a0bd330941de3b1c491fd8', u'output_transfer_finished': None, u'finished': None, u'pilotmanager': u'53a0bd320941de3b1c491fd6', u'command': None, u'wu_queue': [], u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 17, 22, 12, 2, 913000), u'state': u'PendingLaunch'}], u'input_transfer_started': None, u'_id': ObjectId('53a0bd320941de3b1c491fd7'), u'input_transfer_finished': None, u'nodes': None, u'log': []}
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO ] Using pilot agent /private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/agent/radical-pilot-agent-singlecore.py
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO ] Using bootstrapper /private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/bootstrapper/default_bootstrapper.sh
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Creating agent sandbox 'file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/'.
2014:06:18 00:12:03 radical.pilot.MainProcess: [INFO ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'PendingLaunch' to 'Launching'.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-1: [DEBUG ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-2: [DEBUG ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-2: [DEBUG ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-1: [DEBUG ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Copying bootstrapper 'file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/bootstrapper/default_bootstrapper.sh' to agent sandbox (file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/).
2014:06:18 00:12:04 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Copying agent 'file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/agent/radical-pilot-agent-singlecore.py' to agent sandbox (file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/).
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd330941de3b1c491fd9', '53a0bd330941de3b1c491fda', '53a0bd330941de3b1c491fdb', '53a0bd330941de3b1c491fdc', '53a0bd330941de3b1c491fdd', '53a0bd330941de3b1c491fde', '53a0bd330941de3b1c491fdf', '53a0bd330941de3b1c491fe0', '53a0bd330941de3b1c491fe1', '53a0bd330941de3b1c491fe2'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Bootstrap command line: /bin/bash ['-l', '-c', '"./default_bootstrapper.sh -r ec2-184-72-89-141.compute-1.amazonaws.com:27017/ -d radicalpilot -s 53a0bd2e0941de3b1c491fd5 -p 53a0bd320941de3b1c491fd7 -t 10 -c 1 -V 0.15 -l LOCAL -C "']
2014:06:18 00:12:05 radical.pilot.PilotLauncherWorker-1: [DEBUG ] Submitting SAGA job with description: <class 'saga.job.description.Description'> {'Executable': '/bin/bash', 'WorkingDirectory': '/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7', 'WallTimeLimit': 10, 'Arguments': ['-l', '-c', '"./default_bootstrapper.sh -r ec2-184-72-89-141.compute-1.amazonaws.com:27017/ -d radicalpilot -s 53a0bd2e0941de3b1c491fd5 -p 53a0bd320941de3b1c491fd7 -t 10 -c 1 -V 0.15 -l LOCAL -C "'], 'Error': 'AGENT.STDERR', 'Output': 'AGENT.STDOUT', 'TotalCPUCount': 1}
2014:06:18 00:12:06 radical.pilot.PilotLauncherWorker-1: [DEBUG ] SAGA job submitted with job id [fork://localhost]-[15251]
2014:06:18 00:12:06 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:06 radical.pilot.MainProcess: [INFO ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'Launching' to 'PendingActive'.
2014:06:18 00:12:07 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:08 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:09 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:10 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:11 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:12 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:13 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:14 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:14 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:14 radical.pilot.MainProcess: [INFO ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'PendingActive' to 'Active'.
2014:06:18 00:12:15 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:15 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:16 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:16 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:17 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:17 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:17 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:18 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fd9 in state Done
---------------
A CU '53a0bd330941de3b1c491fd9' finished. Submitting B CU.
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd420941de3b1c491fe3'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd420941de3b1c491fe3' submitted
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:19 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:20 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:21 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:22 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:23 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:23 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:23 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdb in state Done
---------------
A CU '53a0bd330941de3b1c491fdb' finished. Submitting B CU.
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd470941de3b1c491fe4'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd470941de3b1c491fe4' submitted
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd470941de3b1c491fe4' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:25 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:25 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:26 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:27 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:27 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd470941de3b1c491fe4' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:28 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:28 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:29 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdd in state Done
---------------
A CU '53a0bd330941de3b1c491fdd' finished. Submitting B CU.
2014:06:18 00:12:30 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd4d0941de3b1c491fe5'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd4d0941de3b1c491fe5' submitted
2014:06:18 00:12:30 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd4d0941de3b1c491fe5' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:31 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd4d0941de3b1c491fe5' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:32 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:32 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:33 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:34 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:34 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:35 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fdf in state Done
---------------
A CU '53a0bd330941de3b1c491fdf' finished. Submitting B CU.
2014:06:18 00:12:35 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd530941de3b1c491fe6'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd530941de3b1c491fe6' submitted
2014:06:18 00:12:36 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd530941de3b1c491fe6' state changed from 'New' to 'Executing'.
2014:06:18 00:12:37 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:37 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:38 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:39 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:40 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:41 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:41 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd330941de3b1c491fe1 in state Done
---------------
A CU '53a0bd330941de3b1c491fe1' finished. Submitting B CU.
2014:06:18 00:12:42 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5a0941de3b1c491fe7'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5a0941de3b1c491fe7' submitted
---------------
A CU '53a0bd330941de3b1c491fda' finished. Submitting B CU.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5a0941de3b1c491fe8'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5a0941de3b1c491fe8' submitted
---------------
A CU '53a0bd330941de3b1c491fde' finished. Submitting B CU.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe7' state changed from 'New' to 'Executing'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe8' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5b0941de3b1c491fe9'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5b0941de3b1c491fe9' submitted
---------------
A CU '53a0bd330941de3b1c491fe2' finished. Submitting B CU.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5c0941de3b1c491fea'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5c0941de3b1c491fea' submitted
---------------
A CU '53a0bd330941de3b1c491fdc' finished. Submitting B CU.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe8' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5b0941de3b1c491fe9' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5c0941de3b1c491fea' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5c0941de3b1c491feb'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5c0941de3b1c491feb' submitted
---------------
A CU '53a0bd330941de3b1c491fe0' finished. Submitting B CU.
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53a0bd5d0941de3b1c491fec'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5d0941de3b1c491fec' submitted
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5c0941de3b1c491feb' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:46 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd420941de3b1c491fe3 in state Executing
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5b0941de3b1c491fe9' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd5d0941de3b1c491fec' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'Executing' to 'Failed'.
[Callback]: CU '53a0bd420941de3b1c491fe3' state changed to 'Failed'.
Log:
[u'Scheduled for execution on ComputePilot 53a0bd320941de3b1c491fd7.']
2014:06:18 00:12:47 radical.pilot.MainProcess: [DEBUG ] Compute unit 53a0bd420941de3b1c491fe3 in state Failed
---------------
B CU '53a0bd420941de3b1c491fe3' finished.
HOWEVER: the B CUs fail with the following error:
/bin/sh: -c: line 0: ` cd /Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/unit-53a0bd530941de3b1c491fe6 && /bin/echo $CU_SET CU with id $CU_NO (`cat $HOME/tmp/A-03.txt`) '
/bin/sh: -c: line 0: syntax error near unexpected token `('
This is probably because the fixes for proper escaping are only in the multicore agent. I’ll make the multicore agent also work with localhost and then we are back to one agent.
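To illustrate the escaping problem: the shell rejects the command because the parenthesis in the argument reaches /bin/sh unquoted. The following is a rough sketch of quoting each part before building the command string, assuming Python 3's shlex.quote (pipes.quote is the Python 2 counterpart). The function build_shell_command and the sandbox path are hypothetical and do not reflect how the multicore agent actually does it.

```python
import shlex


def build_shell_command(workdir, executable, arguments):
    """Build 'cd <workdir> && <executable> <args...>' with every part quoted."""
    quoted_args = " ".join(shlex.quote(arg) for arg in arguments)
    return "cd {} && {} {}".format(
        shlex.quote(workdir), shlex.quote(executable), quoted_args
    )


cmd = build_shell_command(
    "/tmp/unit-0001",                 # hypothetical sandbox directory
    "/bin/echo",
    ["A CU with id 3 (see notes)"],   # parentheses would break an unquoted command
)
# cmd is now: cd /tmp/unit-0001 && /bin/echo 'A CU with id 3 (see notes)'
```

One trade-off to keep in mind: single-quoting also stops the shell from expanding $CU_SET, $CU_NO or backtick substitutions inside the arguments, so an agent that wants those expanded would need a more selective escaping scheme than this sketch.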
+1
I believe this is fixed.