radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Agent dies on local CU startup #183

Closed andre-merzky closed 10 years ago

andre-merzky commented 10 years ago

The agent log (and ERR) contains:

2014-06-17 13:31:59,513 - radical.pilot.agent - INFO - Launching task 53a0271f74df921aa4e29b46 via  cd /home/merzky/radical.pilot.sandbox/pilot-53a0271e74df921aa4e29b44/unit-53a0271f74df921aa4e29b46 &&  /bin/echo  I am an $CU_SET CU with id $CU_NO  (env: {u'CU_SET': u'A', u'CU_NO': 0}) in /home/merzky/radical.pilot.sandbox/pilot-53a0271e74df921aa4e29b44/unit-53a0271f74df921aa4e29b46
2014-06-17 13:31:59,517 - radical.pilot.agent - ERROR - Error in ExecWorker loop: Traceback (most recent call last):
  File "radical-pilot-agent.py", line 382, in run
    logger=self._log)
  File "radical-pilot-agent.py", line 783, in __init__
    creationflags=0)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
TypeError: execve() arg 3 contains a non-string value

2014-06-17 13:31:59,773 - radical.pilot.agent - ERROR - Execution worker <ExecWorker(ExecWorker-1, stopped[1] daemon)> died.

The pilot state callback is never invoked, so my application simply hangs forever. The CU does seem to be marked as Failed, though.

The pilot and CU descriptions are:

HOSTNAME     = "localhost" # Remote Resource URL
USERNAME     = "merzky"    # Username on the remote resource
QUEUE        =  None # Add queue you want to use
PROJECT      =  None # Add project / allocation / account to charge
WALLTIME     =    10 # Add pilot walltime in minutes
PILOT_SIZE   =     1 # Number of cores required for the Pilot-Job
NUMBER_JOBS  =    10 # The TOTAL number of cus to run

...

        pdesc = radical.pilot.ComputePilotDescription ()
        pdesc.resource = HOSTNAME
        pdesc.runtime  = WALLTIME
        pdesc.queue    = QUEUE
        pdesc.project  = PROJECT
        pdesc.cores    = PILOT_SIZE
        pdesc.cleanup  = True

...

            cudesc = radical.pilot.ComputeUnitDescription()
            cudesc.environment = {"CU_SET": "A", "CU_NO": i}
            cudesc.executable  = "/bin/echo"
            cudesc.arguments   = ['I am an $CU_SET CU with id $CU_NO', ]
            cudesc.cores       = 1

The complete example is under examples/tutorial/ensemble_chained.py.
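
The traceback above points at the environment passed to `subprocess.Popen`: the CU description sets `"CU_NO": i`, an integer, and `execve()` only accepts strings. A minimal sketch of a defensive conversion follows, assuming the fix belongs wherever the environment dict is handed to the subprocess call. The helper names `stringify_env` and `run_unit` are hypothetical and not part of radical.pilot, and the sketch uses Python 3 rather than the Python 2.7 shown in the log.

```python
import subprocess


def stringify_env(env):
    """Return a copy of env with every key and value converted to str."""
    return {str(key): str(value) for key, value in env.items()}


def run_unit(executable, arguments, env):
    """Run one command with a sanitized environment and return its stdout.

    Hypothetical helper for illustration; not the agent's actual code path.
    """
    result = subprocess.run(
        [executable] + list(arguments),
        env=stringify_env(env),      # ints such as CU_NO=0 become "0"
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,                  # raise if the command exits non-zero
    )
    return result.stdout
```

On the application side, the same error can be avoided by writing `"CU_NO": str(i)` in the CU description, or by validating the environment when the description is created so that the problem surfaces before the unit reaches the agent.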

marksantcroos commented 10 years ago

Hi Andre,

Thanks for reporting. I didn't test local execution for a while, so I will look into it. I have some refactoring of the bootstrapper going on and will combine the fix with that.

Gr,

Mark


oleweidner commented 10 years ago

Hi, I have run all 'local' examples, as well as the unit tests (which all use 'localhost'), and they all pass without any problems...

oleweidner commented 10 years ago

The chaining example also works fine with 'localhost' and the 'master' branch:

Finished processing dependencies for radical.pilot==0.15
2014:06:18 00:11:58 radical.pilot.MainProcess: [INFO    ] radical.pilot version: 0.15
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] New Session created{'database_url': 'mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '53a0bd2e0941de3b1c491fd5', 'created': datetime.datetime(2014, 6, 17, 22, 11, 58, 126010)}.
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Added credential {'user_id': 'oweidner', 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 53a0bd2e0941de3b1c491fd5.
create pilot manager
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/archer.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/das4.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/futuregrid.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/localhost.json
2014:06:18 00:12:01 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/ncar.json
2014:06:18 00:12:02 radical.pilot.MainProcess: [INFO    ] Loaded resource configurations from file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/configs/xsede.json
2014:06:18 00:12:02 radical.pilot.MainProcess: [DEBUG   ] Worker thread (ID: Thread-1[4538777600]) for PilotManager 53a0bd320941de3b1c491fd6 started.
submit pilot
create unit manager
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Connected to MongoDB. Serving requests for PilotManager 53a0bd320941de3b1c491fd6.
2014:06:18 00:12:03 radical.pilot.MainProcess: [DEBUG   ] Worker thread (ID: Thread-2[4544036864]) for UnitManager 53a0bd330941de3b1c491fd8 started.
2014:06:18 00:12:03 radical.pilot.MainProcess: [INFO    ] Loaded scheduler: DirectSubmissionScheduler.
add    pilot
submit units A
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-1: [DEBUG   ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-2: [DEBUG   ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-1: [DEBUG   ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-2: [DEBUG   ] Connected to MongoDB. Serving requests for UnitManager 53a0bd330941de3b1c491fd8.

2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO    ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': None, u'resource': u'localhost', u'queue': None, u'sandbox': None, u'cleanup': True, u'pilot_agent_priv': None, u'cores': 1, u'runtime': 10}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'sandbox': u'file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/', u'submitted': datetime.datetime(2014, 6, 17, 22, 12, 2, 914000), u'unitmanager': u'53a0bd330941de3b1c491fd8', u'output_transfer_finished': None, u'finished': None, u'pilotmanager': u'53a0bd320941de3b1c491fd6', u'command': None, u'wu_queue': [], u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 17, 22, 12, 2, 913000), u'state': u'PendingLaunch'}], u'input_transfer_started': None, u'_id': ObjectId('53a0bd320941de3b1c491fd7'), u'input_transfer_finished': None, u'nodes': None, u'log': []}
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO    ] Using pilot agent /private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/agent/radical-pilot-agent-singlecore.py
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [INFO    ] Using bootstrapper /private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/bootstrapper/default_bootstrapper.sh
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Creating agent sandbox 'file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/'.
2014:06:18 00:12:03 radical.pilot.MainProcess: [INFO    ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'PendingLaunch' to 'Launching'.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-1: [DEBUG   ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.InputFileTransferWorker-2: [DEBUG   ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-2: [DEBUG   ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.OutputFileTransferWorker-1: [DEBUG   ] Found SSH context info: {'Type' : 'SSH', 'UserID' : 'oweidner', 'UserKey' : 'None', 'UserPass' : 'None'}.
2014:06:18 00:12:03 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Copying bootstrapper 'file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/bootstrapper/default_bootstrapper.sh' to agent sandbox (file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/).
2014:06:18 00:12:04 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Copying agent 'file://localhost//private/tmp/RPP/lib/python2.7/site-packages/radical.pilot-0.15-py2.7.egg/radical/pilot/agent/radical-pilot-agent-singlecore.py' to agent sandbox (file://localhost/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/).
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:04 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd330941de3b1c491fd9', '53a0bd330941de3b1c491fda', '53a0bd330941de3b1c491fdb', '53a0bd330941de3b1c491fdc', '53a0bd330941de3b1c491fdd', '53a0bd330941de3b1c491fde', '53a0bd330941de3b1c491fdf', '53a0bd330941de3b1c491fe0', '53a0bd330941de3b1c491fe1', '53a0bd330941de3b1c491fe2'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:05 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Bootstrap command line: /bin/bash ['-l', '-c', '"./default_bootstrapper.sh -r ec2-184-72-89-141.compute-1.amazonaws.com:27017/ -d radicalpilot -s 53a0bd2e0941de3b1c491fd5 -p 53a0bd320941de3b1c491fd7 -t 10 -c 1 -V 0.15  -l LOCAL  -C "']
2014:06:18 00:12:05 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] Submitting SAGA job with description: <class 'saga.job.description.Description'> {'Executable': '/bin/bash', 'WorkingDirectory': '/Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7', 'WallTimeLimit': 10, 'Arguments': ['-l', '-c', '"./default_bootstrapper.sh -r ec2-184-72-89-141.compute-1.amazonaws.com:27017/ -d radicalpilot -s 53a0bd2e0941de3b1c491fd5 -p 53a0bd320941de3b1c491fd7 -t 10 -c 1 -V 0.15  -l LOCAL  -C "'], 'Error': 'AGENT.STDERR', 'Output': 'AGENT.STDOUT', 'TotalCPUCount': 1}
2014:06:18 00:12:06 radical.pilot.PilotLauncherWorker-1: [DEBUG   ] SAGA job submitted with job id [fork://localhost]-[15251]
2014:06:18 00:12:06 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:06 radical.pilot.MainProcess: [INFO    ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'Launching' to 'PendingActive'.
2014:06:18 00:12:07 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:08 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:09 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:10 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:11 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:12 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:13 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:14 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state PendingExecution
2014:06:18 00:12:14 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:14 radical.pilot.MainProcess: [INFO    ] ComputePilot '53a0bd320941de3b1c491fd7' state changed from 'PendingActive' to 'Active'.
2014:06:18 00:12:15 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:15 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:16 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:16 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:17 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state Executing
2014:06:18 00:12:17 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fd9' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:17 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:18 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fd9 in state Done
---------------
A CU '53a0bd330941de3b1c491fd9' finished. Submitting B CU.
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd420941de3b1c491fe3'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd420941de3b1c491fe3' submitted
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:18 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:19 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:20 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fda' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:20 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:20 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:21 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:21 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:22 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdb in state Executing
2014:06:18 00:12:23 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdb' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:23 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:23 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdb in state Done
---------------
A CU '53a0bd330941de3b1c491fdb' finished. Submitting B CU.
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd470941de3b1c491fe4'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd470941de3b1c491fe4' submitted
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:24 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd470941de3b1c491fe4' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:25 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:25 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:26 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:26 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdc' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:26 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:27 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:27 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd470941de3b1c491fe4' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:28 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdd in state Executing
2014:06:18 00:12:28 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdd' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:29 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdd in state Done
---------------
A CU '53a0bd330941de3b1c491fdd' finished. Submitting B CU.
2014:06:18 00:12:30 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd4d0941de3b1c491fe5'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd4d0941de3b1c491fe5' submitted
2014:06:18 00:12:30 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd4d0941de3b1c491fe5' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:31 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:31 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd4d0941de3b1c491fe5' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:32 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:32 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fde' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:33 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:34 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdf in state Executing
2014:06:18 00:12:34 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fdf' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:35 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fdf in state Done
---------------
A CU '53a0bd330941de3b1c491fdf' finished. Submitting B CU.
2014:06:18 00:12:35 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd530941de3b1c491fe6'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd530941de3b1c491fe6' submitted
2014:06:18 00:12:36 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:36 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd530941de3b1c491fe6' state changed from 'New' to 'Executing'.
2014:06:18 00:12:37 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:37 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe0' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:38 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:39 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:40 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Executing
2014:06:18 00:12:41 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe1' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:41 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd330941de3b1c491fe1 in state Done
---------------
A CU '53a0bd330941de3b1c491fe1' finished. Submitting B CU.
2014:06:18 00:12:42 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5a0941de3b1c491fe7'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5a0941de3b1c491fe7' submitted
---------------
A CU '53a0bd330941de3b1c491fda' finished. Submitting B CU.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5a0941de3b1c491fe8'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5a0941de3b1c491fe8' submitted
---------------
A CU '53a0bd330941de3b1c491fde' finished. Submitting B CU.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd330941de3b1c491fe2' state changed from 'Executing' to 'Done'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe7' state changed from 'New' to 'Executing'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe8' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:43 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5b0941de3b1c491fe9'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5b0941de3b1c491fe9' submitted
---------------
A CU '53a0bd330941de3b1c491fe2' finished. Submitting B CU.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5c0941de3b1c491fea'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5c0941de3b1c491fea' submitted
---------------
A CU '53a0bd330941de3b1c491fdc' finished. Submitting B CU.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5a0941de3b1c491fe8' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5b0941de3b1c491fe9' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:44 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5c0941de3b1c491fea' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5c0941de3b1c491feb'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5c0941de3b1c491feb' submitted
---------------
A CU '53a0bd330941de3b1c491fe0' finished. Submitting B CU.
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO    ] Scheduled ComputeUnits ['53a0bd5d0941de3b1c491fec'] for execution on ComputePilot '53a0bd320941de3b1c491fd7'.
---------------
B CU '9' '53a0bd5d0941de3b1c491fec' submitted
2014:06:18 00:12:45 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5c0941de3b1c491feb' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:46 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd420941de3b1c491fe3 in state Executing
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5b0941de3b1c491fe9' state changed from 'PendingExecution' to 'Executing'.
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd5d0941de3b1c491fec' state changed from 'New' to 'PendingExecution'.
2014:06:18 00:12:47 radical.pilot.MainProcess: [INFO    ] RUN ComputeUnit '53a0bd420941de3b1c491fe3' state changed from 'Executing' to 'Failed'.
[Callback]: CU '53a0bd420941de3b1c491fe3' state changed to 'Failed'.
            Log: 
[u'Scheduled for execution on ComputePilot 53a0bd320941de3b1c491fd7.']
2014:06:18 00:12:47 radical.pilot.MainProcess: [DEBUG   ] Compute unit 53a0bd420941de3b1c491fe3 in state Failed
---------------
B CU '53a0bd420941de3b1c491fe3' finished.

oleweidner commented 10 years ago

However, the CUs fail with the following error:

/bin/sh: -c: line 0: ` cd /Users/oweidner/radical.pilot.sandbox/pilot-53a0bd320941de3b1c491fd7/unit-53a0bd530941de3b1c491fe6 &&  /bin/echo  $CU_SET CU with id $CU_NO (`cat $HOME/tmp/A-03.txt`) '
/bin/sh: -c: line 0: syntax error near unexpected token `('

marksantcroos commented 10 years ago

This is probably because the fixes for proper escaping are only in the multicore agent. I’ll make the multicore agent also work with localhost and then we are back to one agent.
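
For context, the syntax error comes from the unquoted parentheses reaching `/bin/sh`. The following is a minimal sketch of one way to escape arguments while still letting the shell expand `$CU_SET`, `$CU_NO` and the backtick command, which the example relies on. It is an assumption about what "proper escaping" could look like, not the multicore agent's actual code; `dquote` and `build_command` are hypothetical names.

```python
import shlex


def dquote(arg):
    """Wrap arg in double quotes for /bin/sh.

    Backslashes and double quotes are escaped, so characters such as
    parentheses lose their special meaning, while $VAR and `cmd` are
    deliberately left unescaped so the shell still expands them.
    """
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_command(sandbox, executable, arguments):
    """Assemble a 'cd <sandbox> && <executable> <args>' shell line.

    The sandbox and executable are quoted literally with shlex.quote;
    the arguments use dquote so environment variables are expanded.
    """
    parts = [shlex.quote(executable)] + [dquote(a) for a in arguments]
    return "cd " + shlex.quote(sandbox) + " && " + " ".join(parts)
```

If no expansion is wanted inside an argument, `shlex.quote` alone is the safer choice, since it makes the whole string literal.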


oleweidner commented 10 years ago

+1


marksantcroos commented 10 years ago

I believe this is fixed.