radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Radical Pilot on Yellowstone #145

Closed dinesh-ganapati closed 10 years ago

dinesh-ganapati commented 10 years ago

I am trying to run a simple RADICAL-Pilot script on Yellowstone. I am running an instance of MongoDB on one of the login nodes (the compute nodes are able to talk to MongoDB), and I also see the compute units being logged. But for some reason, when I run bjobs I do not see the jobs submitted, and only one compute unit is launched and changed to PendingActive. (I am running with RADICAL_PILOT_VERBOSE=info and do not see an error.) Here is what I see in info mode:

2014:05:26 08:50:29 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.13 2014:05:26 08:50:30 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb://yslogin1:27017', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '538354b67749a65ebfa886a2', 'created': datetime.datetime(2014, 5, 26, 14, 50, 30, 8049)}. 2014:05:26 08:50:30 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 2014:05:26 08:50:30 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['538354b67749a65ebfa886a6', '538354b67749a65ebfa886a7', '538354b67749a65ebfa886a8', '538354b67749a65ebfa886a9', '538354b67749a65ebfa886aa', '538354b67749a65ebfa886ab', '538354b67749a65ebfa886ac', '538354b67749a65ebfa886ad', '538354b67749a65ebfa886ae', '538354b67749a65ebfa886af', '538354b67749a65ebfa886b0', '538354b67749a65ebfa886b1', '538354b67749a65ebfa886b2', '538354b67749a65ebfa886b3', '538354b67749a65ebfa886b4', '538354b67749a65ebfa886b5'] for execution on ComputePilot '538354b67749a65ebfa886a4'. 
2014:05:26 08:50:31 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu-local', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 16, u'runtime': 10}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'finished': None, u'submitted': datetime.datetime(2014, 5, 26, 14, 50, 30, 885000), u'output_transfer_finished': None, u'sandbox': u'file://localhost/glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/', u'pilotmanager': u'538354b67749a65ebfa886a3', u'unitmanager': u'538354b67749a65ebfa886a5', u'statehistory': [{u'timestamp': datetime.datetime(2014, 5, 26, 14, 50, 30, 884000), u'state': u'PendingLaunch'}], u'wu_queue': [ObjectId('538354b67749a65ebfa886a6'), ObjectId('538354b67749a65ebfa886a7'), ObjectId('538354b67749a65ebfa886a8'), ObjectId('538354b67749a65ebfa886a9'), ObjectId('538354b67749a65ebfa886aa'), ObjectId('538354b67749a65ebfa886ab'), ObjectId('538354b67749a65ebfa886ac'), ObjectId('538354b67749a65ebfa886ad'), ObjectId('538354b67749a65ebfa886ae'), ObjectId('538354b67749a65ebfa886af'), ObjectId('538354b67749a65ebfa886b0'), ObjectId('538354b67749a65ebfa886b1'), ObjectId('538354b67749a65ebfa886b2'), ObjectId('538354b67749a65ebfa886b3'), ObjectId('538354b67749a65ebfa886b4'), ObjectId('538354b67749a65ebfa886b5')], u'command': None, u'input_transfer_started': None, u'_id': ObjectId('538354b67749a65ebfa886a4'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886a6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886a6' state changed to PendingExecution. 
2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886a7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886a7' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886a8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886a8' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886a9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886a9' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886aa' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886aa' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886ab' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886ab' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886ac' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886ac' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886ad' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886ad' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886ae' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886ae' state changed to PendingExecution. 
2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886af' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886af' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b0' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b1' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b2' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b3' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b4' state changed to PendingExecution. 2014:05:26 08:50:31 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '538354b67749a65ebfa886b5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '538354b67749a65ebfa886b5' state changed to PendingExecution. 2014:05:26 08:50:32 radical.pilot.MainProcess: [INFO ] ComputePilot '538354b67749a65ebfa886a4' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '538354b67749a65ebfa886a4' state changed to Launching. 2014:05:26 08:50:38 radical.pilot.MainProcess: [INFO ] ComputePilot '538354b67749a65ebfa886a4' state changed from 'Launching' to 'PendingActive'. 
[Callback]: ComputePilot '538354b67749a65ebfa886a4' state changed to PendingActive.
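A log like the one above is easier to read once it is reduced to the last reported state of each object. The following is a minimal Python 3 sketch, not part of RADICAL-Pilot: the helper name `last_states` and the regular expression are assumptions based only on the line format visible in this excerpt.

```python
import re

# Matches the state-change lines as they appear in the log excerpt;
# the pattern is an assumption derived from this excerpt only.
STATE_CHANGE = re.compile(
    r"(ComputePilot|ComputeUnit) '([0-9a-f]+)' "
    r"state changed from '(\w+)' to '(\w+)'"
)

def last_states(log_text):
    """Return {(kind, uid): last_state} for every object seen in the log."""
    states = {}
    for kind, uid, _old, new in STATE_CHANGE.findall(log_text):
        # Later matches overwrite earlier ones, so the final value wins.
        states[(kind, uid)] = new
    return states

sample = (
    "[INFO ] ComputePilot '538354b67749a65ebfa886a4' state changed "
    "from 'PendingLaunch' to 'Launching'. "
    "[INFO ] ComputePilot '538354b67749a65ebfa886a4' state changed "
    "from 'Launching' to 'PendingActive'. "
    "[INFO ] RUN ComputeUnit '538354b67749a65ebfa886a6' state changed "
    "from 'New' to 'PendingExecution'."
)

for (kind, uid), state in sorted(last_states(sample).items()):
    print(kind, uid, state)
```

In this excerpt the pilot's last state is PendingActive, meaning it was handed to the batch system but has not been reported as running.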

And the Mongodb : 2014-05-26T08:46:53.579-0600 [initandlisten] MongoDB starting : pid=23088 port=27017 dbpath=data/db 64-bit host=yslogin1 2014-05-26T08:46:53.580-0600 [initandlisten] 2014-05-26T08:46:53.580-0600 [initandlisten] * WARNING: You are running on a NUMA machine. 2014-05-26T08:46:53.580-0600 [initandlisten] * We suggest launching mongod like this to avoid performance problems: 2014-05-26T08:46:53.580-0600 [initandlisten] \ numactl --interleave=all mongod [other options] 2014-05-26T08:46:53.580-0600 [initandlisten] 2014-05-26T08:46:53.580-0600 [initandlisten] db version v2.6.1 2014-05-26T08:46:53.580-0600 [initandlisten] git version: 4b95b086d2374bdcfcdf2249272fb552c9c726e8 2014-05-26T08:46:53.580-0600 [initandlisten] build info: Linux build14.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49 2014-05-26T08:46:53.580-0600 [initandlisten] allocator: tcmalloc 2014-05-26T08:46:53.580-0600 [initandlisten] options: { storage: { dbPath: "data/db" } } 2014-05-26T08:46:53.669-0600 [initandlisten] journal dir=data/db/journal 2014-05-26T08:46:53.669-0600 [initandlisten] recover : no journal files present, no recovery needed 2014-05-26T08:46:54.167-0600 [initandlisten] waiting for connections on port 27017 2014-05-26T08:47:40.836-0600 [initandlisten] connection accepted from 10.18.0.22:60393 #1 (1 connection now open) 2014-05-26T08:47:40.839-0600 [FileAllocator] allocating new datafile data/db/radicalpilot.ns, filling with zeroes... 2014-05-26T08:47:40.839-0600 [FileAllocator] creating directory data/db/_tmp 2014-05-26T08:47:40.862-0600 [FileAllocator] done allocating datafile data/db/radicalpilot.ns, size: 16MB, took 0.012 secs 2014-05-26T08:47:40.891-0600 [FileAllocator] allocating new datafile data/db/radicalpilot.0, filling with zeroes... 
2014-05-26T08:47:40.919-0600 [FileAllocator] done allocating datafile data/db/radicalpilot.0, size: 64MB, took 0.027 secs 2014-05-26T08:47:40.923-0600 [conn1] build index on: radicalpilot.5383540c7749a6593a2470bd properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.5383540c7749a6593a2470bd" } 2014-05-26T08:47:40.923-0600 [conn1] added index to empty collection 2014-05-26T08:47:41.245-0600 [conn1] build index on: radicalpilot.5383540c7749a6593a2470bd.pm properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.5383540c7749a6593a2470bd.pm" } 2014-05-26T08:47:41.245-0600 [conn1] added index to empty collection 2014-05-26T08:47:44.016-0600 [conn1] build index on: radicalpilot.5383540c7749a6593a2470bd.p properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.5383540c7749a6593a2470bd.p" } 2014-05-26T08:47:44.016-0600 [conn1] added index to empty collection 2014-05-26T08:47:44.019-0600 [initandlisten] connection accepted from 10.18.0.22:60395 #2 (2 connections now open) 2014-05-26T08:47:44.061-0600 [conn1] build index on: radicalpilot.5383540c7749a6593a2470bd.wm properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.5383540c7749a6593a2470bd.wm" } 2014-05-26T08:47:44.061-0600 [conn1] added index to empty collection 2014-05-26T08:47:44.073-0600 [initandlisten] connection accepted from 10.18.0.22:60396 #3 (3 connections now open) 2014-05-26T08:47:44.098-0600 [conn3] build index on: radicalpilot.5383540c7749a6593a2470bd.w properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.5383540c7749a6593a2470bd.w" } 2014-05-26T08:47:44.098-0600 [conn3] added index to empty collection 2014-05-26T08:47:44.110-0600 [initandlisten] connection accepted from 10.18.0.22:60397 #4 (4 connections now open) 2014-05-26T08:47:44.112-0600 [initandlisten] connection accepted from 10.18.0.22:60398 #5 (5 connections now open) 2014-05-26T08:47:44.113-0600 [initandlisten] connection accepted from 10.18.0.22:60399 #6 (6 connections now open) 
2014-05-26T08:47:44.113-0600 [initandlisten] connection accepted from 10.18.0.22:60400 #7 (7 connections now open) 2014-05-26T08:47:54.146-0600 [clientcursormon] mem (MB) res:38 virt:546 2014-05-26T08:47:54.146-0600 [clientcursormon] mapped (incl journal view):320 2014-05-26T08:47:54.147-0600 [clientcursormon] connections:7 2014-05-26T08:50:15.124-0600 [conn2] end connection 10.18.0.22:60395 (6 connections now open) 2014-05-26T08:50:15.124-0600 [conn3] end connection 10.18.0.22:60396 (6 connections now open) 2014-05-26T08:50:15.124-0600 [conn6] end connection 10.18.0.22:60399 (4 connections now open) 2014-05-26T08:50:15.124-0600 [conn5] end connection 10.18.0.22:60398 (3 connections now open) 2014-05-26T08:50:15.124-0600 [conn7] end connection 10.18.0.22:60400 (3 connections now open) 2014-05-26T08:50:15.124-0600 [conn4] end connection 10.18.0.22:60397 (3 connections now open) 2014-05-26T08:50:15.124-0600 [conn1] end connection 10.18.0.22:60393 (3 connections now open) 2014-05-26T08:50:30.012-0600 [initandlisten] connection accepted from 10.18.0.22:60409 #8 (1 connection now open) 2014-05-26T08:50:30.038-0600 [conn8] build index on: radicalpilot.538354b67749a65ebfa886a2 properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.538354b67749a65ebfa886a2" } 2014-05-26T08:50:30.038-0600 [conn8] added index to empty collection 2014-05-26T08:50:30.075-0600 [conn8] build index on: radicalpilot.538354b67749a65ebfa886a2.pm properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.538354b67749a65ebfa886a2.pm" } 2014-05-26T08:50:30.075-0600 [conn8] added index to empty collection 2014-05-26T08:50:30.869-0600 [initandlisten] connection accepted from 10.18.0.22:60410 #9 (2 connections now open) 2014-05-26T08:50:30.886-0600 [conn8] build index on: radicalpilot.538354b67749a65ebfa886a2.p properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.538354b67749a65ebfa886a2.p" } 2014-05-26T08:50:30.886-0600 [conn8] added index to empty collection 
2014-05-26T08:50:30.914-0600 [conn8] build index on: radicalpilot.538354b67749a65ebfa886a2.wm properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.538354b67749a65ebfa886a2.wm" } 2014-05-26T08:50:30.914-0600 [conn8] added index to empty collection 2014-05-26T08:50:30.926-0600 [initandlisten] connection accepted from 10.18.0.22:60411 #10 (3 connections now open) 2014-05-26T08:50:30.938-0600 [conn10] build index on: radicalpilot.538354b67749a65ebfa886a2.w properties: { v: 1, key: { _id: 1 }, name: "id", ns: "radicalpilot.538354b67749a65ebfa886a2.w" }

Thanks , Dinesh

marksantcroos commented 10 years ago

On 26 May 2014, at 18:16 , dinesh-ganapati notifications@github.com wrote:

I am trying to run a simple radical pilot script on yellowstone. I am running an instance of mongodb on one of the login nodes. (the compute nodes are able to talk to mongodb) and I also see the compute units being logged. but for some reason when i do a bjobs i do not see the jobs submitted, and the , only one compute unit is launched and changed to pending active.(I am running with RADICAL_PILOT_VERBOSE=info and do not see an error ).

Indeed, the pilot didn’t become active yet (I assume you meant pilot and not compute unit).

If you are sure that it was not queued (anymore), then something went wrong indeed. You might want to have a look at $HOME/radical.pilot.sandbox/pilot-XXXX and see if there are errors in the AGENT.LOG or stderr/stdout.
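The check suggested here can be scripted. Below is a rough Python 3 sketch that scans a pilot sandbox directory for suspicious lines; the function name `scan_sandbox`, the keyword list, and the exact file names are assumptions taken from this thread, not a RADICAL-Pilot API.

```python
import os
import re
import tempfile

# File names as mentioned in this thread; other versions may differ.
AGENT_FILES = ("AGENT.LOG", "AGENT.STDOUT", "AGENT.STDERR")
# Keywords chosen for illustration only.
SUSPICIOUS = re.compile(r"error|abort|warning|traceback", re.IGNORECASE)

def scan_sandbox(sandbox_dir):
    """Yield (file_name, line_number, line) for suspicious agent output."""
    for name in AGENT_FILES:
        path = os.path.join(sandbox_dir, name)
        if not os.path.isfile(path):
            continue  # not every file has to exist
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if SUSPICIOUS.search(line):
                    yield name, number, line.rstrip()

# Demo on a throwaway directory standing in for a real pilot sandbox.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "AGENT.STDERR"), "w") as handle:
    handle.write("Lmod Warning: Did not find: curl\n")

for hit in scan_sandbox(demo):
    print(hit)
```

For a real run, pass the pilot's sandbox directory (the pilot-XXXX directory) instead of the demo directory.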

dinesh-ganapati commented 10 years ago

Hi Mark, Sorry I forgot to add. The contents of AGENT.STDERR were:

Lmod Warning: Did not find: curl

(This is the same error that was recorded when I was trying to use 'mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/' instead of the local MongoDB instance.)

AGENT.STDOUT also records the failed 'module load curl':

################################################################################

Running pre-bootstrapping command

CMDLINE: module purge

################################################################################

Running pre-bootstrapping command

CMDLINE: module load python/2.7.5

################################################################################

Running pre-bootstrapping command

CMDLINE: module load curl

Error running pre-boostrapping command! ABORTING


Sender: LSF System lsfadmin@ys3228-ib Subject: Job 124168: < #!/bin/bash ;#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4 ;#BSUB -o /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT ;#BSUB -e /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR ;#BSUB -W 0:10 ;#BSUB -q premium ;#BSUB -P URTG0003 ;#BSUB -n 16 ; /bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 " > in cluster

Exited

Job < #!/bin/bash ;#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4 ;#BSUB -o /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT ;#BSUB -e /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR ;#BSUB -W 0:10 ;#BSUB -q premium ;#BSUB -P URTG0003 ;#BSUB -n 16 ; /bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 " > was submitted from host by user in cluster .
Job was executed on host(s) <16*ys3228-ib>, in queue , as user in cluster .
was used as the home directory.
was used as the working directory.
Started at Mon May 26 08:50:42 2014
Results reported at Mon May 26 08:50:44 2014

Your job looked like:

---
# LSBATCH: User input
#!/bin/bash
#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4
#BSUB -o /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT
#BSUB -e /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR
#BSUB -W 0:10
#BSUB -q premium
#BSUB -P URTG0003
#BSUB -n 16
/bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 "
---

Exited with exit code 1.
marksantcroos commented 10 years ago

Ok, so there is a bootstrapping issue.

What does your (minimal) script look like?

Did you construct a resource configuration? (As there is no default resource config for yellowstone)
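To pin down which step of the bootstrap failed, the AGENT.STDOUT text can be reduced to the last command announced before the abort message. This is a small Python 3 sketch under the assumption that the output always follows the "CMDLINE: ..." / "Error running pre-..." pattern quoted in this thread; `failing_bootstrap_command` is a hypothetical helper, not part of the bootstrapper.

```python
def failing_bootstrap_command(stdout_text):
    """Return the last CMDLINE seen before the abort message, or None."""
    last_command = None
    for line in stdout_text.splitlines():
        line = line.strip()
        if line.startswith("CMDLINE:"):
            last_command = line[len("CMDLINE:"):].strip()
        elif line.startswith("Error running pre-"):
            # Prefix match only, because the message spelling in the
            # log ("pre-boostrapping") may differ between versions.
            return last_command
    return None

agent_stdout = """
Running pre-bootstrapping command
CMDLINE: module purge
Running pre-bootstrapping command
CMDLINE: module load python/2.7.5
Running pre-bootstrapping command
CMDLINE: module load curl
Error running pre-boostrapping command! ABORTING
"""

print(failing_bootstrap_command(agent_stdout))
```

Applied to the output quoted in this thread, the sketch points at the third pre-bootstrap command, 'module load curl'.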

dinesh-ganapati commented 10 years ago

Hi Mark, here is the resource file I created:

{
    "yellowstone.ucar.edu": {
        "URL"                : "lsf+ssh://localhost",
        "filesystem"         : "file://localhost",
        "default_queue"      : "premium",
        "python_interpreter" : "/usr/bin/python",
        "pre_bootstrap"      : ["module purge", "module load python/2.7.5", "module load curl"],
        "task_launch_mode"   : "SSH",
        "valid_roots"        : ["/glade"],
        "bootstrapper"       : "default_bootstrapper.sh"
    },

    "yellowstone.ucar.edu-local": {
        "URL"                : "lsf://localhost",
        "filesystem"         : "file://localhost/",
        "default_queue"      : "premium",
        "task_launch_mode"   : "SSH",
        "pre_bootstrap"      : ["module purge", "module load python/2.7.5", "module load curl"],
        "python_interpreter" : "/usr/bin/python",
        "valid_roots"        : ["/glade"],
        "bootstrapper"       : "default_bootstrapper.sh"
    }
}
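Before submitting a pilot, it can help to confirm that the resource file parses and to see which pre_bootstrap commands each entry will run on the compute node. The snippet below is a Python 3 sketch; `pre_bootstrap_commands` is a hypothetical helper, and the embedded configuration is a shortened copy of the one posted above.

```python
import json

def pre_bootstrap_commands(config_text):
    """Map each resource name to its list of pre_bootstrap commands."""
    config = json.loads(config_text)  # raises ValueError on malformed JSON
    return {
        name: entry.get("pre_bootstrap", [])
        for name, entry in config.items()
    }

# Shortened copy of the posted configuration, for illustration only.
config_text = """
{
    "yellowstone.ucar.edu-local": {
        "URL"           : "lsf://localhost",
        "pre_bootstrap" : ["module purge",
                           "module load python/2.7.5",
                           "module load curl"]
    }
}
"""

for name, commands in pre_bootstrap_commands(config_text).items():
    print(name, commands)
```

Each listed command is passed to the bootstrapper with -e, as the submitted job script in this thread shows, so a command that fails on the compute node aborts the pilot.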

and here is my script:

import os
import sys
import time
import radical.pilot

# READ: The RADICAL-Pilot documentation:
#   http://radicalpilot.readthedocs.org/en/latest
#
# Try running this example with RADICAL_PILOT_VERBOSE=debug set if
# you want to see what happens behind the scenes!
#
WORKDIR     = "/glade/scratch/dinesh/workdir/"
NUMBER_JOBS = 16
RCONF       = ["file://localhost/glade/u/home/dinesh/yellowstone.json"]

# DBURL defines the MongoDB server URL and has the format
# mongodb://host:port.
# For the installation of a MongoDB server, refer to http://docs.mongodb.org.
DBURL = 'mongodb://yslogin1:27017'
# Alternatives that were commented out:
#   'mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/'
#   os.getenv("RADICAL_PILOT_DBURL")
if DBURL is None:
    print("ERROR: RADICAL_PILOT_DBURL (MongoDB server URL) is not defined.")
    sys.exit(1)

# ------------------------------------------------------------------------------
#
def pilot_state_cb(pilot, state):
    """pilot_state_cb() is a callback function. It gets called every
    time a ComputePilot changes its state.
    """
    print("[Callback]: ComputePilot '{0}' state changed to {1}.".format(
        pilot.uid, state))

    if state == radical.pilot.states.FAILED:
        sys.exit(1)

# ------------------------------------------------------------------------------
#
def unit_state_change_cb(unit, state):
    """unit_state_change_cb() is a callback function. It gets called every
    time a ComputeUnit changes its state.
    """
    print("[Callback]: ComputeUnit '{0}' state changed to {1}.".format(
        unit.uid, state))
    if state == radical.pilot.states.FAILED:
        print("            Log: %s" % unit.log[-1])

# ------------------------------------------------------------------------------
#
if __name__ == "__main__":

    try:
        start_1 = time.time()
        # Create a new session. A session is the 'root' object for all other
        # RADICAL-Pilot objects. It encapsulates the MongoDB connection(s) as
        # well as security credentials.
        session = radical.pilot.Session(database_url=DBURL)

        # Add a Pilot Manager. Pilot managers manage one or more ComputePilots.
        pmgr = radical.pilot.PilotManager(session=session,
                                          resource_configurations=RCONF)

        # Register our callback with the PilotManager. This callback will get
        # called every time any of the pilots managed by the PilotManager
        # change their state.
        pmgr.register_callback(pilot_state_cb)

        # Define an N-core local pilot that runs for M minutes and cleans up
        # after itself.
        pdesc = radical.pilot.ComputePilotDescription()
        pdesc.resource = "yellowstone.ucar.edu-local"
        pdesc.runtime  = 10  # M minutes
        pdesc.cores    = 16  # N cores
        pdesc.cleanup  = True
        pdesc.sandbox  = WORKDIR
        pdesc.project  = "URTG0003"

        # Launch the pilot.
        pilot = pmgr.submit_pilots(pdesc)

        # Create a workload of NUMBER_JOBS ComputeUnits (tasks). Each compute
        # unit runs one of the try_rmsd_callingscript<i>.sh scripts with bash
        # on a single core.
        compute_units = []
        start_2 = time.time()
        f = open('rad_pilot_timer_ys_' + str(NUMBER_JOBS) + '.txt', 'w')
        for unit_count in range(0, NUMBER_JOBS):
            cu = radical.pilot.ComputeUnitDescription()
            cu.executable = "bash"
            cu.arguments  = ['/glade/scratch/dinesh/data/try_rmsd_callingscript'
                             + str(unit_count) + '.sh']
            cu.cores      = 1
            compute_units.append(cu)

        # Combine the ComputePilot, the ComputeUnits and a scheduler via
        # a UnitManager object.
        umgr = radical.pilot.UnitManager(
            session=session,
            scheduler=radical.pilot.SCHED_DIRECT_SUBMISSION)

        # Register our callback with the UnitManager. This callback will get
        # called every time any of the units managed by the UnitManager
        # change their state.
        umgr.register_callback(unit_state_change_cb)

        # Add the previously created ComputePilot to the UnitManager.
        umgr.add_pilots(pilot)

        # ASSUMPTION: `start` is used in the timing code below but was never
        # assigned in the script as posted; taking the timestamp just before
        # unit submission is a guess at the intended measurement.
        start = time.time()

        # Submit the previously created ComputeUnit descriptions to the
        # UnitManager. This will trigger the selected scheduler to start
        # assigning ComputeUnits to the ComputePilots.
        units = umgr.submit_units(compute_units)

        # Wait for all compute units to reach a terminal state (DONE or
        # FAILED).
        umgr.wait_units()

        for unit in units:
            print("* Task %s (executed @ %s) state %s, exit code: %s, "
                  "started: %s, finished: %s, stdout: %s"
                  % (unit.uid, unit.execution_locations, unit.state,
                     unit.exit_code, unit.start_time, unit.stop_time,
                     unit.stdout))

        # Close automatically cancels the pilot(s).
        session.close()
        done = time.time()
        elapsed = done - start
        f.write(str(elapsed))
        f.write('\n')
        f.write(str(start_2 - start_1))
        f.write('\n')
        f.write(str(start - start_2))
        f.close()
        sys.exit(0)

    except radical.pilot.PilotException as ex:
        # Catch RADICAL-Pilot exceptions and exit with an error.
        print("Error during execution: %s" % ex)
        sys.exit(1)

Thanks and regards, Dinesh

On Tue, May 27, 2014 at 1:24 AM, Mark Santcroos notifications@github.comwrote:

Ok, so there is a bootstrapping issue.

What does your (minimal) script look like?

Did you construct a resource configuration? (As there is no default resource config for yellowstone)

On 26 May 2014, at 22:12 , dinesh-ganapati notifications@github.com wrote:

Hi Mark, Sorry I forgot to add. The contents of AGENT.STDERR were:

Lmod Warning: Did not find: curl (This was the same error recorded when I was trying to use 'mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/ instead of the local instance of the mongodb)

The AGENT.STDOUT also documents the 'fail to do a module load curl' :

################################################################################

Running pre-bootstrapping command

CMDLINE: module purge

################################################################################

Running pre-bootstrapping command

CMDLINE: module load python/2.7.5

################################################################################

Running pre-bootstrapping command

CMDLINE: module load curl

Error running pre-boostrapping command! ABORTING


Sender: LSF System lsfadmin@ys3228-ib Subject: Job 124168: < #!/bin/bash ;#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4 ;#BSUB -o

/glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT ;#BSUB -e

/glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR ;#BSUB -W 0:10 ;#BSUB -q premium ;#BSUB -P URTG0003 ;#BSUB -n 16 ; /bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 " > in cluster

Exited

Job < #!/bin/bash ;#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4 ;#BSUB -o /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT ;#BSUB -e /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR ;#BSUB -W 0:10 ;#BSUB -q premium ;#BSUB -P URTG0003 ;#BSUB -n 16 ; /bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 " > was submitted from host by user in cluster .
Job was executed on host(s) <16*ys3228-ib>, in queue , as user in cluster .
was used as the home directory.
was used as the working directory.
Started at Mon May 26 08:50:42 2014
Results reported at Mon May 26 08:50:44 2014

Your job looked like:

---
# LSBATCH: User input
#!/bin/bash
#BSUB -cwd /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4
#BSUB -o /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDOUT
#BSUB -e /glade/scratch/dinesh/workdir/pilot-538354b67749a65ebfa886a4/AGENT.STDERR
#BSUB -W 0:10
#BSUB -q premium
#BSUB -P URTG0003
#BSUB -n 16
/bin/bash -l -c "./default_bootstrapper.sh -r yslogin1:27017 -d radicalpilot -s 538354b67749a65ebfa886a2 -p 538354b67749a65ebfa886a4 -t 10 -c 16 -V 0.13 -l SSH -i /usr/bin/python -e 'module purge' -e 'module load python/2.7.5' -e 'module load curl' -C -a URTG0003 "
---

Exited with exit code 1.

On Mon, May 26, 2014 at 2:58 PM, Mark Santcroos <notifications@github.com> wrote:

> On 26 May 2014, at 18:16, dinesh-ganapati notifications@github.com wrote:
>
> > I am trying to run a simple radical pilot script on yellowstone. I am running an instance of mongodb on one of the login nodes. (the compute nodes are able to talk to mongodb) and I also see the compute units being logged. but for some reason when i do a bjobs i do not see the jobs submitted, and the , only one compute unit is launched and changed to pending active.(I am running with RADICAL_PILOT_VERBOSE=info and do not see an error ).
>
> Indeed, the pilot didn’t become active yet (I assume you meant pilot and not compute unit).
>
> If you are sure that it was not queued (anymore), then something went wrong indeed. You might want to have a look $HOME/radical.pilot.sandbox/pilot-XXXX and see if there are errors in the AGENT.LOG or stderr/stdout.

andre-merzky commented 10 years ago

Does the module load curl work if you run it manually on a headnode? And on a compute node?

dinesh-ganapati commented 10 years ago

Hi Andre, running module load curl manually does not work (module spider says the module is not available). I also tried omitting module load curl from the json file. Even in that case, AGENT.ERR reads: curl: (7) couldn't connect to host

Dinesh

andre-merzky commented 10 years ago

If module load curl does not work, then that command should not be in the json file. But yes, then you are back to square one, because curl does not exist. Does wget exist on the machines? Support for wget as a curl alternative is probably something Ole could easily add?

marksantcroos commented 10 years ago

One step back, who “says” curl does not exist? The only thing I see mentioned by Dinesh is an error thrown by curl. Isn’t this a sign that there is no outbound connectivity from the worker nodes?
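The distinction drawn here can be checked from the exit status alone: a shell returns 127 when a command is not found, whereas curl documents exit code 7 as a failure to connect to the host (and 6 and 28 as DNS failure and timeout). A minimal sketch, assuming a hypothetical helper name `classify_fetch_status` that is not part of RADICAL-Pilot:

```shell
#!/bin/bash
# Hypothetical helper (not part of RADICAL-Pilot): interpret the exit status
# of a `curl` invocation made on a compute node.
classify_fetch_status() {
    case "$1" in
        0)   echo "ok" ;;
        6)   echo "dns-failure" ;;         # curl: could not resolve host
        7)   echo "no-connectivity" ;;     # curl: failed to connect to host
        28)  echo "timeout" ;;             # curl: operation timed out
        127) echo "curl-not-installed" ;;  # shell: command not found
        *)   echo "other-error" ;;
    esac
}

# Usage sketch inside a batch job (the URL is a placeholder):
#   curl -sS --max-time 10 -o /dev/null https://example.org
#   classify_fetch_status $?
```

An exit status of 7, as seen in this thread, therefore points at the network rather than at a missing curl binary.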

dinesh-ganapati commented 10 years ago

Hi Mark / Andre, I spoke to the CISL guys and they confirmed what Mark said: there is no outbound connectivity from the compute nodes. I also tried submitting simple jobs to LSF that would try wget and curl (separately, in different job submission scripts), and the connection keeps timing out.

Dinesh

marksantcroos commented 10 years ago

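Right, I guess this translates into the creation and/or upvoting of two issues:

  • agent bootstrapping should not rely on downloads from the worker nodes
  • the need for a proxy to mongodb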

dinesh-ganapati commented 10 years ago

Hi Mark, does this mean that at the moment there is no workaround to run radical pilot on yellowstone?

Dinesh

marksantcroos commented 10 years ago

The fastest workaround I see is:

  1. Run mongodb on the headnode: you do that already
  2. Run a proxy on the headnode and set the curl proxy environment variable so that curl uses that when running curl on the workernodes.

dinesh-ganapati commented 10 years ago

Hi Mark, I can try that, can you suggest any documentation I could use to complete point 2?

Thanks, Dinesh

marksantcroos commented 10 years ago

  1. Create an ssh connection to yellowstone.
  2. Create a local proxy using ssh: "ssh -D *:12345 localhost" (or any other available port, for that matter).
  3. You need this setting to become active on the worker node: export ALL_PROXY=socks5://ysloginX:12345 (either through your .bashrc or through the bootstrap mechanism; replace X with the right login node).
  X. As a bonus, if you extend the ssh command to "ssh -D *:12345 -L *:12346:ec2-184-72-89-141.compute-1.amazonaws.com:27017 localhost", you also don't need your own mongodb anymore.

I confirmed that I can download using curl and access the central mongodb from a worker node. Trying this all out with RP is left as an exercise for the reader.
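The steps above can be collected into one small script. This is only a sketch: it prints the commands rather than executing them, the function names (`tunnel_cmd`, `proxy_url`, `db_url`) are made up for illustration, and it assumes that both forwards should bind on all interfaces (`*:`) so that worker nodes can reach them. The host names and ports are the ones mentioned in this thread.

```shell
#!/bin/bash
# Sketch only: prints the commands from the steps above instead of running them.
# Host names and ports are the ones mentioned in the thread; adjust as needed.
LOGIN_NODE="yslogin1"          # login node that hosts the ssh proxy
SOCKS_PORT=12345               # SOCKS proxy port (ssh -D)
TUNNEL_PORT=12346              # forwarded mongodb port (ssh -L)
MONGO_HOST="ec2-184-72-89-141.compute-1.amazonaws.com"
MONGO_PORT=27017

# ssh command to run on the login node (step 2 plus the bonus step X).
tunnel_cmd() {
    echo "ssh -D *:${SOCKS_PORT} -L *:${TUNNEL_PORT}:${MONGO_HOST}:${MONGO_PORT} localhost"
}

# Value for ALL_PROXY on the worker nodes (step 3).
proxy_url() {
    echo "socks5://${LOGIN_NODE}:${SOCKS_PORT}"
}

# Database URL for the RADICAL-Pilot script once the tunnel is up.
db_url() {
    echo "mongodb://${LOGIN_NODE}:${TUNNEL_PORT}"
}

echo "On ${LOGIN_NODE}:   $(tunnel_cmd)"
echo "On worker nodes: export ALL_PROXY=$(proxy_url)"
echo "In the script:   DBURL=\"$(db_url)\""
```

Note that with the tunnel in place the database URL points at the login node and the forwarded port, not at the remote mongodb host directly, since the worker nodes cannot reach the latter.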

dinesh-ganapati commented 10 years ago

Hi Mark, I tried to follow the steps.

  1. Made an SSH connection to yellowstone, and after that did an ssh to yslogin1.
  2. Instead of this I did the point marked X.
  3. Did it (typed it in the terminal and executed it) and replaced ysloginX with yslogin1. But there is no .bashrc file on ys, as the default shell is tcsh and I have switched it to bash via a weblink (so I am using bash, but there is no .bashrc file). I am not sure if there is some other file I could add this to.

I changed the DBURL in my script to DBURL='mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/' and ran the script.

I still get the curl: (7) couldn't connect to host error. I am sure I am doing something wrong, but can't seem to figure out what.

Thanks and regards, Dinesh

marksantcroos commented 10 years ago

On 28 May 2014, at 22:15, dinesh-ganapati notifications@github.com wrote:

> I tried to follow the steps.
>
>   1. made an SSH connection to yellowstone after that did an ssh to yslogin1

Ok.

>   2. instead of this I did the point marked X

Good.

>   3. did it (typed it on the terminal and executed it) and replaced ysloginX with yslogin1 but there is no .bashrc file on ys as the default shell is tsch and i have converted it to bash via a weblink (So, I am using bash but there is no .bashrc file). Am not sure if there is some other file i could add this to.

You can just create that file.

> changed the DBURL in my script to DBURL ='mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/' and ran the script.
>
> I still get the curl: (7) couldn't connect to host error. I am sure I am doing something wrong, but cant seem to be able to figure out what.

Ah, I forgot to make explicit that you should use: DBURL="mongodb://ysloginX:12346"

dinesh-ganapati commented 10 years ago

Hi Mark, I definitely see more than before: bjobs now shows the job being queued as well. But I still see the same curl: (7) couldn't connect to host error. Here is what I see:

2014:05:28 15:29:14 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.13
2014:05:28 15:29:15 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb://yslogin1:12346', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '5386552a7749a67455702ed9', 'created': datetime.datetime(2014, 5, 28, 21, 29, 14, 595658)}.
2014:05:28 15:29:19 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler.
2014:05:28 15:29:19 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu-local', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 16, u'runtime': 10}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'sandbox': u'file://localhost/glade/scratch/dinesh/workdir/pilot-5386552f7749a67455702edb/', u'submitted': datetime.datetime(2014, 5, 28, 21, 29, 19, 322000), u'unitmanager': u'5386552f7749a67455702edc', u'output_transfer_finished': None, u'finished': None, u'pilotmanager': u'5386552c7749a67455702eda', u'command': None, u'wu_queue': [], u'statehistory': [{u'timestamp': datetime.datetime(2014, 5, 28, 21, 29, 19, 322000), u'state': u'PendingLaunch'}], u'input_transfer_started': None, u'_id': ObjectId('5386552f7749a67455702edb'), u'input_transfer_finished': None, u'nodes': None, u'log': []}
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] ComputePilot '5386552f7749a67455702edb' state changed from 'PendingLaunch' to 'Launching'.
[Callback]: ComputePilot '5386552f7749a67455702edb' state changed to Launching.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702edd' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702edd' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ede' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ede' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702edf' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702edf' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee0' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee0' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee1' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee1' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee2' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee2' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee3' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee3' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee4' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee4' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee5' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee5' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee6' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee6' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee7' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee7' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee8' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee8' state changed to PendingExecution.
2014:05:28 15:29:20 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['5386552f7749a67455702edd', '5386552f7749a67455702ede', '5386552f7749a67455702edf', '5386552f7749a67455702ee0', '5386552f7749a67455702ee1', '5386552f7749a67455702ee2', '5386552f7749a67455702ee3', '5386552f7749a67455702ee4', '5386552f7749a67455702ee5', '5386552f7749a67455702ee6', '5386552f7749a67455702ee7', '5386552f7749a67455702ee8', '5386552f7749a67455702ee9', '5386552f7749a67455702eea', '5386552f7749a67455702eeb', '5386552f7749a67455702eec'] for execution on ComputePilot '5386552f7749a67455702edb'.
2014:05:28 15:29:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702ee9' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702ee9' state changed to PendingExecution.
2014:05:28 15:29:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702eea' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702eea' state changed to PendingExecution.
2014:05:28 15:29:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702eeb' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702eeb' state changed to PendingExecution.
2014:05:28 15:29:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5386552f7749a67455702eec' state changed from 'New' to 'PendingExecution'.
[Callback]: ComputeUnit '5386552f7749a67455702eec' state changed to PendingExecution.
2014:05:28 15:29:26 radical.pilot.MainProcess: [INFO ] ComputePilot '5386552f7749a67455702edb' state changed from 'Launching' to 'PendingActive'.
[Callback]: ComputePilot '5386552f7749a67455702edb' state changed to PendingActive.

marksantcroos commented 10 years ago

So you now have export ALL_PROXY=socks5://ysloginX:12345 in .bashrc?

Can you put the same string in .profile and try again?
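A likely reason for trying .profile as well: the pilot job shown earlier in this thread starts the bootstrapper with /bin/bash -l, and a bash login shell reads ~/.bash_profile, ~/.bash_login or ~/.profile (the first one it finds) rather than ~/.bashrc. A minimal sketch for adding the export to a startup file without duplicating it on repeated runs; the function name `add_proxy_export` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: append the ALL_PROXY export to a shell startup file,
# creating the file if needed and skipping the append if the line is present.
#   $1 = startup file, $2 = login node, $3 = SOCKS port
add_proxy_export() {
    local file="$1"
    local line="export ALL_PROXY=socks5://${2}:${3}"
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF "$line" "$file" || echo "$line" >> "$file"
}

# Usage sketch (node name and port as used in this thread):
#   add_proxy_export "$HOME/.bashrc"  yslogin1 12345
#   add_proxy_export "$HOME/.profile" yslogin1 12345
```

Whether the batch system actually sources these files on the worker nodes is site-specific, so checking the variable from inside a test job is still worthwhile.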


On Wed, May 28, 2014 at 5:21 PM, Mark Santcroos notifications@github.com wrote:

On 28 May 2014, at 22:15 , dinesh-ganapati notifications@github.com wrote:

I tried to follow the steps.

  1. made an SSH connection to yellowstone after that did an ssh to yslogin1

Ok.

  1. instead of this I did the point marked X

Good.

  1. did it (typed it on the terminal and executed it) and replaced ysloginX with yslogin1, but there is no .bashrc file on ys, as the default shell is tcsh and I have switched it to bash via a weblink (so I am using bash, but there is no .bashrc file). I am not sure if there is some other file I could add this to.

You can just create that file.

changed the DBURL in my script to DBURL='mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/' and ran the script.

I still get the curl: (7) couldn't connect to host error. I am sure I am doing something wrong, but can't seem to figure out what.

Ah, I forgot to make explicit that you should use: DBURL="mongodb://ysloginX:12346"


dinesh-ganapati commented 10 years ago

Hi Mark, sorry for the late reply. I had tried this as well, and AGENT.ERR yields: curl: (56) Proxy CONNECT aborted. The job is not queued on LSF, but I do see the following:

2014:06:05 11:08:12 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.13 2014:06:05 11:08:14 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb://yslogin1:12346', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '5390a3fd260bdc215a70db6e', 'created': datetime.datetime(2014, 6, 5, 17, 8, 13, 479763)}. 2014:06:05 11:08:16 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu-local', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 16, u'runtime': 10}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'finished': None, u'submitted': datetime.datetime(2014, 6, 5, 17, 8, 16, 776000), u'output_transfer_finished': None, u'sandbox': u'file://localhost/glade/scratch/dinesh/workdir/pilot-5390a400260bdc215a70db70/', u'pilotmanager': u'5390a3fe260bdc215a70db6f', u'unitmanager': None, u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 5, 17, 8, 16, 775000), u'state': u'PendingLaunch'}], u'wu_queue': [], u'command': None, u'input_transfer_started': None, u'_id': ObjectId('5390a400260bdc215a70db70'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:06:05 11:08:17 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 2014:06:05 11:08:17 radical.pilot.MainProcess: [INFO ] ComputePilot '5390a400260bdc215a70db70' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '5390a400260bdc215a70db70' state changed to Launching. 
2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['5390a401260bdc215a70db72', '5390a401260bdc215a70db73', '5390a401260bdc215a70db74', '5390a401260bdc215a70db75', '5390a401260bdc215a70db76', '5390a401260bdc215a70db77', '5390a401260bdc215a70db78', '5390a401260bdc215a70db79', '5390a401260bdc215a70db7a', '5390a401260bdc215a70db7b', '5390a401260bdc215a70db7c', '5390a401260bdc215a70db7d', '5390a401260bdc215a70db7e', '5390a401260bdc215a70db7f', '5390a401260bdc215a70db80', '5390a401260bdc215a70db81'] for execution on ComputePilot '5390a400260bdc215a70db70'. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db72' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db72' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db73' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db73' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db74' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db74' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db75' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db75' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db76' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db76' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db77' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db77' state changed to PendingExecution. 
2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db78' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db78' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db79' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db79' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7a' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7b' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7c' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7c' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7d' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7d' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7e' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7e' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db7f' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db7f' state changed to PendingExecution. 
2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db80' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db80' state changed to PendingExecution. 2014:06:05 11:08:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5390a401260bdc215a70db81' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5390a401260bdc215a70db81' state changed to PendingExecution. 2014:06:05 11:08:22 radical.pilot.MainProcess: [INFO ] ComputePilot '5390a400260bdc215a70db70' state changed from 'Launching' to 'PendingActive'. [Callback]: ComputePilot '5390a400260bdc215a70db70' state changed to PendingActive.

Thanks and regards, Dinesh
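Since the question in these logs is always where the pilot and the units got stuck, a small helper can reduce the output to the last reported state per ID. This is a minimal sketch, assuming the message format shown in the log above; the names `STATE_CHANGE` and `last_states` are made up for illustration and are not part of radical.pilot.

```python
import re

# Matches the "state changed from 'A' to 'B'" messages as they appear in the
# log output above. The "[Callback]" lines have no "from" part and are skipped.
STATE_CHANGE = re.compile(
    r"(ComputePilot|ComputeUnit) '([0-9a-f]+)' state changed from '(\w+)' to '(\w+)'"
)


def last_states(log_text):
    """Map each pilot/unit ID to (kind, most recent state) found in the log."""
    states = {}
    for kind, uid, _old, new in STATE_CHANGE.findall(log_text):
        # Later matches overwrite earlier ones, so the last transition wins.
        states[uid] = (kind, new)
    return states


# Usage (hypothetical file name):
#   with open("rp.log") as handle:
#       for uid, (kind, state) in sorted(last_states(handle.read()).items()):
#           print(kind, uid, state)
```

For the log above this would report the pilot as PendingActive and every unit as PendingExecution, which matches the symptom described: the pilot job never becomes active.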

On Wed, May 28, 2014 at 5:47 PM, Mark Santcroos notifications@github.com wrote:

So you now have export ALL_PROXY=socks5://ysloginX:12345 in .bashrc?

Can you put the same string in .profile and try again?
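The reason for trying both files is presumably that bash reads ~/.bashrc for interactive non-login shells, while login shells read ~/.profile (or ~/.bash_profile), so the variable may not be set in every kind of shell. A minimal sketch for checking what a given process actually sees; `PROXY_VARS` and `visible_proxies` are hypothetical names, and checking `HTTPS_PROXY` and `HTTP_PROXY` in addition to `ALL_PROXY` is an assumption that goes beyond what was asked here.

```python
import os

# ALL_PROXY is the variable discussed above; the other two are an assumption
# added for completeness.
PROXY_VARS = ("ALL_PROXY", "HTTPS_PROXY", "HTTP_PROXY")


def visible_proxies(environ=None):
    """Return the proxy variables that are set in the given environment."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in PROXY_VARS if name in environ}


# Usage: print(visible_proxies()) from the same kind of shell or job that
# runs the agent; an empty dict means the export was not picked up there.
```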

On 28 May 2014, at 23:33 , dinesh-ganapati notifications@github.com wrote:

Hi Mark, I definitely see more than before, the bjobs shows the job being queued as well, but I still see the same curl: (7) couldn't connect to host error.

marksantcroos commented 10 years ago

You might want to give tests/yellowstone_test.py in devel branch a try. I've daemonized an ssh tunnel on a headnode, let's see if we can use that as a "service", in that case you might not need to change anything.

dinesh-ganapati commented 10 years ago

Hi Mark, should I change the DBURL to DBURL="mongodb://yslogin1:12346"? (I have retained the changes I made in the .bashrc and .profile files.) If I do that, I get an error:

Traceback (most recent call last):
  File "ys_test.py", line 57, in <module>
    session = radical.pilot.Session(database_url=DBURL)
  File "/glade/u/home/dinesh/rp13/lib/python2.7/site-packages/radical.pilot-0.13-py2.7.egg/radical/pilot/session.py", line 138, in __init__
    db_name=database_name)
  File "/glade/u/home/dinesh/rp13/lib/python2.7/site-packages/radical.pilot-0.13-py2.7.egg/radical/pilot/db/database.py", line 80, in __new__
    dbs = Session(db_url, db_name)
  File "/glade/u/home/dinesh/rp13/lib/python2.7/site-packages/radical.pilot-0.13-py2.7.egg/radical/pilot/db/database.py", line 59, in __init__
    self._client = MongoClient(db_url)
  File "build/bdist.linux-x86_64/egg/pymongo/mongo_client.py", line 366, in __init__
pymongo.errors.ConnectionFailure: [Errno 111] Connection refused

Otherwise, if I don't change the DBURL, I get: ERROR: RADICAL_PILOT_DBURL (MongoDB server URL) is not defined.

Thanks and regards, Dinesh


marksantcroos commented 10 years ago

Hi,

On 06 Jun 2014, at 13:53 , dinesh-ganapati notifications@github.com wrote:

Should I change the DBURL to DBURL="mongodb://yslogin1:12346"?

That depends on where you run it. If you run it from your laptop (lsf+ssh), it just needs to point to the EC2 one. If you run it from the headnode, it might work both ways. (I only tested the former.)

(I have retained the changes I have made in the .bashrc and .profile files) if I do that I get an error:

You don't need any specifics, so it would be better if you disable all of that so that you have a clean slate.

Gr,

Mark
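On Linux, [Errno 111] Connection refused means that nothing accepted a TCP connection on the host and port given in the URL, so it can help to test the endpoint before creating a session. A minimal sketch using only the standard library; `mongo_endpoint` and `can_connect` are hypothetical helper names, and the default port 27017 is MongoDB's standard port.

```python
import socket

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def mongo_endpoint(dburl, default_port=27017):
    """Extract (host, port) from a mongodb:// URL."""
    parsed = urlparse(dburl)
    return parsed.hostname, parsed.port or default_port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        return False
    sock.close()
    return True


# Usage, with the two URLs mentioned in this thread:
#   for url in ("mongodb://yslogin1:12346",
#               "mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/"):
#       host, port = mongo_endpoint(url)
#       print(url, can_connect(host, port))
```

A result of False for the tunnel URL would suggest that the tunnel is not listening on the machine the script runs on. This only checks TCP reachability, not that a MongoDB server is answering.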


dinesh-ganapati commented 10 years ago

Hi Mark, I am trying to run it from the head node. I got rid of the path exports in the .bashrc and the .profile. I still get the ERROR: RADICAL_PILOT_DBURL (MongoDB server URL) is not defined. I have created a fresh virtualenv and installed the latest version of radical pilot from the devel branch.

Thanks and regards, Dinesh


marksantcroos commented 10 years ago

On 06 Jun 2014, at 15:02 , dinesh-ganapati notifications@github.com wrote:

I am trying to run it from the head node. I got rid of the path exports in the .bashrc and the .profile. I still get the ERROR: RADICAL_PILOT_DBURL (MongoDB server URL) is not defined.

You have to set it of course.
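In other words, the test script reads the MongoDB URL from the RADICAL_PILOT_DBURL environment variable, which has to be exported before the script is started. A minimal sketch of that pattern, assuming the variable name from the error message above; `get_dburl` is a hypothetical helper, and the commented `Session` call mirrors the one in the traceback earlier in this thread.

```python
import os


def get_dburl(environ=None):
    """Return the MongoDB URL from RADICAL_PILOT_DBURL, or fail with a clear message."""
    environ = os.environ if environ is None else environ
    dburl = environ.get("RADICAL_PILOT_DBURL")
    if not dburl:
        raise RuntimeError(
            "RADICAL_PILOT_DBURL (MongoDB server URL) is not defined"
        )
    return dburl


# Usage: export the variable in the shell first (which URL to use depends on
# where the script runs, as discussed above), then in the script:
#   DBURL = get_dburl()
#   session = radical.pilot.Session(database_url=DBURL)
```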

dinesh-ganapati commented 10 years ago

Oops, sorry Mark, I was too hasty in asking that question. :-) Got it, it works now! I'll try to rewrite my script using this one as a skeleton and see how it goes.

Thanks and regards, Dinesh


marksantcroos commented 10 years ago

On 06 Jun 2014, at 15:20 , dinesh-ganapati notifications@github.com wrote:

Got it, it works now !

Great!

I'll try to re-write my script using this one as a skeleton, and see how it goes.

Shouldn't be too much work; most of it is (hidden) in the new resource config.

So I only tested /bin/echo. What type of programs do you need to run?

dinesh-ganapati commented 10 years ago

Hi Mark, I am only running a bash command. Within the bash script I am doing a few installs and running a python script. I will test these out a few hours from now; I have been experiencing some downtime on ys. I have called them up, and the system should be up by this afternoon. I'll post back my progress soon!

Thanks and regards, Dinesh


marksantcroos commented 10 years ago

There is scheduled (and announced) maintenance on YS today indeed.


dinesh-ganapati commented 10 years ago

Hi Mark, I just added pdesc.sandbox = WORKDIR, defined the path for WORKDIR, added some time probes, and changed the cu.executable to bash. I get the following error in the AGENT.STDERR. I have also added my script; I ran a diff against yours to check whether I had made significant changes (the ones I have mentioned are the only ones I made). Do let me know if I am missing something.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to pypi.python.org port 443: Connection refused

RADICAL_PILOT_VERBOSE=info yields the following; despite this, I don't see the job being queued.

2014:06:09 21:03:13 radical.pilot.MainProcess: [INFO ]
radical.pilot version: 0.14 2014:06:09 21:03:16 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '53967573260bdc63139092cb', 'created': datetime.datetime(2014, 6, 10, 3, 3, 15, 553738)}. 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Added credential {'user_id': None, 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 53967573260bdc63139092cb. 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/archer.json 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/futuregrid.json 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/localhost.json 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/ncar.json 2014:06:09 21:03:18 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/xsede.json 2014:06:09 21:03:19 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 
2014:06:09 21:03:19 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 16, u'runtime': 15}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'finished': None, u'submitted': datetime.datetime(2014, 6, 10, 3, 3, 19, 374000), u'output_transfer_finished': None, u'sandbox': u'sftp:// yellowstone.ucar.edu/glade/scratch/dinesh/workdir/pilot-53967577260bdc63139092cd/', u'pilotmanager': u'53967576260bdc63139092cc', u'unitmanager': None, u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 10, 3, 3, 19, 374000), u'state': u'PendingLaunch'}], u'wu_queue': [], u'command': None, u'input_transfer_started': None, u'_id': ObjectId('53967577260bdc63139092cd'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:06:09 21:03:19 radical.pilot.MainProcess: [INFO ] ComputePilot '53967577260bdc63139092cd' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '53967577260bdc63139092cd' state changed to Launching. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['53967577260bdc63139092cf', '53967577260bdc63139092d0', '53967577260bdc63139092d1', '53967577260bdc63139092d2', '53967577260bdc63139092d3', '53967577260bdc63139092d4', '53967577260bdc63139092d5', '53967577260bdc63139092d6', '53967577260bdc63139092d7', '53967577260bdc63139092d8', '53967577260bdc63139092d9', '53967577260bdc63139092da', '53967577260bdc63139092db', '53967577260bdc63139092dc', '53967577260bdc63139092dd', '53967577260bdc63139092de'] for execution on ComputePilot '53967577260bdc63139092cd'. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092cf' state changed from 'New' to 'PendingExecution'. 
[Callback]: ComputeUnit '53967577260bdc63139092cf' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d0' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d1' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d2' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d3' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d4' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d5' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d6' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d7' state changed to PendingExecution. 
2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d8' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092d9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092d9' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092da' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092da' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092db' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092db' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092dc' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092dc' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092dd' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092dd' state changed to PendingExecution. 2014:06:09 21:03:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '53967577260bdc63139092de' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '53967577260bdc63139092de' state changed to PendingExecution. 2014:06:09 21:03:28 radical.pilot.MainProcess: [INFO ] ComputePilot '53967577260bdc63139092cd' state changed from 'Launching' to 'PendingActive'. [Callback]: ComputePilot '53967577260bdc63139092cd' state changed to PendingActive.

Thanks and regards, Dinesh

On Mon, Jun 9, 2014 at 11:18 AM, Mark Santcroos notifications@github.com wrote:

There is scheduled (and announced) maintenance on YS today indeed.

On 09 Jun 2014, at 16:28 , dinesh-ganapati notifications@github.com wrote:

Hi Mark, I am only running a bash command. Within the bash script I am doing a few installs and running a Python script. I will test these out in a few hours; I have been experiencing some downtime on YS. I have called them, and the system should be up by this afternoon. I'll post my progress soon!

Thanks and regards, Dinesh

On Fri, Jun 6, 2014 at 11:18 AM, Mark Santcroos < notifications@github.com> wrote:

On 06 Jun 2014, at 15:20 , dinesh-ganapati notifications@github.com wrote:

Got it, it works now !

Great!

I'll try to re-write my script using this one as a skeleton, and see how it goes.

Shouldn't be too much; most of it is (hidden) in the new resource config.

So I only tested /bin/echo. What type of programs do you need to run?

marksantcroos commented 10 years ago

On 10 Jun 2014, at 5:19 , dinesh-ganapati notifications@github.com wrote:

I just added pdesc.sandbox = WORKDIR

sandbox is optional and doesn’t need to be set on yellowstone I believe.

and defined the path for WORKDIR

What do you mean by that?

and also added some time probes and changed the cu.executable to bash.

(but also see above)

I get the following error in AGENT.STDERR. I have also added my script; I ran a diff against yours to check whether I had made significant changes (the ones I mentioned are the only ones I made). Do let me know if I am missing something.

% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to pypi.python.org port 443: Connection refused

This is because of the Yellowstone maintenance: the tunnel went down. I’ve restored it now.

Can you please try again?
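
The curl error quoted above means that pypi.python.org could not be reached on port 443 from where the agent bootstraps. A minimal sketch, assuming plain Python 3 and a hypothetical helper name `can_connect` (this is not part of RADICAL-Pilot), for checking whether a host and port are reachable before resubmitting a pilot:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("pypi.python.org", 443)` on the node in question would be the equivalent of the failing curl step; it only tests reachability and says nothing about why a tunnel or firewall is blocking the connection.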

dinesh-ganapati commented 10 years ago

Hi Mark, I meant that I set WORKDIR = "/glade/scratch/dinesh/workdir/". I see that the job is submitted to LSF. I am adding the script just in case.

The AGENT.STDERR this time reads:

% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed
100 1957k 100 1957k 0 0 625k 0 0:00:03 0:00:03 --:--:-- 634k
warning: no files found matching '.html' under directory 'docs'
warning: no previously-included files matching '.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
zip_safe flag not set; analyzing archive contents...
unable to execute gcc: No such file or directory
/tmp/easy_install-1sqibg/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The bson._cbson extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " unable to execute gcc: No such file or directory /tmp/easy_install-1sqibg/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The pymongo._cmessage extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " zip_safe flag not set; analyzing archive contents...

RADICAL_PILOT_VERBOSE=info yields:

2014:06:10 04:37:30 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.14 2014:06:10 04:37:32 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '5396dfeb28a6397ee74ccaf8', 'created': datetime.datetime(2014, 6, 10, 10, 37, 31, 859454)}. 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Added credential {'user_id': None, 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 5396dfeb28a6397ee74ccaf8. 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/archer.json 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/futuregrid.json 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/localhost.json 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/ncar.json 2014:06:10 04:37:34 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/xsede.json 2014:06:10 04:37:35 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': 
None, u'cores': 16, u'runtime': 15}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'finished': None, u'submitted': datetime.datetime(2014, 6, 10, 10, 37, 35, 533000), u'output_transfer_finished': None, u'sandbox': u'sftp:// yellowstone.ucar.edu/glade/scratch/dinesh/workdir/pilot-5396dfef28a6397ee74ccafa/', u'pilotmanager': u'5396dfee28a6397ee74ccaf9', u'unitmanager': None, u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 10, 10, 37, 35, 532000), u'state': u'PendingLaunch'}], u'wu_queue': [], u'command': None, u'input_transfer_started': None, u'_id': ObjectId('5396dfef28a6397ee74ccafa'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:06:10 04:37:35 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] ComputePilot '5396dfef28a6397ee74ccafa' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '5396dfef28a6397ee74ccafa' state changed to Launching. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['5396dff028a6397ee74ccafc', '5396dff028a6397ee74ccafd', '5396dff028a6397ee74ccafe', '5396dff028a6397ee74ccaff', '5396dff028a6397ee74ccb00', '5396dff028a6397ee74ccb01', '5396dff028a6397ee74ccb02', '5396dff028a6397ee74ccb03', '5396dff028a6397ee74ccb04', '5396dff028a6397ee74ccb05', '5396dff028a6397ee74ccb06', '5396dff028a6397ee74ccb07', '5396dff028a6397ee74ccb08', '5396dff028a6397ee74ccb09', '5396dff028a6397ee74ccb0a', '5396dff028a6397ee74ccb0b'] for execution on ComputePilot '5396dfef28a6397ee74ccafa'. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccafc' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccafc' state changed to PendingExecution. 
2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccafd' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccafd' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccafe' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccafe' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccaff' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccaff' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb00' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb00' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb01' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb01' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb02' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb02' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb03' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb03' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb04' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb04' state changed to PendingExecution. 
2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb05' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb05' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb06' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb06' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb07' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb07' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb08' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb08' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb09' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb09' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb0a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb0a' state changed to PendingExecution. 2014:06:10 04:37:36 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396dff028a6397ee74ccb0b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396dff028a6397ee74ccb0b' state changed to PendingExecution. 2014:06:10 04:37:43 radical.pilot.MainProcess: [INFO ] ComputePilot '5396dfef28a6397ee74ccafa' state changed from 'Launching' to 'PendingActive'. [Callback]: ComputePilot '5396dfef28a6397ee74ccafa' state changed to PendingActive.
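
On the gcc warnings in the AGENT.STDERR output above: they come from the PyMongo build not finding a compiler, and as the message itself says, PyMongo still runs without its C extensions, only more slowly. A small sketch, assuming Python 3 and a hypothetical helper name `find_compiler`, for checking whether a C compiler is on PATH on a given node:

```python
import shutil


def find_compiler(candidates=("gcc", "cc")):
    """Return the path of the first compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

If this returns None on the compute node, the warning is expected. Once PyMongo is installed, `pymongo.has_c()` and `bson.has_c()` report whether the C extensions were actually built.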

Thanks and regards, Dinesh

marksantcroos commented 10 years ago

Can you try again? The tunnel was on the wrong head node. I just verified that the canonical example (tests/yellowstone_test.py) works for me. You might want to make sure it works for you too before you continue.

marksantcroos commented 10 years ago

On 09 Jun 2014, at 16:28 , dinesh-ganapati notifications@github.com wrote:

I am only running a bash command. Within the bash script I am doing a few installs and running a Python script.

To get back to this question, I meant, what (type of) applications are you planning to run on Yellowstone for the project?

dinesh-ganapati commented 10 years ago

Hi Mark, I am trying to run an ensemble of trajectory analysis tasks using the MDAnalysis Python library, computing the RMSD of each frame with respect to the first frame of an mdcrd file.
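
For reference, the per-frame RMSD described above can be sketched without any library. This is an illustration only: it assumes each frame is a list of (x, y, z) tuples and that no superposition or fitting is done before the comparison. It is not the MDAnalysis API, and `rmsd` and `rmsd_to_first` are hypothetical names.

```python
import math


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames with the same atom count."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    if not frame_a:
        raise ValueError("frames must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(frame_a, frame_b):
        # Squared Euclidean distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(frame_a))


def rmsd_to_first(trajectory):
    """RMSD of every frame with respect to the first frame."""
    reference = trajectory[0]
    return [rmsd(reference, frame) for frame in trajectory]
```

Because each frame depends only on the reference frame, the work splits naturally into independent tasks, which is what makes it a fit for an ensemble of compute units.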

I ran your canonical example from tests/yellowstone and it worked. This time, when I ran my own script, I did see the job being submitted to LSF, and the unit folders are created within the pilot folder in the workdir. However, although the execution stage seems to run for quite a while, the rms output file is empty, and eventually the unit folders are all deleted (I guess that's due to the cancelling of the pilots).

RADICAL_PILOT_VERBOSE reads: (rp13)-bash-4.1$ RADICAL_PILOT_VERBOSE=info python ys_test_1.py 2014:06:10 06:36:22 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.14 2014:06:10 06:36:23 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '5396fbc63f795e53e7108f2b', 'created': datetime.datetime(2014, 6, 10, 12, 36, 22, 955433)}. 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Added credential {'user_id': None, 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 5396fbc63f795e53e7108f2b. 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/archer.json 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/futuregrid.json 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/localhost.json 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/ncar.json 2014:06:10 06:36:24 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/xsede.json 2014:06:10 06:36:25 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 
2014:06:10 06:36:25 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu', u'queue': None, u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 16, u'runtime': 15}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'sandbox': u'sftp:// yellowstone.ucar.edu/glade/scratch/dinesh/workdir/pilot-5396fbc83f795e53e7108f2d/', u'submitted': datetime.datetime(2014, 6, 10, 12, 36, 24, 904000), u'unitmanager': u'5396fbc83f795e53e7108f2e', u'output_transfer_finished': None, u'finished': None, u'pilotmanager': u'5396fbc83f795e53e7108f2c', u'command': None, u'wu_queue': [], u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 10, 12, 36, 24, 903000), u'state': u'PendingLaunch'}], u'input_transfer_started': None, u'_id': ObjectId('5396fbc83f795e53e7108f2d'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:06:10 06:36:25 radical.pilot.MainProcess: [INFO ] ComputePilot '5396fbc83f795e53e7108f2d' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '5396fbc83f795e53e7108f2d' state changed to Launching. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['5396fbc93f795e53e7108f2f', '5396fbc93f795e53e7108f30', '5396fbc93f795e53e7108f31', '5396fbc93f795e53e7108f32', '5396fbc93f795e53e7108f33', '5396fbc93f795e53e7108f34', '5396fbc93f795e53e7108f35', '5396fbc93f795e53e7108f36', '5396fbc93f795e53e7108f37', '5396fbc93f795e53e7108f38', '5396fbc93f795e53e7108f39', '5396fbc93f795e53e7108f3a', '5396fbc93f795e53e7108f3b', '5396fbc93f795e53e7108f3c', '5396fbc93f795e53e7108f3d', '5396fbc93f795e53e7108f3e'] for execution on ComputePilot '5396fbc83f795e53e7108f2d'. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f2f' state changed from 'New' to 'PendingExecution'. 
[Callback]: ComputeUnit '5396fbc93f795e53e7108f2f' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f30' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f30' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f31' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f31' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f32' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f32' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f33' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f33' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f34' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f34' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f35' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f35' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f36' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f36' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f37' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f37' state changed to PendingExecution. 
2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f38' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f38' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f39' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f39' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3a' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3b' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3c' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3c' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3d' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3d' state changed to PendingExecution. 2014:06:10 06:36:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3e' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3e' state changed to PendingExecution. 2014:06:10 06:36:32 radical.pilot.MainProcess: [INFO ] ComputePilot '5396fbc83f795e53e7108f2d' state changed from 'Launching' to 'PendingActive'. [Callback]: ComputePilot '5396fbc83f795e53e7108f2d' state changed to PendingActive. 2014:06:10 06:37:10 radical.pilot.MainProcess: [INFO ] ComputePilot '5396fbc83f795e53e7108f2d' state changed from 'PendingActive' to 'Active'. 
[Callback]: ComputePilot '5396fbc83f795e53e7108f2d' state changed to Active. 2014:06:10 06:37:11 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f2f' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f2f' state changed to Executing. 2014:06:10 06:37:12 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f30' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f30' state changed to Executing. 2014:06:10 06:37:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f31' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f31' state changed to Executing. 2014:06:10 06:37:14 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f32' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f32' state changed to Executing. 2014:06:10 06:37:15 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f33' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f33' state changed to Executing. 2014:06:10 06:37:16 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f34' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f34' state changed to Executing. 2014:06:10 06:37:17 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f35' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f35' state changed to Executing. 2014:06:10 06:37:18 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f36' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f36' state changed to Executing. 
2014:06:10 06:37:19 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f37' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f37' state changed to Executing. 2014:06:10 06:37:20 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f38' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f38' state changed to Executing. 2014:06:10 06:37:21 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f39' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f39' state changed to Executing. 2014:06:10 06:37:22 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3a' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3a' state changed to Executing. 2014:06:10 06:37:23 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3b' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3b' state changed to Executing. 2014:06:10 06:37:24 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3c' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3c' state changed to Executing. 2014:06:10 06:37:25 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3d' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3d' state changed to Executing. 2014:06:10 06:37:26 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3e' state changed from 'PendingExecution' to 'Executing'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3e' state changed to Executing. 2014:06:10 06:41:12 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f2f' state changed from 'Executing' to 'Done'. 
[Callback]: ComputeUnit '5396fbc93f795e53e7108f2f' state changed to Done. 2014:06:10 06:41:12 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f30' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f30' state changed to Done. 2014:06:10 06:41:12 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f31' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f31' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f32' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f32' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f34' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f34' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f35' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f35' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f36' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f36' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f38' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f38' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f39' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f39' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3a' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3a' state changed to Done. 
2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3b' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3b' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3c' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3c' state changed to Done. 2014:06:10 06:41:13 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3d' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3d' state changed to Done. 2014:06:10 06:41:15 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f33' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f33' state changed to Done. 2014:06:10 06:41:15 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f37' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f37' state changed to Done. 2014:06:10 06:41:15 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '5396fbc93f795e53e7108f3e' state changed from 'Executing' to 'Done'. [Callback]: ComputeUnit '5396fbc93f795e53e7108f3e' state changed to Done.

The AGENT.STDERR reads:

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1957k 100 1957k 0 0 2145k 0 --:--:-- --:--:-- --:--:-- 2252k
warning: no files found matching '.html' under directory 'docs'
warning: no previously-included files matching '.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
zip_safe flag not set; analyzing archive contents...
unable to execute gcc: No such file or directory
/tmp/easy_install-jRJcXA/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The bson._cbson extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " unable to execute gcc: No such file or directory /tmp/easy_install-jRJcXA/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The pymongo._cmessage extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " zip_safe flag not set; analyzing archive contents...

On Tue, Jun 10, 2014 at 7:21 AM, Mark Santcroos notifications@github.com wrote:

On 09 Jun 2014, at 16:28 , dinesh-ganapati notifications@github.com wrote:

I am only running a bash command. Within the bash script I am doing a few installs and running a Python script.

To get back to this question, I meant, what (type of) applications are you planning to run on Yellowstone for the project?


marksantcroos commented 10 years ago

On 10 Jun 2014, at 14:46 , dinesh-ganapati notifications@github.com wrote:

I am trying to run an ensemble of trajectory analysis tasks using MDAnalysis python library and computing rmsd of each frame with respect to the first frame for an mdcrd file.

Since when do climate scientists run md tools?! Oh well :-)

I ran your canonical example from tests/yellowstone and it worked.

Good.

Also this time when I ran my own script I did see the job being submitted to the lsf, and also the unit folders are created within the pilot folder in the workdir

Ok, good, from the perspective of RP all seems fine now. All tasks end with DONE.

(even though the execution stage seems to happen for quite a bit of time I see that the rms output file has nothing in it and eventually the unit folders are all deleted (i guess that's due to cancelling of the pilots)).

Hmm, there is likely something wrong with the CompUnitDesc or with the RP implementation. Can you paste the CUD here? (I didn’t get your code as you can’t send attachments through GitHub tickets.)

dinesh-ganapati commented 10 years ago

Hi Mark, I agree with you: YS is indeed quite specific to climate-related work, but I am just trying to see how the I/O subsystem gets saturated with an increasing number of MD tasks. :-) Here is the code:

    import os
    import sys
    import time
    import radical.pilot

    # READ: The RADICAL-Pilot documentation:
    #   http://radicalpilot.readthedocs.org/en/latest
    #
    # Try running this example with RADICAL_PILOT_VERBOSE=debug set if
    # you want to see what happens behind the scenes!
    #
    WORKDIR = "/glade/scratch/dinesh/workdir/"
    NUMBER_JOBS = 16

    # RADICAL-Pilot uses ssh to communicate with the remote resource. The
    # easiest way to make this work seamlessly is to set up ssh key-based
    # authentication and add the key to your keychain so you won't be
    # prompted for a password. The following article explains how to set
    # this up on Linux:
    #   http://www.cyberciti.biz/faq/ssh-password-less-login-with-dsa-publickey-authentication/

    # DBURL defines the MongoDB server URL and has the format
    # mongodb://host:port.
    # For the installation of a MongoDB server, refer to http://docs.mongodb.org.
    DBURL = 'mongodb://ec2-184-72-89-141.compute-1.amazonaws.com:27017/'  # os.getenv("RADICAL_PILOT_DBURL")
    if DBURL is None:
        print("ERROR: RADICAL_PILOT_DBURL (MongoDB server URL) is not defined.")
        sys.exit(1)

    #------------------------------------------------------------------------------
    #
    def pilot_state_cb(pilot, state):
        """pilot_state_cb() is a callback function. It gets called every
        time a ComputePilot changes its state.
        """
        print("[Callback]: ComputePilot '{0}' state changed to {1}.".format(
            pilot.uid, state))

        if state == radical.pilot.states.FAILED:
            sys.exit(1)

    #------------------------------------------------------------------------------
    #
    def unit_state_change_cb(unit, state):
        """unit_state_change_cb() is a callback function. It gets called every
        time a ComputeUnit changes its state.
        """
        print("[Callback]: ComputeUnit '{0}' state changed to {1}.".format(
            unit.uid, state))
        if state == radical.pilot.states.FAILED:
            print("            Log: %s" % unit.log[-1])

    #------------------------------------------------------------------------------
    #
    if __name__ == "__main__":

        try:
            start_1 = time.time()
            # Create a new session. A session is the 'root' object for all other
            # RADICAL-Pilot objects. It encapsulates the MongoDB connection(s) as
            # well as security credentials.
            session = radical.pilot.Session(database_url=DBURL)
            # Add an ssh identity to the session.
            cred = radical.pilot.SSHCredential()
            #cred.user_id = "tg802352"
            session.add_credential(cred)

            # Add a Pilot Manager. Pilot managers manage one or more ComputePilots.
            pmgr = radical.pilot.PilotManager(session=session)

            # Register our callback with the PilotManager. This callback will get
            # called every time any of the pilots managed by the PilotManager
            # change their state.
            pmgr.register_callback(pilot_state_cb)
            # Define a 16-core pilot on Yellowstone that runs for 15 minutes and
            # uses WORKDIR as sandbox directory.
            pdesc = radical.pilot.ComputePilotDescription()
            #pdesc.resource  = "stampede.tacc.utexas.edu"
            pdesc.resource  = "yellowstone.ucar.edu"
            pdesc.runtime   = 15 # minutes
            pdesc.cores     = 16
            pdesc.cleanup   = True
            pdesc.project   = "URTG0003"
            pdesc.sandbox   = WORKDIR
            #pdesc.pilot_agent_priv = "radical-pilot-agent-multicore.py"

            # Launch the pilot.
            pilot = pmgr.submit_pilots(pdesc)

            compute_units = []
            start_2 = time.time()
            f = open('rad_pilot_timer_ys_' + str(NUMBER_JOBS) + '.txt', 'w')
            # One single-core unit per job; unit N runs calling script N.
            for unit_count in range(0, NUMBER_JOBS):
                cu = radical.pilot.ComputeUnitDescription()
                cu.executable  = "bash"
                cu.arguments   = ['/glade/scratch/dinesh/data/try_rmsd_callingscript' + str(unit_count) + '.sh']
                cu.cores       = 1
                compute_units.append(cu)

            # Combine the ComputePilot, the ComputeUnits and a scheduler via
            # a UnitManager object.
            umgr = radical.pilot.UnitManager(
                session=session,
                scheduler=radical.pilot.SCHED_DIRECT_SUBMISSION)

            # Register our callback with the UnitManager. This callback will get
            # called every time any of the units managed by the UnitManager
            # change their state.
            umgr.register_callback(unit_state_change_cb)

            # Add the previously created ComputePilot to the UnitManager.
            umgr.add_pilots(pilot)

            # Submit the previously created ComputeUnit descriptions to the
            # UnitManager. This will trigger the selected scheduler to start
            # assigning ComputeUnits to the ComputePilots.
            units = umgr.submit_units(compute_units)

            # Wait for all compute units to reach a terminal state (DONE or FAILED).
            umgr.wait_units()

            if not isinstance(units, list):
                units = [units]
            for unit in units:
                print("* Task %s (executed @ %s) state: %s, exit code: %s, started: %s, finished: %s, output: %s"
                      % (unit.uid, unit.execution_locations, unit.state,
                         unit.exit_code, unit.start_time, unit.stop_time,
                         unit.stdout))

            # Close automatically cancels the pilot(s).
            session.close()
            done = time.time()
            elapsed = done - start_1
            # First line: total wall time; second line: time spent before unit creation.
            f.write(str(elapsed))
            f.write('\n')
            f.write(str(start_2 - start_1))
          #  f.write('\n')
         #   f.write(str(start-start_2))
            f.close()
            sys.exit(0)

        except radical.pilot.PilotException as ex:
            # Catch RADICAL-Pilot exceptions and exit with an error.
            print("Error during execution: %s" % ex)
            sys.exit(1)


marksantcroos commented 10 years ago

(even though the execution stage seems to happen for quite a bit of time I see that the rms output file has nothing in it and eventually the unit folders are all deleted (i guess that's due to cancelling of the pilots)).

How do you conclude that the rms output file is empty after the run? I don't see in your code that you retrieve it once the task is done. Like you said, it gets cleaned up after the run finishes.

If you want to keep the unit directories, add delete=False to session.close().
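A minimal sketch of that suggestion, assuming Session.close() accepts the delete keyword as described above. The helper name run_with_session and the keep_directories flag are hypothetical and not part of RADICAL-Pilot; the session argument is whatever radical.pilot.Session(...) returned in the script.

```python
def run_with_session(session, work, keep_directories=True):
    """Run work(session), then close the session.

    Hypothetical helper: `session` is expected to expose close(delete=...),
    which is the keyword suggested in this thread for keeping the unit
    directories after the run.
    """
    try:
        return work(session)
    finally:
        # delete=False asks for the unit directories to be kept, so output
        # files can still be inspected after the pilot has been cancelled.
        session.close(delete=not keep_directories)
```

Closing in a finally block means the directories are also kept when the workload raises, which is usually when they are most worth looking at.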

dinesh-ganapati commented 10 years ago

Hi Mark, Thanks a lot! That seems to have been the problem. I can see the expected output now. Also, one quick question: I ran 16 tasks (1 task per core), and by crude estimation it seems to take longer than BigJob (I ran a similar analysis with BigJob weeks ago). Could that be because of the ssh tunnel or the other hacks needed to make RADICAL-Pilot work on YS? Also, the STDERR file still has the same content I posted in my previous comment. (Everything is working without a problem, though.)

Thanks and regards, Dinesh


marksantcroos commented 10 years ago

On 10 Jun 2014, at 16:27 , dinesh-ganapati notifications@github.com wrote:

Thanks a lot! That seems to have been the problem. I can see the expected output now. Also, one quick question: I ran 16 tasks (1 task per core), and by crude estimation it seems to take longer than BigJob (I ran a similar analysis with BigJob weeks ago). Could that be because of the ssh tunnel or the other hacks needed to make RADICAL-Pilot work on YS?

That could well be. Currently it tries pip for every Python package installation, and once that times out it tries easy_install. So the overhead of the installation is significant if you just run a couple of tasks. (For longer-running pilots it's less of a problem, of course.) This needs a better solution eventually.
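To illustrate where that overhead comes from, here is a rough sketch of a "try pip, fall back to easy_install after a timeout" loop. This is not the actual RADICAL-Pilot bootstrap code: the function name install_with_fallback, the INSTALLERS table and the timeout value are assumptions made for illustration, and the sketch uses Python 3's subprocess.run.

```python
import subprocess

# Hypothetical installer table: (label, command prefix), tried in this order.
INSTALLERS = [
    ("pip", ["pip", "install"]),
    ("easy_install", ["easy_install"]),
]


def install_with_fallback(package, installers, timeout_s, run=subprocess.run):
    """Try each installer in order; return the label of the first that succeeds.

    Returns None if every installer times out, is missing, or exits non-zero.
    """
    for label, prefix in installers:
        try:
            result = run(prefix + [package], timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The installer hung (for example, no outbound network access):
            # the full timeout has already been spent before moving on.
            continue
        except OSError:
            # The installer executable is not available on this machine.
            continue
        if result.returncode == 0:
            return label
    return None
```

With this pattern every package waits out the full timeout before the fallback starts, so the fixed cost grows with the number of packages and dominates when the pilot only runs a few short tasks.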

Also the STDERR file still has the same things I posted in the previous post. (Everything is working without a problem though)

I noticed that, will look into that too.

Thanks for testing!

andre-merzky commented 10 years ago

So the overhead of the installation is significant if you just run a couple of tasks. (For longer-running pilots it's less of a problem, of course.) This needs a better solution eventually.

see #68

dinesh-ganapati commented 10 years ago

Hi Mark, Andre! Thanks for that pointer, it looks interesting. I find things run smoothly for 16 tasks on 16 cores, but for 128 tasks on 128 cores I get a failure. The only changes to my script are that I add pdesc.queue = "premium", set the number of jobs to 128, and set pdesc.cores = 128. I hope I haven't overlooked anything this time.
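The changes described above touch three values that have to stay in step. A small sketch of deriving them from one number is shown here; the helper names scaled_settings and apply_pilot_settings are hypothetical, and pdesc stands for any object with cores and queue attributes, such as the ComputePilotDescription in the script.

```python
def scaled_settings(number_jobs, queue=None, cores_per_unit=1):
    """Derive the pilot size from the number of single-core jobs."""
    return {
        "number_jobs": number_jobs,
        "cores": number_jobs * cores_per_unit,
        "queue": queue,
    }


def apply_pilot_settings(pdesc, settings):
    """Copy the derived values onto a pilot description object."""
    pdesc.cores = settings["cores"]
    if settings["queue"] is not None:
        # Only set a queue when one was requested; otherwise leave the
        # description's existing value untouched.
        pdesc.queue = settings["queue"]
    return pdesc
```

For the run described here this would be scaled_settings(128, queue="premium"), with NUMBER_JOBS in the script taken from the returned "number_jobs" value.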

RADICAL_PILOT_VERBOSE yields:

2014:06:10 10:30:54 radical.pilot.MainProcess: [INFO ] radical.pilot version: 0.14 2014:06:10 10:30:54 radical.pilot.MainProcess: [INFO ] New Session created{'database_url': 'mongodb:// ec2-184-72-89-141.compute-1.amazonaws.com:27017/', 'database_name': 'radicalpilot', 'last_reconnect': None, 'uid': '539732be260bdc0ba2cf1974', 'created': datetime.datetime(2014, 6, 10, 16, 30, 54, 154464)}. 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Added credential {'user_id': None, 'type': 'SSH', 'user_pass': None, 'user_key': None} to session 539732be260bdc0ba2cf1974. 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/archer.json 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/futuregrid.json 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/localhost.json 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/ncar.json 2014:06:10 10:30:55 radical.pilot.MainProcess: [INFO ] Loaded resource configurations from file://localhost//glade/u/home/dinesh/rp13/lib/python2.6/site-packages/radical.pilot-0.14-py2.6.egg/radical/pilot/configs/xsede.json 2014:06:10 10:30:56 radical.pilot.MainProcess: [INFO ] Loaded scheduler: DirectSubmissionScheduler. 
2014:06:10 10:30:56 radical.pilot.PilotLauncherWorker-1: [INFO ] Launching ComputePilot {u'state': u'PendingLaunch', u'description': {u'project': u'URTG0003', u'resource': u'yellowstone.ucar.edu', u'queue': u'premium', u'sandbox': u'/glade/scratch/dinesh/workdir/', u'cleanup': True, u'pilot_agent_priv': None, u'cores': 128, u'runtime': 140}, u'sagajobid': None, u'started': None, u'cores_per_node': None, u'output_transfer_started': None, u'finished': None, u'submitted': datetime.datetime(2014, 6, 10, 16, 30, 56, 336000), u'output_transfer_finished': None, u'sandbox': u'sftp:// yellowstone.ucar.edu/glade/scratch/dinesh/workdir/pilot-539732c0260bdc0ba2cf1976/', u'pilotmanager': u'539732bf260bdc0ba2cf1975', u'unitmanager': None, u'statehistory': [{u'timestamp': datetime.datetime(2014, 6, 10, 16, 30, 56, 335000), u'state': u'PendingLaunch'}], u'wu_queue': [], u'command': None, u'input_transfer_started': None, u'_id': ObjectId('539732c0260bdc0ba2cf1976'), u'input_transfer_finished': None, u'nodes': None, u'log': []} 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] ComputePilot '539732c0260bdc0ba2cf1976' state changed from 'PendingLaunch' to 'Launching'. [Callback]: ComputePilot '539732c0260bdc0ba2cf1976' state changed to Launching. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1978' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1978' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1979' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1979' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197a' state changed to PendingExecution. 
2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197b' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197c' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197c' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197d' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197d' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197e' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197e' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf197f' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf197f' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1980' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1980' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1981' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1981' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1982' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1982' state changed to PendingExecution. 
2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1983' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1983' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1984' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1984' state changed to PendingExecution. 2014:06:10 10:30:57 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1985' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1985' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1986' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1986' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1987' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1987' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1988' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1988' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1989' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1989' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198a' state changed to PendingExecution. 
2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198b' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198c' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198c' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198d' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198d' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198e' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198e' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf198f' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf198f' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1990' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1990' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1991' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1991' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1992' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1992' state changed to PendingExecution. 
2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1993' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1993' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1994' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1994' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1995' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1995' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1996' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1996' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1997' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1997' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1998' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1998' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf1999' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf1999' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199a' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199a' state changed to PendingExecution. 
2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199b' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199b' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199c' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199c' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199d' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199d' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199e' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199e' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf199f' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf199f' state changed to PendingExecution. 2014:06:10 10:30:59 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a0' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a1' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a2' state changed to PendingExecution. 
2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a3' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a4' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a5' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a6' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a7' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a8' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19a9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19a9' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19aa' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19aa' state changed to PendingExecution. 
2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ab' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ab' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ac' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ac' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ad' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ad' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ae' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ae' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19af' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19af' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b0' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b1' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b2' state changed to PendingExecution. 
2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b3' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b4' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b5' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b6' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b7' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b8' state changed to PendingExecution. 2014:06:10 10:31:00 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19b9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19b9' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ba' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ba' state changed to PendingExecution. 
2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19bb' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19bb' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19bc' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19bc' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19bd' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19bd' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19be' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19be' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19bf' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19bf' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c0' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c1' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c2' state changed to PendingExecution. 
2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c3' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c4' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c5' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c6' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c7' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c8' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19c9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19c9' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ca' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ca' state changed to PendingExecution. 
2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19cb' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19cb' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19cc' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19cc' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19cd' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19cd' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ce' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ce' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19cf' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19cf' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d0' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d1' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d2' state changed to PendingExecution. 
2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d3' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d4' state changed to PendingExecution. 2014:06:10 10:31:01 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d5' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d6' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d7' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d8' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19d9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19d9' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19da' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19da' state changed to PendingExecution. 
2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19db' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19db' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19dc' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19dc' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19dd' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19dd' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19de' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19de' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19df' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19df' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e0' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e1' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e2' state changed to PendingExecution. 
2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e3' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e4' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e5' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e6' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e7' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e8' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e8' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19e9' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19e9' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ea' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ea' state changed to PendingExecution. 
2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19eb' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19eb' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ec' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ec' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ed' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ed' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ee' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ee' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19ef' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19ef' state changed to PendingExecution. 2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f0' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f0' state changed to PendingExecution. 
2014:06:10 10:31:02 radical.pilot.MainProcess: [INFO ] Scheduled ComputeUnits ['539732c0260bdc0ba2cf1978', '539732c0260bdc0ba2cf1979', '539732c0260bdc0ba2cf197a', '539732c0260bdc0ba2cf197b', '539732c0260bdc0ba2cf197c', '539732c0260bdc0ba2cf197d', '539732c0260bdc0ba2cf197e', '539732c0260bdc0ba2cf197f', '539732c0260bdc0ba2cf1980', '539732c0260bdc0ba2cf1981', '539732c0260bdc0ba2cf1982', '539732c0260bdc0ba2cf1983', '539732c0260bdc0ba2cf1984', '539732c0260bdc0ba2cf1985', '539732c0260bdc0ba2cf1986', '539732c0260bdc0ba2cf1987', '539732c0260bdc0ba2cf1988', '539732c0260bdc0ba2cf1989', '539732c0260bdc0ba2cf198a', '539732c0260bdc0ba2cf198b', '539732c0260bdc0ba2cf198c', '539732c0260bdc0ba2cf198d', '539732c0260bdc0ba2cf198e', '539732c0260bdc0ba2cf198f', '539732c0260bdc0ba2cf1990', '539732c0260bdc0ba2cf1991', '539732c0260bdc0ba2cf1992', '539732c0260bdc0ba2cf1993', '539732c0260bdc0ba2cf1994', '539732c0260bdc0ba2cf1995', '539732c0260bdc0ba2cf1996', '539732c0260bdc0ba2cf1997', '539732c0260bdc0ba2cf1998', '539732c0260bdc0ba2cf1999', '539732c0260bdc0ba2cf199a', '539732c0260bdc0ba2cf199b', '539732c0260bdc0ba2cf199c', '539732c0260bdc0ba2cf199d', '539732c0260bdc0ba2cf199e', '539732c0260bdc0ba2cf199f', '539732c0260bdc0ba2cf19a0', '539732c0260bdc0ba2cf19a1', '539732c0260bdc0ba2cf19a2', '539732c0260bdc0ba2cf19a3', '539732c0260bdc0ba2cf19a4', '539732c0260bdc0ba2cf19a5', '539732c0260bdc0ba2cf19a6', '539732c0260bdc0ba2cf19a7', '539732c0260bdc0ba2cf19a8', '539732c0260bdc0ba2cf19a9', '539732c0260bdc0ba2cf19aa', '539732c0260bdc0ba2cf19ab', '539732c0260bdc0ba2cf19ac', '539732c0260bdc0ba2cf19ad', '539732c0260bdc0ba2cf19ae', '539732c0260bdc0ba2cf19af', '539732c0260bdc0ba2cf19b0', '539732c0260bdc0ba2cf19b1', '539732c0260bdc0ba2cf19b2', '539732c0260bdc0ba2cf19b3', '539732c0260bdc0ba2cf19b4', '539732c0260bdc0ba2cf19b5', '539732c0260bdc0ba2cf19b6', '539732c0260bdc0ba2cf19b7', '539732c0260bdc0ba2cf19b8', '539732c0260bdc0ba2cf19b9', '539732c0260bdc0ba2cf19ba', '539732c0260bdc0ba2cf19bb', 
'539732c0260bdc0ba2cf19bc', '539732c0260bdc0ba2cf19bd', '539732c0260bdc0ba2cf19be', '539732c0260bdc0ba2cf19bf', '539732c0260bdc0ba2cf19c0', '539732c0260bdc0ba2cf19c1', '539732c0260bdc0ba2cf19c2', '539732c0260bdc0ba2cf19c3', '539732c0260bdc0ba2cf19c4', '539732c0260bdc0ba2cf19c5', '539732c0260bdc0ba2cf19c6', '539732c0260bdc0ba2cf19c7', '539732c0260bdc0ba2cf19c8', '539732c0260bdc0ba2cf19c9', '539732c0260bdc0ba2cf19ca', '539732c0260bdc0ba2cf19cb', '539732c0260bdc0ba2cf19cc', '539732c0260bdc0ba2cf19cd', '539732c0260bdc0ba2cf19ce', '539732c0260bdc0ba2cf19cf', '539732c0260bdc0ba2cf19d0', '539732c0260bdc0ba2cf19d1', '539732c0260bdc0ba2cf19d2', '539732c0260bdc0ba2cf19d3', '539732c0260bdc0ba2cf19d4', '539732c0260bdc0ba2cf19d5', '539732c0260bdc0ba2cf19d6', '539732c0260bdc0ba2cf19d7', '539732c0260bdc0ba2cf19d8', '539732c0260bdc0ba2cf19d9', '539732c0260bdc0ba2cf19da', '539732c0260bdc0ba2cf19db', '539732c0260bdc0ba2cf19dc', '539732c0260bdc0ba2cf19dd', '539732c0260bdc0ba2cf19de', '539732c0260bdc0ba2cf19df', '539732c0260bdc0ba2cf19e0', '539732c0260bdc0ba2cf19e1', '539732c0260bdc0ba2cf19e2', '539732c0260bdc0ba2cf19e3', '539732c0260bdc0ba2cf19e4', '539732c0260bdc0ba2cf19e5', '539732c0260bdc0ba2cf19e6', '539732c0260bdc0ba2cf19e7', '539732c0260bdc0ba2cf19e8', '539732c0260bdc0ba2cf19e9', '539732c0260bdc0ba2cf19ea', '539732c0260bdc0ba2cf19eb', '539732c0260bdc0ba2cf19ec', '539732c0260bdc0ba2cf19ed', '539732c0260bdc0ba2cf19ee', '539732c0260bdc0ba2cf19ef', '539732c0260bdc0ba2cf19f0', '539732c0260bdc0ba2cf19f1', '539732c0260bdc0ba2cf19f2', '539732c0260bdc0ba2cf19f3', '539732c0260bdc0ba2cf19f4', '539732c0260bdc0ba2cf19f5', '539732c0260bdc0ba2cf19f6', '539732c0260bdc0ba2cf19f7'] for execution on ComputePilot '539732c0260bdc0ba2cf1976'. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f1' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f1' state changed to PendingExecution. 
2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f2' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f2' state changed to PendingExecution. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f3' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f3' state changed to PendingExecution. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f4' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f4' state changed to PendingExecution. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f5' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f5' state changed to PendingExecution. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f6' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f6' state changed to PendingExecution. 2014:06:10 10:31:03 radical.pilot.MainProcess: [INFO ] RUN ComputeUnit '539732c0260bdc0ba2cf19f7' state changed from 'New' to 'PendingExecution'. [Callback]: ComputeUnit '539732c0260bdc0ba2cf19f7' state changed to PendingExecution. 2014:06:10 10:31:04 radical.pilot.MainProcess: [INFO ] ComputePilot '539732c0260bdc0ba2cf1976' state changed from 'Launching' to 'PendingActive'. [Callback]: ComputePilot '539732c0260bdc0ba2cf1976' state changed to PendingActive. 2014:06:10 10:31:47 radical.pilot.MainProcess: [INFO ] ComputePilot '539732c0260bdc0ba2cf1976' state changed from 'PendingActive' to 'Failed'. [Callback]: ComputePilot '539732c0260bdc0ba2cf1976' state changed to Failed.

And the AGENT.STDERR:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1957k  100 1957k    0     0  2487k      0 --:--:-- --:--:-- --:--:-- 2641k
warning: no files found matching '.html' under directory 'docs'
warning: no previously-included files matching '.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
zip_safe flag not set; analyzing archive contents...
unable to execute gcc: No such file or directory
/tmp/easy_install-CLAIf6/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The bson._cbson extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " unable to execute gcc: No such file or directory /tmp/easy_install-CLAIf6/pymongo-2.7.1/setup.py:217: UserWarning:


WARNING: The pymongo._cmessage extension module could not be compiled. No C extensions are essential for PyMongo to run, although they do result in significant speed improvements. The output above this warning shows how the compilation failed.

Please see the installation docs for solutions to build issues:

http://api.mongodb.org/python/current/installation.html

Here are some hints for popular operating systems:

If you are seeing this message on Linux you probably need to install GCC and/or the Python development package for your version of Python.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

If you are seeing this message on Microsoft Windows please install PyMongo using the MS Windows installer for your version of Python, available on pypi here:

http://pypi.python.org/pypi/pymongo/#downloads

If you are seeing this message on OSX please read the documentation here:

http://api.mongodb.org/python/current/installation.html#osx


"The output above " zip_safe flag not set; analyzing archive contents...

Thanks and regards, Dinesh

On Tue, Jun 10, 2014 at 11:13 AM, Andre Merzky notifications@github.com wrote:

So the overhead of the installation is significant if you just run a couple of tasks. (For longer-running pilots it's less of a problem, of course.) This needs a better solution eventually.

see #68 https://github.com/radical-cybertools/radical.pilot/issues/68


marksantcroos commented 10 years ago

On 10 Jun 2014, at 18:42 , dinesh-ganapati notifications@github.com wrote:

Hi Mark, Andre! Thanks for that pointer, it looks interesting. I also find that things run smoothly for 16 tasks on 16 cores, but for 128 tasks on 128 cores I get a failure. The only changes in my script are that I add pdesc.queue="premium" and set both the number of jobs and pdesc.cores to 128. I hope I haven't overlooked anything this time.

Hmm, that should not happen. Can you provide (parts of) the AGENT.LOG? (Once the agent starts the AGENT.STDOUT and .STDERR are of less interest and all the interesting stuff is in .LOG)
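For pulling only the relevant parts out of a long AGENT.LOG, a small filter can help. This is a minimal sketch, assuming each record uses the `timestamp - logger - LEVEL - message` layout that the agent log in this thread shows; `error_lines` is a hypothetical helper, not part of RADICAL-Pilot.

```python
def error_lines(log_text):
    """Yield (timestamp, message) for every ERROR record in the log text.

    Assumes one record per line in the layout
    'timestamp - logger - LEVEL - message'.
    """
    for line in log_text.splitlines():
        # Split on the first three separators only, so that a message
        # containing ' - ' stays in one piece.
        parts = line.split(" - ", 3)
        if len(parts) == 4 and parts[2] == "ERROR":
            yield parts[0], parts[3]
```

Calling `list(error_lines(open("AGENT.LOG").read()))` would then give just the error records to paste into a ticket.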

dinesh-ganapati commented 10 years ago

Hi Mark, the LOG reads:

2014-06-10 10:31:46,440 - radical.pilot.agent - INFO - RADICAL-Pilot agent for package/API version 0.14
2014-06-10 10:31:46,546 - radical.pilot.agent - INFO - No PBS_NODEFILE or SLURM_NODELIST found. Using hosts: ['localhost']
2014-06-10 10:31:46,547 - radical.pilot.agent - INFO - Discovered execution environment: {'localhost': {'cores': 32, 'memory': 0}}
2014-06-10 10:31:46,547 - radical.pilot.agent - INFO - Discovered launch method: SSH (/usr/bin/ssh)
2014-06-10 10:31:46,548 - radical.pilot.agent - ERROR - Error setting up execution environment: Not enought cores available (32) to satisfy allocation request (128).
2014-06-10 10:31:46,548 - radical.pilot.agent - ERROR - Error setting up execution environment: Not enought cores available (32) to satisfy allocation request (128)

This is rather funny: I did set pdesc.cores=128 and pdesc.queue='premium'. When the queue defaulted to small, it allowed only 32 cores. I also have another question: I am not sure where the script refers to the JSON config file. Previously, I recall defining RCONF = 'path of the config file' and setting pmgr = radical.pilot.PilotManager(session=session, resource_configurations=RCONF), but I don't see this in the canonical example.

Thanks and regards, Dinesh
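The log above shows the agent finding neither PBS_NODEFILE nor SLURM_NODELIST, falling back to `localhost` with 32 cores, and then rejecting the 128-core request. The following is a minimal sketch of that kind of discovery and check, not the actual agent code. `discover_slots` and `check_allocation` are hypothetical names, and the `LSB_DJOB_HOSTFILE` branch is an assumption about how an LSF-aware agent could find its hosts. SLURM is left out because its node list is a compressed string rather than a host file.

```python
import os
from collections import Counter

# Environment variables that point at a file with one host name per slot.
# PBS_NODEFILE appears in the agent log; LSB_DJOB_HOSTFILE is the LSF
# counterpart and is an assumption here, not taken from the agent source.
HOSTFILE_VARS = ("PBS_NODEFILE", "LSB_DJOB_HOSTFILE")


def discover_slots(env=None):
    """Return {hostname: slot_count}, falling back to the local node."""
    env = os.environ if env is None else env
    for var in HOSTFILE_VARS:
        path = env.get(var)
        if path and os.path.isfile(path):
            with open(path) as handle:
                hosts = [line.strip() for line in handle if line.strip()]
            if hosts:
                return dict(Counter(hosts))
    # No scheduler host file found: only the local node is visible.
    return {"localhost": os.cpu_count() or 1}


def check_allocation(slots, requested):
    """Return the available core count, or raise if it is too small."""
    available = sum(slots.values())
    if available < requested:
        raise RuntimeError(
            "Not enough cores available (%d) to satisfy allocation "
            "request (%d)." % (available, requested))
    return available
```

With only `localhost` discovered, any request larger than one node's core count fails in this check, whatever queue the pilot was submitted to.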

marksantcroos commented 10 years ago

On 10 Jun 2014, at 23:03 , dinesh-ganapati notifications@github.com wrote:

2014-06-10 10:31:46,548 - radical.pilot.agent - ERROR - Error setting up execution environment: Not enought cores available (32) to satisfy allocation request (128)

This is rather funny: I did set pdesc.cores=128 and pdesc.queue='premium'. When the queue defaulted to small, it allowed only 32 cores.

The multinode support was not in the devel branch yet; I've just merged it. Can you try with a fresh devel?

I also have another question: I am not sure where the script refers to the JSON config file. Previously, I recall defining RCONF = 'path of the config file' and setting pmgr = radical.pilot.PilotManager(session=session, resource_configurations=RCONF), but I don't see this in the canonical example.

Correct, but that is not specific to Yellowstone. All default config files are now distributed as part of RP, so there is no need to configure them explicitly any longer.
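For illustration, a script that relies on the bundled configs could look roughly like the sketch below. It is an assumption-laden outline, not the canonical example: `launch_pilot` and `PILOT_SETTINGS` are hypothetical names, the settings are the values mentioned in this thread, and the `Session`, `PilotManager`, `ComputePilotDescription` and `submit_pilots` calls are assumed to behave as in the RP version discussed here. Site-specific fields such as the project are left out.

```python
# Values taken from this thread; adjust them for your own allocation.
PILOT_SETTINGS = {
    "resource": "yellowstone.ucar.edu-local",
    "queue": "premium",
    "cores": 128,
    "runtime": 10,  # minutes, as in the first log of this issue
}


def launch_pilot(database_url, settings=PILOT_SETTINGS):
    """Submit one pilot using the resource configs shipped with RP."""
    # Imported lazily so this module loads without RADICAL-Pilot installed.
    import radical.pilot as rp

    session = rp.Session(database_url=database_url)

    # No resource_configurations argument: the bundled defaults are used.
    pmgr = rp.PilotManager(session=session)

    pdesc = rp.ComputePilotDescription()
    for name, value in settings.items():
        setattr(pdesc, name, value)

    return session, pmgr.submit_pilots(pdesc)
```

The only difference from the older pattern quoted above is that the `resource_configurations=RCONF` argument is gone.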

dinesh-ganapati commented 10 years ago

Hi Mark, thanks for the support :-). RP now works across multiple nodes on Yellowstone.

Regards, Dinesh

marksantcroos commented 10 years ago

Ok great! I'm going to close this ticket now.

Feel free to open new tickets for specific issues you encounter (but that holds true in general of course)!

andre-merzky commented 10 years ago

This must have been the radical longest ticket, ever! ;)

dinesh-ganapati commented 10 years ago

Wow! that's how this goes down in history :-) !

Dinesh
