saga-project / BigJob

SAGA-based Pilot-Job Implementation for Compute and Data
http://saga-project.github.com/BigJob/

BigJob and the New TACC Server #89

Closed (melrom closed 11 years ago)

melrom commented 11 years ago

This is more of a reminder to myself...

BigJob is not working with the new redis server at TACC, but the redis server is ping-able through the redis-cli.

If anyone who works these tickets would like to help me diagnose, please ping me for the new password.

drelu commented 11 years ago

It works on Stampede as well:

03/28/2013 02:42:48 PM - bigjob - INFO - Loading BigJob version: 0.4.128 on login1.stampede.tacc.utexas.edu
03/28/2013 02:42:53 PM - bigjob - INFO - Using SAGA Bliss.
Start Pilot Job/BigJob at: fork://localhost
03/28/2013 02:42:53 PM - bigjob - DEBUG - Utilizing Redis Backend
03/28/2013 02:42:53 PM - bigjob - DEBUG - Parsing URL: redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379
03/28/2013 02:42:55 PM - bigjob - DEBUG - redis:// redis01.tacc.utexas.edu 6379
03/28/2013 02:42:55 PM - bigjob - DEBUG - Connect to Redis: redis01.tacc.utexas.edu Port: 6379
03/28/2013 02:42:55 PM - bigjob - DEBUG - init BigJob w/: redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379
03/28/2013 02:42:55 PM - bigjob - DEBUG - initialized BigJob: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:55 PM - bigjob - DEBUG - create pilot job entry on backend server: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:42:55 PM - bigjob - DEBUG - update state of pilot job to: Unknown stopped: False
03/28/2013 02:42:55 PM - bigjob - DEBUG - update description of pilot job to: None
03/28/2013 02:42:55 PM - bigjob - DEBUG - set pilot state to: Unknown
03/28/2013 02:42:55 PM - bigjob - DEBUG - setting walltime to: 10
03/28/2013 02:42:55 PM - bigjob - DEBUG - Use SSH backend
03/28/2013 02:42:56 PM - bigjob - DEBUG - ['/home1/01131/tg804093/src/BigJob/pilot/filemanagement/../../../webhdfs-py/', '/home1/01131/tg804093/src/BigJob/examples/../', '/home1/01131/tg804093/src/BigJob/examples', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pexpect-2.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/simplejson-2.0.9-py2.7-linux-x86_64.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/boto-2.2.2-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/globusonline_transfer_api_client-0.10.13-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/google_api_python_client-1.0-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/bliss-0.2.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/virtualenv-1.8.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/uuid-1.30-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/python_gflags-2.0-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/httplib2-0.7.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/paramiko_on_pypi-1.7.6-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pycrypto_on_pypi-2.3-py2.7-linux-x86_64.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/python_hostlist-1.14-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.128-py2.7.egg', '/opt/apps/python/epd/7.3.2/modules/lib/python', '/opt/apps/python/epd/7.3.2/lib', '/home1/01131/tg804093/.bigjob/python/lib/python27.zip', '/home1/01131/tg804093/.bigjob/python/lib/python2.7', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/plat-linux2', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-tk', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-old', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-dynload', '/opt/apps/python/epd/7.3.2/lib/python2.7', '/opt/apps/python/epd/7.3.2/lib/python2.7/plat-linux2', '/opt/apps/python/epd/7.3.2/lib/python2.7/lib-tk', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages', '/opt/apps/python/epd/7.3.2/lib/python2.7/site-packages', '/opt/apps/python/epd/7.3.2/lib/python2.7/site-packages/PIL', '/home1/01131/tg804093/src/BigJob/examples/../bigjob', '/home1/01131/tg804093/src/BigJob/examples/../pilot/impl/../..', '/home1/01131/tg804093/src/BigJob/examples/../pilot/filemanagement/../..']
03/28/2013 02:42:56 PM - bigjob - WARNING - WebHDFS package not found.
03/28/2013 02:42:56 PM - bigjob - DEBUG - Security Context: None
03/28/2013 02:42:56 PM - bigjob - DEBUG - BigJob working directory: ssh://localhost//tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:56 PM - bigjob - DEBUG - Create directory: //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:56 PM - bigjob - DEBUG - ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 localhost mkdir //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:57 PM - bigjob - DEBUG - Run ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 localhost mkdir //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f Output: ["Warning: Permanently added 'localhost' (RSA) to the list of known hosts.\r\r\n"]
03/28/2013 02:42:57 PM - bigjob - WARNING - No file staging adaptor found.
03/28/2013 02:42:57 PM - bigjob - DEBUG - BJ Working Directory: /tmp
03/28/2013 02:42:57 PM - bigjob - DEBUG - Adaptor specific modifications: fork
03/28/2013 02:42:57 PM - bigjob - DEBUG - Escape Bliss
03/28/2013 02:42:57 PM - bigjob - DEBUG - "import sys
import os
import urllib
import sys
import time
start_time = time.time()
home = os.environ.get(\"HOME\")
#print \"Home: \" + home
if home==None: home = os.getcwd()
BIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\")
if not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)
BIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\"
if not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR)
BOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\"
BOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\"/bigjob-bootstrap.py\"
#ensure that BJ in .bigjob is upfront in sys.path
sys.path.insert(0, os.getcwd() + \"/../\")
p = list()
for i in sys.path:
    if i.find(\".bigjob/python\")>1:
        p.insert(0, i)
for i in p: sys.path.insert(0, i)
print \"Python path: \" + str(sys.path)
print \"Python version: \" + str(sys.version_info)
try: import saga
except: print \"SAGA and SAGA Python Bindings not found.\";
try: import bigjob.bigjob_agent
except:
    print \"BigJob not installed. Attempt to install it.\";
    opener = urllib.FancyURLopener({});
    opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE);
    print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR
    os.system(\"/usr/bin/env\")
    try:
        os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR);
        activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\");
        execfile(activate_this, dict(__file__=activate_this))
    except:
        print \"BJ installation failed. Trying system-level python (/usr/bin/python)\";
        os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR);
        activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\");
        execfile(activate_this, dict(__file__=activate_this))
#try to import BJ once again
import bigjob.bigjob_agent
# execute bj agent
args = list()
args.append(\"bigjob_agent.py\")
args.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\")
args.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\")
args.append(\"\")
print \"Bootstrap time: \" + str(time.time()-start_time)
print \"Starting BigJob Agents with following args: \" + str(args)
bigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)
"
03/28/2013 02:42:57 PM - bigjob - DEBUG - Working directory: /tmp Job Description: {'Executable' : '/usr/bin/env','WorkingDirectory' : '/tmp','SPMDVariation' : 'single','Queue' : 'normal','WallTimeLimit' : '10','Arguments' : '['python', '-c', '"import sys\nimport os\nimport urllib\nimport sys\nimport time\nstart_time = time.time()\nhome = os.environ.get(\"HOME\")\n#print \"Home: \" + home\nif home==None: home = os.getcwd()\nBIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\")\nif not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)\nBIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\"\nif not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR)\nBOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\\"\nBOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\\"/bigjob-bootstrap.py\\"\n#ensure that BJ in .bigjob is upfront in sys.path\nsys.path.insert(0, os.getcwd() + \"/../\")\np = list()\nfor i in sys.path:\n if i.find(\".bigjob/python\")>1:\n p.insert(0, i)\nfor i in p: sys.path.insert(0, i)\nprint \"Python path: \" + str(sys.path)\nprint \"Python version: \" + str(sys.version_info)\ntry: import saga\nexcept: print \"SAGA and SAGA Python Bindings not found.\";\ntry: import bigjob.bigjob_agent\nexcept: \n print \"BigJob not installed. Attempt to install it.\"; \n opener = urllib.FancyURLopener({}); \n opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); \n print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR\n os.system(\"/usr/bin/env\")\n try:\n os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(__file__=activate_this))\n except:\n print \"BJ installation failed. Trying system-level python (/usr/bin/python)\";\n os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(__file__=activate_this))\n#try to import BJ once again\nimport bigjob.bigjob_agent\n# execute bj agent\nargs = list()\nargs.append(\"bigjob_agent.py\")\nargs.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\")\nargs.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\")\nargs.append(\"\")\nprint \"Bootstrap time: \" + str(time.time()-start_time)\nprint \"Starting BigJob Agents with following args: \" + str(args)\nbigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)\n"']','TotalCPUCount' : '8',}
03/28/2013 02:42:57 PM - bigjob - DEBUG - Creating pilot job with description: {'Executable' : '/usr/bin/env','WorkingDirectory' : '/tmp','SPMDVariation' : 'single','Queue' : 'normal','WallTimeLimit' : '10','Arguments' : '['python', '-c', '"import sys\nimport os\nimport urllib\nimport sys\nimport time\nstart_time = time.time()\nhome = os.environ.get(\"HOME\")\n#print \"Home: \" + home\nif home==None: home = os.getcwd()\nBIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\")\nif not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)\nBIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\"\nif not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR)\nBOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\\"\nBOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\\"/bigjob-bootstrap.py\\"\n#ensure that BJ in .bigjob is upfront in sys.path\nsys.path.insert(0, os.getcwd() + \"/../\")\np = list()\nfor i in sys.path:\n if i.find(\".bigjob/python\")>1:\n p.insert(0, i)\nfor i in p: sys.path.insert(0, i)\nprint \"Python path: \" + str(sys.path)\nprint \"Python version: \" + str(sys.version_info)\ntry: import saga\nexcept: print \"SAGA and SAGA Python Bindings not found.\";\ntry: import bigjob.bigjob_agent\nexcept: \n print \"BigJob not installed. Attempt to install it.\"; \n opener = urllib.FancyURLopener({}); \n opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); \n print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR\n os.system(\"/usr/bin/env\")\n try:\n os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(__file__=activate_this))\n except:\n print \"BJ installation failed. Trying system-level python (/usr/bin/python)\";\n os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(__file__=activate_this))\n#try to import BJ once again\nimport bigjob.bigjob_agent\n# execute bj agent\nargs = list()\nargs.append(\"bigjob_agent.py\")\nargs.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\")\nargs.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\")\nargs.append(\"\")\nprint \"Bootstrap time: \" + str(time.time()-start_time)\nprint \"Starting BigJob Agents with following args: \" + str(args)\nbigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)\n"']','Error' : '/tmp/stderr-bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f-agent.txt','Output' : '/tmp/stdout-bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f-agent.txt','TotalCPUCount' : '8',}
03/28/2013 02:42:57 PM - bigjob - DEBUG - Submit pilot job to: fork://localhost
Pilot Job/BigJob URL: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost State: Unknown
03/28/2013 02:42:57 PM - bigjob - DEBUG - add subjob to queue of PJ: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:42:57 PM - bigjob - DEBUG - create dictionary for job description. Job-URL: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:57 PM - bigjob - DEBUG - SJ Attributes: {'Executable' : '/bin/echo','Environment' : '['HELLOWORLD=hello_world']','Arguments' : '['$HELLOWORLD']','NumberOfProcesses' : '1','Error' : 'stderr.txt','Output' : 'stdout.txt',}
03/28/2013 02:42:57 PM - bigjob - DEBUG - job dict: {'Executable': '/bin/echo', 'NumberOfProcesses': 1, 'Environment': ['HELLOWORLD=hello_world'], 'state': 'Unknown', 'Arguments': ['$HELLOWORLD'], 'Error': 'stderr.txt', 'Output': 'stdout.txt', 'job-id': 'sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f'}
03/28/2013 02:42:57 PM - bigjob - DEBUG - set job state to: Unknown
03/28/2013 02:42:57 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Unknown
03/28/2013 02:42:59 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Unknown
03/28/2013 02:43:01 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:03 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:05 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:07 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Done
03/28/2013 02:43:07 PM - bigjob - DEBUG - Cancel Pilot Job
03/28/2013 02:43:07 PM - bigjob - DEBUG - stop pilot job: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:43:07 PM - bigjob - DEBUG - update state of pilot job to: Done stopped: True
03/28/2013 02:43:07 PM - bigjob - DEBUG - delete pilot job: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:43:07 PM - bigjob - DEBUG - Cancel Pilot Job finished

On Wed, Mar 27, 2013 at 2:17 PM, Melissa notifications@github.com wrote:

This is more of a reminder to myself...

BigJob is not working with the new redis server at TACC, but it is ping-able and works within SAGA. This COULD be an issue with the parsing happening in SAGA URL, but I have to diagnose what is really going on.

If anyone who works these tickets would like to help me diagnose, please ping me for the new password.
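The URL-parsing hypothesis in the message above can be checked in isolation. The following is a minimal sketch using Python 3's standard `urllib.parse`, not BigJob's or SAGA's own parser (neither is shown in this thread); the helper name `parse_backend_url` and the credential `secret` are placeholders made up for illustration. It shows that a standard parser treats a lone token before `@` as the username, so code that reads only the password field would see nothing.

```python
from urllib.parse import urlsplit


def parse_backend_url(url):
    """Split a coordination URL into scheme, credentials, host and port.

    Hypothetical helper for illustration; not part of BigJob or SAGA.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


# "secret" is a placeholder credential, not the real server password.
info = parse_backend_url("redis://secret@redis01.tacc.utexas.edu:6379")

# A single token before "@" is reported as the username; password is None.
print(info["username"], info["password"], info["host"], info["port"])

# With a leading colon the same token is reported as the password instead.
print(parse_backend_url("redis://:secret@redis01.tacc.utexas.edu:6379")["password"])
```

If the host and port come out correctly, as they do in the "Parsing URL" and "Connect to Redis" log lines, the parsing step is probably not the cause and the network path is the next thing to check.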


drelu commented 11 years ago

OK, now I see the issue: the Redis server is not reachable from the Stampede compute nodes:

(python)tg804093@c402-002.stampede ~$ telnet redis01.tacc.utexas.edu 6379
Trying 129.114.60.146...

@Yaakoub: Can you have a look at this?

Thanks, Andre
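The telnet check above can also be scripted, which is handy when it has to run from inside a batch job on a compute node. This is a rough sketch with the Python 3 standard library only; the function name `can_reach` and the default timeout are assumptions chosen for illustration. It tests only whether a TCP connection opens, not whether Redis authentication succeeds.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Rough equivalent of `telnet host port`; it does not speak the Redis protocol.
    """
    try:
        # create_connection resolves the host name and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Running `can_reach("redis01.tacc.utexas.edu", 6379)` on both a login node and a compute node would show whether the two differ, which is the situation described in this comment.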

On Thu, Mar 28, 2013 at 8:45 PM, Andre Luckow andre.luckow@gmail.com wrote:

It works on Stampede as well:

03/28/2013 02:42:48 PM - bigjob - INFO - Loading BigJob version: 0.4.128 on login1.stampede.tacc.utexas.edu 03/28/2013 02:42:53 PM - bigjob - INFO - Using SAGA Bliss. Start Pilot Job/BigJob at: fork://localhost 03/28/2013 02:42:53 PM - bigjob - DEBUG - Utilizing Redis Backend 03/28/2013 02:42:53 PM - bigjob - DEBUG - Parsing URL: redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379 03/28/2013 02:42:55 PM - bigjob - DEBUG - redis:// redis01.tacc.utexas.edu 6379 03/28/2013 02:42:55 PM - bigjob - DEBUG - Connect to Redis: redis01.tacc.utexas.edu Port: 6379 03/28/2013 02:42:55 PM - bigjob - DEBUG - init BigJob w/: redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379 03/28/2013 02:42:55 PM - bigjob - DEBUG - initialized BigJob: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f 03/28/2013 02:42:55 PM - bigjob - DEBUG - create pilot job entry on backend server: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost 03/28/2013 02:42:55 PM - bigjob - DEBUG - update state of pilot job to: Unknown stopped: False 03/28/2013 02:42:55 PM - bigjob - DEBUG - update description of pilot job to: None 03/28/2013 02:42:55 PM - bigjob - DEBUG - set pilot state to: Unknown 03/28/2013 02:42:55 PM - bigjob - DEBUG - setting walltime to: 10 03/28/2013 02:42:55 PM - bigjob - DEBUG - Use SSH backend 03/28/2013 02:42:56 PM - bigjob - DEBUG - ['/home1/01131/tg804093/src/BigJob/pilot/filemanagement/../../../webhdfs-py/', '/home1/01131/tg804093/src/BigJob/examples/../', '/home1/01131/tg804093/src/BigJob/examples', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pip-1.2.1-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pexpect-2.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/simplejson-2.0.9-py2.7-linux-x86_64.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/boto-2.2.2-py2.7.egg', 
'/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/globusonline_transfer_api_client-0.10.13-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/google_api_python_client-1.0-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/bliss-0.2.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/redis-2.2.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/virtualenv-1.8.4-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/uuid-1.30-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/python_gflags-2.0-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/httplib2-0.7.7-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/paramiko_on_pypi-1.7.6-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/pycrypto_on_pypi-2.3-py2.7-linux-x86_64.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/python_hostlist-1.14-py2.7.egg', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages/BigJob-0.4.128-py2.7.egg', '/opt/apps/python/epd/7.3.2/modules/lib/python', '/opt/apps/python/epd/7.3.2/lib', '/home1/01131/tg804093/.bigjob/python/lib/python27.zip', '/home1/01131/tg804093/.bigjob/python/lib/python2.7', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/plat-linux2', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-tk', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-old', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/lib-dynload', '/opt/apps/python/epd/7.3.2/lib/python2.7', '/opt/apps/python/epd/7.3.2/lib/python2.7/plat-linux2', '/opt/apps/python/epd/7.3.2/lib/python2.7/lib-tk', '/home1/01131/tg804093/.bigjob/python/lib/python2.7/site-packages', 
'/opt/apps/python/epd/7.3.2/lib/python2.7/site-packages', '/opt/apps/python/epd/7.3.2/lib/python2.7/site-packages/PIL', '/home1/01131/tg804093/src/BigJob/examples/../bigjob', '/home1/01131/tg804093/src/BigJob/examples/../pilot/impl/../..', '/home1/01131/tg804093/src/BigJob/examples/../pilot/filemanagement/../..'] 03/28/2013 02:42:56 PM - bigjob - WARNING - WebHDFS package not found. 03/28/2013 02:42:56 PM - bigjob - DEBUG - Security Context: None 03/28/2013 02:42:56 PM - bigjob - DEBUG - BigJob working directory: ssh://localhost//tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f 03/28/2013 02:42:56 PM - bigjob - DEBUG - Create directory: //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f 03/28/2013 02:42:56 PM - bigjob - DEBUG - ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 localhost mkdir //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f 03/28/2013 02:42:57 PM - bigjob - DEBUG - Run ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0 localhost mkdir //tmp/bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f Output: ["Warning: Permanently added 'localhost' (RSA) to the list of known hosts.\r\r\n"] 03/28/2013 02:42:57 PM - bigjob - WARNING - No file staging adaptor found. 03/28/2013 02:42:57 PM - bigjob - DEBUG - BJ Working Directory: /tmp 03/28/2013 02:42:57 PM - bigjob - DEBUG - Adaptor specific modifications: fork 03/28/2013 02:42:57 PM - bigjob - DEBUG - Escape Bliss 03/28/2013 02:42:57 PM - bigjob - DEBUG - "import sys import os import urllib import sys import time start_time = time.time() home = os.environ.get(\"HOME\")

print \"Home: \" + home

if home==None: home = os.getcwd() BIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\") if not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR) BIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\" if not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR) BOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\" BOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\"/bigjob-bootstrap.py\"

ensure that BJ in .bigjob is upfront in sys.path

sys.path.insert(0, os.getcwd() + \"/../\") p = list() for i in sys.path: if i.find(\".bigjob/python\")>1: p.insert(0, i) for i in p: sys.path.insert(0, i) print \"Python path: \" + str(sys.path) print \"Python version: \" + str(sys.version_info) try: import saga except: print \"SAGA and SAGA Python Bindings not found.\"; try: import bigjob.bigjob_agent except: print \"BigJob not installed. Attempt to install it.\"; opener = urllib.FancyURLopener({}); opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR os.system(\"/usr/bin/env\") try: os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); execfile(activate_this, dict(file=activate_this)) except: print \"BJ installation failed. Trying system-level python (/usr/bin/python)\"; os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); execfile(activate_this, dict(file=activate_this))

try to import BJ once again

import bigjob.bigjob_agent

execute bj agent

args = list() args.append(\"bigjob_agent.py\") args.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\") args.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\") args.append(\"\") print \"Bootstrap time: \" + str(time.time()-start_time) print \"Starting BigJob Agents with following args: \" + str(args) bigjob_agent = bigjob.bigjob_agent.bigjob_agent(args) " 03/28/2013 02:42:57 PM - bigjob - DEBUG - Working directory: /tmp Job Description: {'Executable' : '/usr/bin/env','WorkingDirectory' : '/tmp','SPMDVariation' : 'single','Queue' : 'normal','WallTimeLimit' : '10','Arguments' : '['python', '-c', '"import sys\nimport os\nimport urllib\nimport sys\nimport time\nstart_time = time.time()\nhome = os.environ.get(\"HOME\")\n#print \"Home: \" + home\nif home==None: home = os.getcwd()\nBIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\")\nif not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)\nBIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\"\nif not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR)\nBOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\\"\nBOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\\"/bigjob-bootstrap.py\\"\n#ensure that BJ in .bigjob is upfront in sys.path\nsys.path.insert(0, os.getcwd() + \"/../\")\np = list()\nfor i in sys.path:\n if i.find(\".bigjob/python\")>1:\n p.insert(0, i)\nfor i in p: sys.path.insert(0, i)\nprint \"Python path: \" + str(sys.path)\nprint \"Python version: \" + str(sys.version_info)\ntry: import saga\nexcept: print \"SAGA and SAGA Python Bindings not found.\";\ntry: import bigjob.bigjob_agent\nexcept: \n print \"BigJob not installed. 
Attempt to install it.\"; \n opener = urllib.FancyURLopener({}); \n opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); \n print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR\n os.system(\"/usr/bin/env\")\n try:\n os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(file=activate_this))\n except:\n print \"BJ installation failed. Trying system-level python (/usr/bin/python)\";\n os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(file=activate_this))\n#try to import BJ once again\nimport bigjob.bigjob_agent\n# execute bj agent\nargs = list()\nargs.append(\"bigjob_agent.py\")\nargs.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\")\nargs.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\")\nargs.append(\"\")\nprint \"Bootstrap time: \" + str(time.time()-start_time)\nprint \"Starting BigJob Agents with following args: \" + str(args)\nbigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)\n"']','TotalCPUCount' : '8',} 03/28/2013 02:42:57 PM - bigjob - DEBUG - Creating pilot job with description: {'Executable' : '/usr/bin/env','WorkingDirectory' : '/tmp','SPMDVariation' : 'single','Queue' : 'normal','WallTimeLimit' : '10','Arguments' : '['python', '-c', '"import sys\nimport os\nimport urllib\nimport sys\nimport time\nstart_time = time.time()\nhome = os.environ.get(\"HOME\")\n#print \"Home: \" + home\nif home==None: home = os.getcwd()\nBIGJOB_AGENT_DIR= os.path.join(home, \".bigjob\")\nif not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)\nBIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+\"/python/\"\nif not os.path.exists(BIGJOB_PYTHON_DIR): 
os.mkdir(BIGJOB_PYTHON_DIR)\nBOOTSTRAP_URL=\"https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py\\"\nBOOTSTRAP_FILE=BIGJOB_AGENT_DIR+\\"/bigjob-bootstrap.py\\"\n#ensure that BJ in .bigjob is upfront in sys.path\nsys.path.insert(0, os.getcwd() + \"/../\")\np = list()\nfor i in sys.path:\n if i.find(\".bigjob/python\")>1:\n p.insert(0, i)\nfor i in p: sys.path.insert(0, i)\nprint \"Python path: \" + str(sys.path)\nprint \"Python version: \" + str(sys.version_info)\ntry: import saga\nexcept: print \"SAGA and SAGA Python Bindings not found.\";\ntry: import bigjob.bigjob_agent\nexcept: \n print \"BigJob not installed. Attempt to install it.\"; \n opener = urllib.FancyURLopener({}); \n opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); \n print \"Execute: \" + \"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR\n os.system(\"/usr/bin/env\")\n try:\n os.system(\"python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(file=activate_this))\n except:\n print \"BJ installation failed. 
Trying system-level python (/usr/bin/python)\";\n os.system(\"/usr/bin/python \" + BOOTSTRAP_FILE + \" \" + BIGJOB_PYTHON_DIR); \n activate_this = os.path.join(BIGJOB_PYTHON_DIR, \"bin/activate_this.py\"); \n execfile(activate_this, dict(file=activate_this))\n#try to import BJ once again\nimport bigjob.bigjob_agent\n# execute bj agent\nargs = list()\nargs.append(\"bigjob_agent.py\")\nargs.append(\"redis://Oily9tourSorenavyvault@redis01.tacc.utexas.edu:6379\")\nargs.append(\"bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost\")\nargs.append(\"\")\nprint \"Bootstrap time: \" + str(time.time()-start_time)\nprint \"Starting BigJob Agents with following args: \" + str(args)\nbigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)\n"']','Error' : '/tmp/stderr-bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f-agent.txt','Output' : '/tmp/stdout-bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f-agent.txt','TotalCPUCount' : '8',} 03/28/2013 02:42:57 PM - bigjob - DEBUG - Submit pilot job to: fork://localhost Pilot Job/BigJob URL: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost State: Unknown 03/28/2013 02:42:57 PM - bigjob - DEBUG - add subjob to queue of PJ: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost 03/28/2013 02:42:57 PM - bigjob - DEBUG - create dictionary for job description. 
Job-URL: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f
03/28/2013 02:42:57 PM - bigjob - DEBUG - SJ Attributes: {'Executable' : '/bin/echo','Environment' : '['HELLOWORLD=hello_world']','Arguments' : '['$HELLOWORLD']','NumberOfProcesses' : '1','Error' : 'stderr.txt','Output' : 'stdout.txt',}
03/28/2013 02:42:57 PM - bigjob - DEBUG - job dict: {'Executable': '/bin/echo', 'NumberOfProcesses': 1, 'Environment': ['HELLOWORLD=hello_world'], 'state': 'Unknown', 'Arguments': ['$HELLOWORLD'], 'Error': 'stderr.txt', 'Output': 'stdout.txt', 'job-id': 'sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f'}
03/28/2013 02:42:57 PM - bigjob - DEBUG - set job state to: Unknown
03/28/2013 02:42:57 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Unknown
03/28/2013 02:42:59 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Unknown
03/28/2013 02:43:01 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:03 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:05 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Running
03/28/2013 02:43:07 PM - bigjob - DEBUG - Get subjob state: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost:jobs:sj-b0e5aeaa-97df-11e2-ab5e-d4ae52a0f02f state: Done
03/28/2013 02:43:07 PM - bigjob - DEBUG - Cancel Pilot Job
03/28/2013 02:43:07 PM - bigjob - DEBUG - stop pilot job: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:43:07 PM - bigjob - DEBUG - update state of pilot job to: Done stopped: True
03/28/2013 02:43:07 PM - bigjob - DEBUG - delete pilot job: bigjob:bj-afbad9ba-97df-11e2-ab5e-d4ae52a0f02f:localhost
03/28/2013 02:43:07 PM - bigjob - DEBUG - Cancel Pilot Job finished

On Wed, Mar 27, 2013 at 2:17 PM, Melissa notifications@github.com wrote:

This is more of a reminder to myself...

BigJob is not working with the new redis server at TACC, but it is ping-able and works within SAGA. This COULD be an issue with the parsing happening in SAGA URL, but I have to diagnose what is really going on.

If anyone who works these tickets would like to help me diagnose, please ping me for the new password.
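To rule out the URL-parsing suspicion raised in the quoted message, one option is to check how a `redis://password@host:port` URL splits, independently of SAGA or BigJob. The following is only a minimal sketch using the Python 3 standard library; `parse_redis_url` is a hypothetical helper, not a BigJob or SAGA function, and the host and password are placeholders.

```python
from urllib.parse import urlsplit


def parse_redis_url(url):
    """Split a redis://[password@]host[:port] URL into (host, port, password).

    Hypothetical helper for illustration; not part of BigJob or SAGA.
    """
    parts = urlsplit(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got scheme %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("no host found in %r" % url)
    # With no ':' before the '@', urlsplit reports the credential as
    # `username`; with 'user:secret@host' it lands in `password`.
    password = parts.password or parts.username
    # 6379 is the default Redis port; use it when the URL omits one.
    port = parts.port or 6379
    return parts.hostname, port, password


host, port, password = parse_redis_url(
    "redis://example-secret@redis.example.org:6379"
)
print(host, port, "password set" if password else "no password")
```

One thing this makes visible: a password placed directly before the `@` is reported by `urlsplit` as the username, so code that reads only `.password` would see `None`. A password containing characters such as `/` or `#` would also need percent-encoding before it can go into the URL.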


oleweidner commented 11 years ago

I have created a ticket for this: https://github.com/saga-project/BigJob/issues/90

This is somewhat related to this ticket: https://github.com/saga-project/BigJob/issues/80

I think better Redis error handling will definitely be one of the "Top 5 Bugs" to address in the next release.

On Mar 29, 2013, at 04:25, Andre Luckow notifications@github.com wrote:

OK, now I see the issue: the Redis server is not reachable from the Stampede compute nodes:

(python)tg804093@c402-002.stampede ~$ telnet redis01.tacc.utexas.edu 6379
Trying 129.114.60.146...

@Yaakoub: Can you have a look at this?

Thanks, Andre
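The telnet check above can also be scripted, which may be handy for running the same test from a login node and from a compute node. This is a rough sketch using only the Python 3 standard library; `can_reach` is a hypothetical name, and the timeout value is an arbitrary assumption.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Try a plain TCP connection, like `telnet host port`.

    Returns (True, None) on success, else (False, reason).
    Hypothetical helper for illustration; not part of BigJob.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        # No answer at all within the timeout, which is what a telnet
        # session stuck at "Trying ..." would look like.
        return False, "timed out after %.1fs" % timeout
    except OSError as exc:
        # Covers refused connections, DNS failures, unreachable networks.
        return False, "connection failed: %s" % exc
```

Calling `can_reach("redis01.tacc.utexas.edu", 6379)` on both node types would show whether the port is open from each. A timeout on the compute node with success on the login node would point to network filtering rather than to BigJob or the Redis URL.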


ashleyz commented 11 years ago

This is working now. Thanks all!

ashleyz commented 11 years ago

It won't let me close this issue, so please feel free to :)