saga-project / BigJob

SAGA-based Pilot-Job Implementation for Compute and Data
http://saga-project.github.com/BigJob/

Amazon EC2 .pem cert usage issue in saga-bigjob #145

Closed anjaniragothaman closed 11 years ago

anjaniragothaman commented 11 years ago

The saga-bigjob version "0.4.134-152-g04b31d3-saga-python" has problems using a .pem certificate for Amazon EC2 cloud resources. The debug log is pasted below (passwords and keys have been replaced with xxx).

(newbjenv)cctsg@cyder:~/anjani/scenarioTest$ python test_aws_sagabj.py 
07/26/2013 03:34:53 PM - bigjob - INFO - Loading BigJob version: 0.4.134-152-g04b31d3-saga-python on cyder.cct.lsu.edu
07/26/2013 03:34:53 PM - bigjob - DEBUG - ['/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/filemanagement/../../../webhdfs-py/', '/home/cctsg/anjani/scenarioTest', '/home/cctsg/newbjenv/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pip-1.3.1-py2.7.egg', '/opt/macs/lib/python2.6/site-packages', '/share/apps/solidexport/lib/python', '/home/cctsg/newbjenv/lib/python27.zip', '/home/cctsg/newbjenv/lib/python2.7', '/home/cctsg/newbjenv/lib/python2.7/plat-linux2', '/home/cctsg/newbjenv/lib/python2.7/lib-tk', '/home/cctsg/newbjenv/lib/python2.7/lib-old', '/home/cctsg/newbjenv/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/home/cctsg/newbjenv/lib/python2.7/site-packages', '/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/../..', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/filemanagement/../..']
07/26/2013 03:34:53 PM - bigjob - WARNING - WebHDFS package not found.
07/26/2013 03:34:53 PM - bigjob - DEBUG - Created Pilot Compute Service: redis://xxx:6379/pcs/pcs-d3acb63e-f632-11e2-9d2d-d8d385abb2b0
07/26/2013 03:34:53 PM - bigjob - DEBUG - start bigjob at: ec2+ssh://aws.amazon.com
07/26/2013 03:34:53 PM - bigjob - DEBUG - Utilizing Redis Backend
07/26/2013 03:34:53 PM - bigjob - DEBUG - Parsing URL: redis://xxx:6379
07/26/2013 03:34:53 PM - bigjob - DEBUG - redis:// gw68xxx 6379
07/26/2013 03:34:53 PM - bigjob - DEBUG - Connect to Redis: gw68xxx Port: 6379
07/26/2013 03:34:53 PM - bigjob - DEBUG - init BigJob w/: redis://xxx:6379
07/26/2013 03:34:53 PM - bigjob - DEBUG - initialized BigJob: bigjob:bj-d3d8bbc6-f632-11e2-9d2d-d8d385abb2b0
07/26/2013 03:34:53 PM - bigjob - DEBUG - create pilot job entry on backend server: bigjob:bj-d3d8bbc6-f632-11e2-9d2d-d8d385abb2b0:aws.amazon.com
07/26/2013 03:34:53 PM - bigjob - DEBUG - update state of pilot job to: Unknown stopped: False
07/26/2013 03:34:53 PM - bigjob - DEBUG - update description of pilot job to: {'vm_ssh_username': 'ubuntu', 'vm_ssh_keyfile': '/home/cctsg/.ssh/ethread.pem', 'service_url': 'ec2+ssh://aws.amazon.com', 'vm_type': 't1.micro', 'vm_ssh_keyname': 'ethread', 'vm_id': 'ami-7b58cc12', 'region': 'us-east-1a', 'access_key_id': 'xxx', 'number_of_processes': 2, 'secret_access_key': 'xxx'}
07/26/2013 03:34:53 PM - bigjob - DEBUG - set pilot state to: Unknown
07/26/2013 03:34:53 PM - bigjob - DEBUG - setting walltime to: 3600
07/26/2013 03:34:53 PM - bigjob - DEBUG - File Staging for Cloud Instances currently not supported.
07/26/2013 03:34:53 PM - bigjob - DEBUG - Adaptor specific modifications: ec2+ssh
07/26/2013 03:34:53 PM - bigjob - DEBUG - Escape PBS
07/26/2013 03:34:53 PM - bigjob - DEBUG - 'import sys
import os
import urllib
import sys
import time
start_time = time.time()
home = os.environ.get("HOME")
#print "Home: " + home
if home==None: home = os.getcwd()
BIGJOB_AGENT_DIR= os.path.join(home, ".bigjob")
if not os.path.exists(BIGJOB_AGENT_DIR): os.mkdir (BIGJOB_AGENT_DIR)
BIGJOB_PYTHON_DIR=BIGJOB_AGENT_DIR+"/python/"
if not os.path.exists(BIGJOB_PYTHON_DIR): os.mkdir(BIGJOB_PYTHON_DIR)
BOOTSTRAP_URL="https://raw.github.com/saga-project/BigJob/master/bootstrap/bigjob-bootstrap.py"
BOOTSTRAP_FILE=BIGJOB_AGENT_DIR+"/bigjob-bootstrap.py"
#ensure that BJ in .bigjob is upfront in sys.path
sys.path.insert(0, os.getcwd() + "/../")
p = list()
for i in sys.path:
    if i.find(".bigjob/python")>1:
          p.insert(0, i)
for i in p: sys.path.insert(0, i)
print "Python path: " + str(sys.path)
print "Python version: " + str(sys.version_info)
try: import saga
except: print "SAGA not found.";
try: import bigjob.bigjob_agent
except: 
    print "BigJob not installed. Attempt to install it."; 
    opener = urllib.FancyURLopener({}); 
    opener.retrieve(BOOTSTRAP_URL, BOOTSTRAP_FILE); 
    print "Execute: " + "python " + BOOTSTRAP_FILE + " " + BIGJOB_PYTHON_DIR
    os.system("/usr/bin/env")
    try:
        os.system("python " + BOOTSTRAP_FILE + " " + BIGJOB_PYTHON_DIR); 
        activate_this = os.path.join(BIGJOB_PYTHON_DIR, "bin/activate_this.py"); 
        execfile(activate_this, dict(__file__=activate_this))
    except:
        print "BJ installation failed. Trying system-level python (/usr/bin/python)";
        os.system("/usr/bin/python " + BOOTSTRAP_FILE + " " + BIGJOB_PYTHON_DIR); 
        activate_this = os.path.join(BIGJOB_PYTHON_DIR, "bin/activate_this.py"); 
        execfile(activate_this, dict(__file__=activate_this))
#try to import BJ once again
import bigjob.bigjob_agent
# execute bj agent
args = list()
args.append("bigjob_agent.py")
args.append("redis://xxx:6379")
args.append("bigjob:bj-d3d8bbc6-f632-11e2-9d2d-d8d385abb2b0:aws.amazon.com")
args.append("PilotComputeServiceQueue-pcs-d3acb63e-f632-11e2-9d2d-d8d385abb2b0")
print "Bootstrap time: " + str(time.time()-start_time)
print "Starting BigJob Agents with following args: " + str(args)
bigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)
'
07/26/2013 03:34:53 PM - bigjob - DEBUG - Working directory:  Job Description: <class 'saga.job.description.Description'> <bound method Description.as_dict of <saga.job.description.Description object at 0x17e0250>>
07/26/2013 03:34:53 PM - bigjob - DEBUG - Creating pilot job with description: <class 'saga.job.description.Description'> <bound method Description.as_dict of <saga.job.description.Description object at 0x17e0250>>
07/26/2013 03:34:53 PM - bigjob - DEBUG - URL: ec2+ssh://aws.amazon.com Type: <class 'saga.url.Url'>
07/26/2013 03:34:53 PM - bigjob - DEBUG - Connect to region: us-east-1a
07/26/2013 03:34:54 PM - bigjob - DEBUG - Trying to submit pilot job to: ec2+ssh://aws.amazon.com
07/26/2013 03:34:55 PM - bigjob - DEBUG - Started EC2/Eucalyptus/Nova instance: i-af6f77c4
07/26/2013 03:35:15 PM - bigjob - DEBUG - Connect to: ssh://54.227.132.8
AN ERROR OCCURED: BadParameter: Cannot add context, initialization failed (BadParameter: ssh public  key inaccessible: /home/cctsg/.ssh/ethread.pem.pub)
Traceback (most recent call last):
  File "test_aws_sagabj.py", line 58, in main
    pilotjob1 = pilot_compute_service1.create_pilot(pilot_compute_description_amazon_west1)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/pilotcompute_manager.py", line 78, in create_pilot
    bj = self.__start_bigjob(bj_dict)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/pilotcompute_manager.py", line 184, in __start_bigjob
    pilot_compute_description = bj_dict["pilot_compute_description"]
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob/bigjob_manager.py", line 376, in start_pilot_job
    self.job.run()
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob/job_plugin/ec2ssh.py", line 179, in run
    session.add_context(ctx)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 451, in takes_invocation_proxy
    return method(*pargs, **pkwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 481, in returns_invocation_proxy
    result = method (*args, **kwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/session.py", line 203, in add_context
    return self.contexts.append (ctx)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/session.py", line 60, in append
    raise saga.exceptions.BadParameter (msg)
BadParameter: BadParameter: Cannot add context, initialization failed (BadParameter: ssh public  key inaccessible: /home/cctsg/.ssh/ethread.pem.pub)
Terminating BigJob...
Traceback (most recent call last):
  File "test_aws_sagabj.py", line 161, in <module>
    sys.exit(main())
  File "test_aws_sagabj.py", line 154, in main
    pilotjob1.cancel()
UnboundLocalError: local variable 'pilotjob1' referenced before assignment
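
The first error in the log says the SSH context was rejected because `/home/cctsg/.ssh/ethread.pem.pub` was inaccessible, that is, the path in `vm_ssh_keyfile` with `.pub` appended. A minimal pre-flight check along these lines could report the problem before an EC2 instance is started. The helper name `check_ssh_keypair` is hypothetical and is not part of BigJob or SAGA-Python. It is also only an assumption that supplying the `.pub` file (for example one derived from the private key with `ssh-keygen -y -f`) avoids the failure in this version.

```python
import os


def check_ssh_keypair(private_key_path):
    """Return a list of problems with the key pair (empty if none).

    Hypothetical helper: it only mirrors the path the error message
    complains about, i.e. the private key path with ".pub" appended.
    """
    problems = []
    public_key_path = private_key_path + ".pub"
    if not os.access(private_key_path, os.R_OK):
        problems.append("private key not readable: " + private_key_path)
    if not os.access(public_key_path, os.R_OK):
        problems.append("public key not readable: " + public_key_path)
    return problems


if __name__ == "__main__":
    # Key path taken from the pilot description in the log above.
    keyfile = os.path.expanduser("~/.ssh/ethread.pem")
    for problem in check_ssh_keypair(keyfile):
        print(problem)
```

Running a check like this before `create_pilot` would surface the missing file without launching (and paying for) a VM first.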
andre-merzky commented 11 years ago

The PEM access issue is a SAGA-Python problem (see https://github.com/saga-project/saga-python/issues/192). To me it also seems that the application is not catching the error correctly and is accessing pilotjob1 after the exception. Or is that a problem within BigJob (exception not passed through / translated)?
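
The second traceback is consistent with the first reading: `pilotjob1.cancel()` runs in the cleanup path although `create_pilot` raised before `pilotjob1` was ever bound. Since test_aws_sagabj.py is not shown in this thread, the following is only a sketch of one way to guard the cleanup. `FakePilotService` is a hypothetical stand-in, not the BigJob API.

```python
class FakePilotService(object):
    """Hypothetical stand-in for a pilot compute service (not BigJob)."""

    def create_pilot(self, description):
        # Simulates the failure reported in the log.
        raise RuntimeError("ssh public key inaccessible")


def main(service, description):
    pilot = None  # bind the name before anything can fail
    try:
        pilot = service.create_pilot(description)
        # ... submit and wait for work here ...
        return 0
    except Exception as exc:
        print("pilot startup failed: " + str(exc))
        return 1
    finally:
        # Only cancel a pilot that was actually created.
        if pilot is not None:
            pilot.cancel()


if __name__ == "__main__":
    main(FakePilotService(), {})
```

With the name initialised to `None` and the cancel call guarded, a startup failure produces one error report rather than a second, unrelated `UnboundLocalError`.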

oleweidner commented 11 years ago

This has been fixed.