saga-project / BigJob

SAGA-based Pilot-Job Implementation for Compute and Data
http://saga-project.github.com/BigJob/

Amazon EC2 ssh keyfile auth times out before VM boots up in saga-bigjob #146

Status: Closed (anjaniragothaman closed this issue 10 years ago)

anjaniragothaman commented 10 years ago

When I replace the .pem file with the SSH key "id_rsa", I get a connection timeout. I'm not sure why this is happening. The full output is below for reference (passwords and keys replaced with xxx).

(newbjenv)cctsg@cyder:~/anjani/scenarioTest$ python test_aws_sagabj.py
07/26/2013 03:51:32 PM - bigjob - INFO - Loading BigJob version: 0.4.134-152-g04b31d3-saga-python on cyder.cct.lsu.edu
07/26/2013 03:51:32 PM - bigjob - DEBUG - ['/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/filemanagement/../../../webhdfs-py/', '/home/cctsg/anjani/scenarioTest', '/home/cctsg/newbjenv/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pip-1.3.1-py2.7.egg', '/opt/macs/lib/python2.6/site-packages', '/share/apps/solidexport/lib/python', '/home/cctsg/newbjenv/lib/python27.zip', '/home/cctsg/newbjenv/lib/python2.7', '/home/cctsg/newbjenv/lib/python2.7/plat-linux2', '/home/cctsg/newbjenv/lib/python2.7/lib-tk', '/home/cctsg/newbjenv/lib/python2.7/lib-old', '/home/cctsg/newbjenv/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7', '/usr/local/lib/python2.7/plat-linux2', '/usr/local/lib/python2.7/lib-tk', '/home/cctsg/newbjenv/lib/python2.7/site-packages', '/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/../..', '/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/filemanagement/../..']
07/26/2013 03:51:32 PM - bigjob - WARNING - WebHDFS package not found.
07/26/2013 03:51:32 PM - bigjob - DEBUG - Created Pilot Compute Service: redis://xxx:6379/pcs/pcs-27481b42-f635-11e2-aa7e-d8d385abb2b0
07/26/2013 03:51:32 PM - bigjob - DEBUG - start bigjob at: ec2+ssh://aws.amazon.com
07/26/2013 03:51:32 PM - bigjob - DEBUG - Utilizing Redis Backend
07/26/2013 03:51:32 PM - bigjob - DEBUG - Parsing URL: redis://xxx:6379
07/26/2013 03:51:32 PM - bigjob - DEBUG - redis:// xxx 6379
07/26/2013 03:51:32 PM - bigjob - DEBUG - Connect to Redis: xxx Port: 6379
07/26/2013 03:51:32 PM - bigjob - DEBUG - init BigJob w/: redis://xxx:6379
07/26/2013 03:51:32 PM - bigjob - DEBUG - initialized BigJob: bigjob:bj-275ca788-f635-11e2-aa7e-d8d385abb2b0
07/26/2013 03:51:32 PM - bigjob - DEBUG - create pilot job entry on backend server: bigjob:bj-275ca788-f635-11e2-aa7e-d8d385abb2b0:aws.amazon.com
07/26/2013 03:51:32 PM - bigjob - DEBUG - update state of pilot job to: Unknown stopped: False
07/26/2013 03:51:32 PM - bigjob - DEBUG - update description of pilot job to: {'vm_ssh_username': 'ubuntu', 'vm_ssh_keyfile': '/home/cctsg/.ssh/id_rsa', 'service_url': 'ec2+ssh://aws.amazon.com', 'vm_type': 't1.micro', 'vm_ssh_keyname': 'ethread', 'vm_id': 'ami-7b58cc12', 'region': 'us-east-1a', 'access_key_id': 'xxx', 'number_of_processes': 2, 'secret_access_key': 'xxx'}
07/26/2013 03:51:32 PM - bigjob - DEBUG - set pilot state to: Unknown
07/26/2013 03:51:32 PM - bigjob - DEBUG - setting walltime to: 3600
07/26/2013 03:51:32 PM - bigjob - DEBUG - File Staging for Cloud Instances currently not supported.
07/26/2013 03:51:32 PM - bigjob - DEBUG - Adaptor specific modifications: ec2+ssh
07/26/2013 03:51:32 PM - bigjob - DEBUG - Escape PBS
07/26/2013 03:51:32 PM - bigjob - DEBUG - 'import sys
args.append("bigjob:bj-275ca788-f635-11e2-aa7e-d8d385abb2b0:aws.amazon.com")
args.append("PilotComputeServiceQueue-pcs-27481b42-f635-11e2-aa7e-d8d385abb2b0")

print "Bootstrap time: " + str(time.time()-start_time)
print "Starting BigJob Agents with following args: " + str(args)
bigjob_agent = bigjob.bigjob_agent.bigjob_agent(args)
'
07/26/2013 03:51:32 PM - bigjob - DEBUG - Working directory:  Job Description: <class 'saga.job.description.Description'> <bound method Description.as_dict of <saga.job.description.Description object at 0x2e4d2d0>>
07/26/2013 03:51:32 PM - bigjob - DEBUG - Creating pilot job with description: <class 'saga.job.description.Description'> <bound method Description.as_dict of <saga.job.description.Description object at 0x2e4d2d0>>
07/26/2013 03:51:32 PM - bigjob - DEBUG - URL: ec2+ssh://aws.amazon.com Type: <class 'saga.url.Url'>
07/26/2013 03:51:32 PM - bigjob - DEBUG - Connect to region: us-east-1a
07/26/2013 03:51:33 PM - bigjob - DEBUG - Trying to submit pilot job to: ec2+ssh://aws.amazon.com
07/26/2013 03:51:34 PM - bigjob - DEBUG - Started EC2/Eucalyptus/Nova instance: i-508c1a3e
07/26/2013 03:52:00 PM - bigjob - DEBUG - Connect to: ssh://54.227.91.15
AN ERROR OCCURED: BadParameter: Could not contact host 'ssh://54.227.91.15': [Errno 110] Connection timed out

Traceback (most recent call last):
  File "test_aws_sagabj.py", line 58, in main
    pilotjob1 = pilot_compute_service1.create_pilot(pilot_compute_description_amazon_west1)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/pilotcompute_manager.py", line 78, in create_pilot
    bj = self.__start_bigjob(bj_dict)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/pilot/impl/pilotcompute_manager.py", line 184, in __start_bigjob
    pilot_compute_description = bj_dict["pilot_compute_description"]
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob/bigjob_manager.py", line 376, in start_pilot_job
    self.job.run()
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/bigjob/job_plugin/ec2ssh.py", line 181, in run
    js = saga.job.Service(url, session=session)

  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 451, in takes_invocation_proxy
    return method(*pargs, **pkwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 481, in returns_invocation_proxy
    result = method (*args, **kwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/job/service.py", line 90, in __init__
    url, session, ttype=_ttype)

  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 451, in takes_invocation_proxy
    return method(*pargs, **pkwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/signatures.py", line 481, in returns_invocation_proxy
    result = method (*args, **kwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/base.py", line 97, in __init__
    self._init_task = self._adaptor.init_instance (adaptor_state, *args, **kwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/adaptors/cpi/decorators.py", line 51, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/adaptors/shell/shell_job.py", line 337, in init_instance
    self._logger, opts=self.opts)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/pty_shell.py", line 204, in __init__
    self.pty_info   = self.factory.initialize (url, session, self.logger)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 155, in initialize
    info = self._create_master_entry (url, session, logger)
  File "/home/cctsg/newbjenv/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py", line 460, in _create_master_entry
    % (url, e))
BadParameter: BadParameter: Could not contact host 'ssh://54.227.91.15': [Errno 110] Connection timed out
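
In the log above, the instance is started at 03:51:34 and the SSH connection is attempted at 03:52:00, and the issue title attributes the timeout to the VM not having finished booting. One possible workaround is to poll the SSH port until it accepts a TCP connection before handing the host over to SAGA. The sketch below is only an illustration under that assumption: `wait_for_ssh` and its parameters are hypothetical names, it is not part of BigJob or saga-python, and it is written for Python 3 (it uses `time.monotonic`), not the Python 2.7 environment shown in the log. It also cannot help if the port is blocked for another reason, such as a security group that does not allow inbound port 22.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=300.0, interval=5.0, connect_timeout=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper, not part of BigJob or saga-python.
    Returns True once the port accepts a connection, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening on
            # the port; it does not verify the SSH key or the login itself.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could run something like `wait_for_ssh("54.227.91.15")` with the instance's public IP before the `saga.job.Service(url, session=session)` call seen in the traceback, and treat a `False` result as "the VM never became reachable".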

anjaniragothaman commented 10 years ago

Source script used: https://gist.github.com/anjaniragothaman/09d7e1b99cc25baacb2a