nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Flintrock hangs on launch with SSH timeouts to the cluster instances #255

Closed Kiran-G1 closed 6 years ago

Kiran-G1 commented 6 years ago

I'm facing this SSH timeout and am unable to launch a cluster:

    2018-07-02 21:54:58,170 - flintrock.ec2 - INFO - Launching 2 instances...
    2018-07-02 21:55:12,619 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-07c5d3cf5773ebc8b', 'i-0cda15923091a6e95', ...
    2018-07-02 21:55:18,605 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-07c5d3cf5773ebc8b', 'i-0cda15923091a6e95', ...
    2018-07-02 21:55:25,552 - flintrock.ssh - DEBUG - [13.127.202.23] SSH timeout.
    2018-07-02 21:55:25,552 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:55:30,656 - flintrock.ssh - DEBUG - [13.127.202.23] SSH exception: [Errno None] Unable to connect to port 22 on 13.127.202.23
    2018-07-02 21:55:33,561 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    /usr/local/lib/python3.5/dist-packages/paramiko/rsakey.py:119: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
      algorithm=hashes.SHA1(),
    /usr/local/lib/python3.5/dist-packages/paramiko/rsakey.py:99: CryptographyDeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
      algorithm=hashes.SHA1(),
    2018-07-02 21:55:41,571 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:55:49,581 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:55:57,591 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:56:05,597 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:56:13,607 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:56:21,617 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.
    2018-07-02 21:56:29,627 - flintrock.ssh - DEBUG - [13.232.75.27] SSH timeout.

nchammas commented 6 years ago

Some questions:

  1. What version of Flintrock are you running? You skipped that field in the issue template.
  2. How long do the timeouts last? Where is the rest of the output? It's normal to see timeouts while the instances come up.
  3. How did you install Flintrock?

Kiran-G1 commented 6 years ago

  1. Flintrock version: 0.9.0
  2. A few seconds (8.08 to be precise)
  3. sudo pip3 install flintrock

Here is the entire log. I just tried to launch a few instances:
    
    flintrock launch chandu
    2018-07-02 22:14:59,365 - flintrock.ec2       - INFO  - Launching 2 instances...
    2018-07-02 22:15:14,809 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-047cc2ac0f2a6fa40', 'i-001065bf494b70010', ...
    2018-07-02 22:15:21,775 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-047cc2ac0f2a6fa40', ...
    2018-07-02 22:15:26,376 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-047cc2ac0f2a6fa40', ...
    2018-07-02 22:15:30,929 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-047cc2ac0f2a6fa40', ...
    2018-07-02 22:15:38,070 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:15:38,071 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:15:46,081 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:15:46,081 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:15:54,083 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:15:54,087 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:16:02,093 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:16:02,095 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:16:10,099 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:16:10,105 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:16:18,109 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:16:18,111 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.
    2018-07-02 22:16:26,119 - flintrock.ssh       - DEBUG - [13.126.0.175] SSH timeout.
    2018-07-02 22:16:26,119 - flintrock.ssh       - DEBUG - [13.232.147.50] SSH timeout.

nchammas commented 6 years ago

So after the last "SSH timeout" debug print line does Flintrock just hang?

Kiran-G1 commented 6 years ago

No. Flintrock keeps trying to reach the instances. Thank you so much for replying, Nick.

nchammas commented 6 years ago

Are you able to manually SSH from the same host where you are running Flintrock to any node in the same VPC where you are trying to launch a cluster (doesn't have to be Flintrock-related)?
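For anyone following along, the reachability check asked about here can also be scripted. The sketch below is only an illustration and not part of Flintrock: `can_reach_ssh` is a hypothetical helper name, and it tests only whether a TCP connection to port 22 can be opened, not whether SSH authentication succeeds.

```python
import socket


def can_reach_ssh(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it does not perform an SSH handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


# Example call, using one of the addresses from the log above:
# can_reach_ssh("13.232.147.50")
```

If this returns False for a Flintrock-launched instance but True for another instance in the same VPC, the difference is more likely in the network rules than in the SSH keys.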

Kiran-G1 commented 6 years ago

Yes, I can manually SSH into the nodes created from the console. I also tried manually logging into the nodes created by Flintrock, since they appeared as running in the console, but I still get the timeout.

Kiran-G1 commented 6 years ago

And they're in the same VPC.

nchammas commented 6 years ago

What do the security group rules look like for the Flintrock cluster you are trying to SSH into vs. the other non-Flintrock node that you are successfully able to SSH into?

The continuous timeouts suggest an issue with the security group rules, but if you can reach other instances in the same VPC then I'm not sure what the cause could be.

Kiran-G1 commented 6 years ago

They are all under the same VPC ID. I have been checking each and every line of SSH.py but was not able to figure anything out.

nchammas commented 6 years ago

I'm still looking for an answer to this question:

What do the security group rules look like for the Flintrock cluster you are trying to SSH into vs. the other non-Flintrock node that you are successfully able to SSH into?

If it isn't clear, security group rules are assigned per instance, so being in the same VPC doesn't say anything about whether the inbound access rules are the same for two different instances.
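To make the comparison concrete, here is a rough sketch of checking whether a set of inbound rules admits SSH from a given client address. It assumes the rules are in the `IpPermissions` shape that boto3's `describe_security_groups` returns; the function name `allows_ssh` and the sample rules and addresses are made up for illustration. It only looks at IPv4 CIDR ranges and ignores rules that reference other security groups.

```python
import ipaddress


def allows_ssh(ip_permissions, source_ip, port=22):
    """Return True if any inbound rule admits TCP traffic on `port` from `source_ip`.

    `ip_permissions` is assumed to be a list of dicts shaped like the
    "IpPermissions" entries of an EC2 security group description.
    """
    addr = ipaddress.ip_address(source_ip)
    for rule in ip_permissions:
        proto = rule.get("IpProtocol")
        if proto == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif proto == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        for ip_range in rule.get("IpRanges", []):
            if addr in ipaddress.ip_network(ip_range["CidrIp"], strict=False):
                return True
    return False


# Hypothetical rule sets for two instances in the same VPC.
open_rules = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
               "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
narrow_rules = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                 "IpRanges": [{"CidrIp": "203.0.113.7/32"}]}]

client_ip = "198.51.100.20"  # made-up client address
print(allows_ssh(open_rules, client_ip))
print(allows_ssh(narrow_rules, client_ip))
```

Running the same check against the rules of the instance that works and the one that times out should show whether the inbound rules differ, even though both sit in the same VPC.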

Kiran-G1 commented 6 years ago

Issue resolved when I deleted all the security groups and VPCs.