nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

SSH timeout and [Errno None] Unable to connect to port 22 on AWS EC2 #352

Closed: satendrakumar closed this issue 2 years ago

satendrakumar commented 2 years ago

I am launching a cluster from an AWS EC2 instance in a public subnet. Here is my config:

services:
  spark:
    version: 3.1.2
    # git-commit: latest  # if not 'latest', provide a full commit SHA; e.g. d6dc12ef0146ae409834c78737c116050961f350
    # git-repository:  # optional; defaults to https://github.com/apache/spark
    # optional; defaults to download from a dynamically selected Apache mirror
    #   - can be http, https, or s3 URL
    #   - must contain a {v} template corresponding to the version
    #   - Spark must be pre-built
    #   - files must be named according to the release pattern shown here: https://dist.apache.org/repos/dist/release/spark/
    # download-source: "https://www.example.com/files/spark/{v}/"
    # download-source: "s3://some-bucket/spark/{v}/"
    # executor-instances: 1
  hdfs:
    version: 3.3.0
    # optional; defaults to download from a dynamically selected Apache mirror
    #   - can be http, https, or s3 URL
    #   - must contain a {v} template corresponding to the version
    #   - files must be named according to the release pattern shown here: https://dist.apache.org/repos/dist/release/hadoop/common/
    # download-source: "https://www.example.com/files/hadoop/{v}/"
    # download-source: "http://www-us.apache.org/dist/hadoop/common/hadoop-{v}/"
    # download-source: "s3://some-bucket/hadoop/{v}/"

provider: ec2

providers:
  ec2:
    key-name: mypemfile
    identity-file: /home/ubuntu/mypemfile.pem
    instance-type: t2.small
    region: us-east-1
    availability-zone: us-east-1a
    ami: ami-06eecef118bbf9259  # Amazon Linux 2, us-east-1
    user: ec2-user
    # ami: ami-61bbf104  # CentOS 7, us-east-1
    # user: centos
    # spot-price: <price>
    # spot-request-duration: 7d  # duration a spot request is valid, supports d/h/m/s (e.g. 4d 3h 2m 1s)
    vpc-id: vpc-ABC
    subnet-id: subnet-public
    # placement-group: <name>
    # security-groups:
    #   - test-spark-cluster

    # instance-profile-name: S3-access
    # tags:
    #   - key1,value1
    #   - key2, value2  # leading/trailing spaces are trimmed
    #   - key3,  # value will be empty
    # min-root-ebs-size-gb: 256
    tenancy: default  # default | dedicated
    ebs-optimized: no  # yes | no
    instance-initiated-shutdown-behavior: terminate  # terminate | stop
    # user-data: /path/to/userdata/script
    # authorize-access-from:
    #   - 10.0.0.42/32
    #   - sg-xyz4654564xyz

launch:
  num-slaves: 1
  # install-hdfs: True
  # install-spark: False
  java-version: 8
debug: true

SSH works from my local machine. I tested it with the same pem file, but cluster setup fails with a timeout.

(env) ubuntu@ip-172-31-83-133:~$ flintrock launch test-spark-cluster
/home/ubuntu/env/lib/python3.10/site-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
2022-05-24 12:53:00,560 - flintrock.flintrock - WARNING - Warning: Downloading Spark from an Apache mirror. Apache mirrors are often slow and unreliable, and typically only serve the most recent releases. We strongly recommend you specify a custom download source. For more background on this issue, please see: https://github.com/nchammas/flintrock/issues/238
2022-05-24 12:53:04,359 - flintrock.ec2       - INFO  - Launching 2 instances...
2022-05-24 12:53:17,985 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f38ebb3fb87ae1', 'i-0c09a264092', ...
2022-05-24 12:53:21,207 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f38ebb3d7ae1', 'i-0c09abd8092', ...
2022-05-24 12:53:24,268 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f38e3fb87d7ae1', 'i-0c0947bd8092', ...
2022-05-24 12:53:27,331 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f38e3fb87d7ae1', 'i-0c09a2617bd8092', ...
2022-05-24 12:53:30,402 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f38eb87d7ae1', 'i-0c09d8092', ...
2022-05-24 12:53:33,467 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f3887d7ae1', 'i-0c098092', ...
2022-05-24 12:53:36,531 - flintrock.ec2       - DEBUG - 2 instances not in state 'running': 'i-0f387d7ae1', 'i-0c0bd8092', ...
2022-05-24 12:53:39,658 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-0c09d8092', ...
2022-05-24 12:53:42,725 - flintrock.ec2       - DEBUG - 1 instances not in state 'running': 'i-0c09bd8092', ...
2022-05-24 12:53:49,163 - flintrock.ssh       - DEBUG - [52.201.***.**] SSH timeout.
2022-05-24 12:53:49,163 - flintrock.ssh       - DEBUG - [54.88.**.***] SSH timeout.
2022-05-24 12:53:54,170 - flintrock.ssh       - DEBUG - [54.88.**.***] SSH exception: [Errno None] Unable to connect to port 22 on 54.88.**.***
2022-05-24 12:53:57,172 - flintrock.ssh       - DEBUG - [52.2**.2***] SSH timeout.
2022-05-24 12:54:02,180 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH exception: [Errno None] Unable to connect to port 22 on 52.201.***.***
nchammas commented 2 years ago

SSH timeouts are normal for a brief period while the instances are still coming online.

If you just wait a few minutes, do the timeouts persist or does the setup eventually continue?
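The repeated "SSH timeout" lines in the log are Flintrock retrying while the instances boot. To check by hand whether port 22 ever opens on an instance, a minimal sketch could poll the port until a deadline. It assumes plain TCP reachability is a useful first signal; `wait_for_port` and its default timeouts are hypothetical, not part of Flintrock.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as a connection is accepted,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Cap each attempt so a single hung connect cannot overrun the deadline.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the instance may still be booting.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Calling `wait_for_port(ip, 22, timeout=600)` against one of the instance IPs shows whether sshd eventually starts listening. If it returns True but Flintrock still fails, the problem is more likely authentication than networking.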

satendrakumar commented 2 years ago

@nchammas Thanks for the reply.

After the timeouts, it throws this error:

2022-05-24 20:54:47,131 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH AuthenticationException.
2022-05-24 20:54:54,041 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH AuthenticationException.
2022-05-24 20:55:01,017 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH AuthenticationException.
2022-05-24 20:55:07,881 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH AuthenticationException.
2022-05-24 20:55:14,781 - flintrock.ssh       - DEBUG - [52.201.***.***] SSH AuthenticationException.
Exception: Error reading SSH protocol banner
Traceback (most recent call last):
  File "/home/satendra/decooda/NH-AWS/env/lib/python3.8/site-packages/paramiko/transport.py", line 2211, in _check_banner
    buf = self.packetizer.readline(timeout)
  File "/home/satendra/decooda/NH-AWS/env/lib/python3.8/site-packages/paramiko/packet.py", line 380, in readline
    buf += self._read_timeout(timeout)
  File "/home/satendra/decooda/NH-AWS/env/lib/python3.8/site-packages/paramiko/packet.py", line 609, in _read_timeout
    raise EOFError()
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/satendra/decooda/NH-AWS/env/lib/python3.8/site-packages/paramiko/transport.py", line 2039, in run
    self._check_banner()
  File "/home/satendra/decooda/NH-AWS/env/lib/python3.8/site-packages/paramiko/transport.py", line 2215, in _check_banner
    raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

Do you want to terminate the 2 instances created by this operation? [Y/n]: Y
Terminating instances...
[52.201.***.***] SSH protocol error. Possible causes include using the wrong key file or username.
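The traceback shows paramiko's `_check_banner` hitting `EOFError`: the TCP connection opened, but the server closed it before sending its SSH identification string. Per RFC 4253 section 4.2, that string starts with `SSH-`, and a server may send other lines before it. A rough sketch for reading the banner directly, assuming you only want to confirm that sshd is answering (the helper name `read_ssh_banner` is hypothetical):

```python
import socket


def read_ssh_banner(host: str, port: int = 22, timeout: float = 10.0, max_lines: int = 20) -> str:
    """Return the server's SSH identification string, e.g. 'SSH-2.0-OpenSSH_7.4'.

    Raises EOFError if the server closes the connection before sending one,
    the same condition paramiko reports as "Error reading SSH protocol banner".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, sock.makefile("rb") as reader:
        for _ in range(max_lines):
            line = reader.readline(256)
            if not line:
                raise EOFError(f"{host}:{port} closed the connection before sending a banner")
            text = line.decode("ascii", errors="replace").rstrip("\r\n")
            # RFC 4253 section 4.2: other lines may precede the version string.
            if text.startswith("SSH-"):
                return text
        raise ValueError(f"no SSH identification string in the first {max_lines} lines")
```

If this returns an `SSH-2.0-...` string while Flintrock still reports `AuthenticationException`, the network path is fine and the key file or username is the next thing to check.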
nchammas commented 2 years ago

> SSH is working from my local machine. I tested using same pem file but cluster setup is failin with timeout.

You used the same pem file in your SSH test, but did you use the same username as in your Flintrock config?

satendrakumar commented 2 years ago

@nchammas Yes, the username was the same: ec2-user.

nchammas commented 2 years ago

That's weird. Can you show me all the properties of the instance you are able to SSH into? Are the subnet, security group, VPC, and AMI all the same as in your Flintrock config? Please also share the full SSH command that is working for you.
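To compare a working manual session with what Flintrock attempts, the key file and username should come from the same config. A small sketch, assuming the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `ssh_command_from_config` is a hypothetical helper. It adds OpenSSH's `IdentitiesOnly=yes` so that `ssh` offers only the given key and not keys loaded in an agent, which could otherwise let a manual test succeed with a different key.

```python
import shlex
from typing import List


def ssh_command_from_config(config: dict, host: str) -> List[str]:
    """Build an `ssh` argv using the identity file and user from a Flintrock ec2 config.

    Assumes `config` is the parsed YAML as a plain dict.
    """
    ec2 = config["providers"]["ec2"]
    return [
        "ssh",
        "-i", ec2["identity-file"],
        # Offer only this key, not keys held by ssh-agent.
        "-o", "IdentitiesOnly=yes",
        f"{ec2['user']}@{host}",
    ]


# Example values mirroring the config in this issue; 203.0.113.10 is a documentation address.
config = {"providers": {"ec2": {"identity-file": "/home/ubuntu/mypemfile.pem", "user": "ec2-user"}}}
print(shlex.join(ssh_command_from_config(config, "203.0.113.10")))
# ssh -i /home/ubuntu/mypemfile.pem -o IdentitiesOnly=yes ec2-user@203.0.113.10
```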

satendrakumar commented 2 years ago

@nchammas I found the issue. I was using a key name in upper camel case (SparkStack.pem). That did not work with Flintrock, even though it worked for SSH.

I created a new key pair called spark_stack.pem, and now it works. Thank you so much for the help.
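The thread does not establish why the upper-camel-case key name failed, only that a new key pair named spark_stack fixed it. A hypothetical pre-flight check that encodes that observation, plus a mismatch check between `key-name` and the identity file's name, might look like this. Both rules are heuristics drawn from this thread, not documented Flintrock requirements, and `key_config_warnings` is an illustrative name.

```python
from pathlib import PurePath
from typing import List


def key_config_warnings(key_name: str, identity_file: str) -> List[str]:
    """Return warnings for a Flintrock key-name / identity-file pair.

    Heuristics only: they reflect what worked in this issue, not a documented rule.
    """
    warnings = []
    stem = PurePath(identity_file).stem
    # A mismatch may be legitimate, but it is worth double-checking.
    if key_name != stem:
        warnings.append(f"key-name {key_name!r} differs from identity-file stem {stem!r}")
    # The reporter's failing key pair used an upper-camel-case name.
    if key_name != key_name.lower():
        warnings.append(f"key-name {key_name!r} contains uppercase letters")
    return warnings
```

With the values from this issue, `key_config_warnings("spark_stack", "/home/ubuntu/spark_stack.pem")` returns an empty list, while the original `SparkStack` pair would produce one warning about uppercase letters.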