whilenull / 7777-support

Documentation and support for 7777.
https://port7777.com

The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group. #20

Open argutierrez00 opened 3 years ago

argutierrez00 commented 3 years ago

Hi guys,

I keep encountering the following error:

The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group.

➜  ~ 7777 --region=<region> --verbose
Using the AWS region <region>.
Validating the 7777 license.
Generating unique RSA keys for the SSH tunnel.
Checking if the port 7777 is available.
Port 7777 selected.
Listing databases.
Only one database found in this region, automatically selecting <dbname>.
Retrieving the availability zone and subnet of your instance.
Checking if 7777 is set up in the AWS account.
Setting up 7777 in the AWS account, this usually takes 30 seconds.
7777 is now set up, let's connect to the database.
Retrieving the container security group.
Retrieving the computer's IP address.
Authorizing <my_ip> on the security group.
Starting the Fargate container.
The IP address of the container is <container_ip>.
Starting the SSH tunnel to <container_ip>.
The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group.

Any additional help is appreciated.

mnapoli commented 3 years ago

Hi! Thanks for the detailed report, just a question to be sure:

> Starting the SSH tunnel to <container_ip>.
> The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group.

Is that instantaneous? Or is the tunnel working for a few minutes before the error happens?

Questions off the top of my head: are you running any kind of VPN or proxy? (I'm wondering whether the authorized IP address is correct.) What is surprising, though, is that the tunnel seems to start before it fails with a "timeout".

Could you also confirm the version: 7777 --version

argutierrez00 commented 3 years ago

> Is that instantaneous? Or is the tunnel working for a few minutes before the error happens?

➜ ~ 7777 --version
7777/1.0.7 linux-x64 node-v14.4.0

deleugpn commented 3 years ago

Could you check that your IP address was successfully added to the bastion's Security Group? Go to CloudFormation > port7777 > Resources > ContainerSecurityGroupFor{VpcId}, where a link takes you to the EC2 Security Group details. Can you confirm that your IP address is present?
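You can also check this from a script instead of the console. Fetch the group with boto3's `ec2.describe_security_groups(GroupIds=[...])`, then test whether the public IP your traffic actually leaves from falls inside one of the group's inbound CIDRs.

The sketch below assumes the response shape returned by that boto3 call. The function name and the sample addresses (taken from the documentation ranges 203.0.113.0/24 and 198.51.100.0/24) are hypothetical. It only inspects IPv4 `IpRanges`. It ignores IPv6 ranges and rules that reference other security groups.

```python
import ipaddress


def ip_allowed_on_port(sg_response: dict, ip: str, port: int = 22) -> bool:
    """Return True if an inbound IPv4 rule in the response admits ip on a TCP port.

    Assumes the dict shape of boto3's ec2.describe_security_groups() response:
    {"SecurityGroups": [{"IpPermissions": [{"IpProtocol": ..., "FromPort": ...,
    "ToPort": ..., "IpRanges": [{"CidrIp": ...}]}]}]}
    """
    addr = ipaddress.ip_address(ip)
    for group in sg_response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            protocol = perm.get("IpProtocol")
            if protocol == "-1":
                port_ok = True  # "-1" means all protocols and all ports
            elif protocol in ("tcp", "6"):
                port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
            else:
                continue  # UDP/ICMP rules cannot admit an SSH connection
            if not port_ok:
                continue
            for ip_range in perm.get("IpRanges", []):
                if addr in ipaddress.ip_network(ip_range["CidrIp"]):
                    return True
    return False


# Hypothetical response with a single rule allowing SSH from one /32.
example = {"SecurityGroups": [{"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "203.0.113.10/32"}]},
]}]}
print(ip_allowed_on_port(example, "203.0.113.10"))  # True
print(ip_allowed_on_port(example, "198.51.100.7"))  # False
```

If you are behind a VPN or proxy, as mnapoli asks above, the IP worth testing is the one the VPN or proxy exits from. Your local address is not the one the security group sees.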

Have you checked that your RDS is receiving a new Security Group attachment to allow the container to connect to it?

Also, could you confirm that you don't have any NACL at the VPC level (route tables) REJECTING port 22 connections?
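Network ACLs are evaluated differently from security groups in two ways:

- AWS checks the rules in ascending rule-number order, applies the first one that matches, and denies anything that matches no numbered rule.
- NACLs are stateless, so the outbound rules must separately allow the SSH replies back to your client's ephemeral port.

The sketch below models only the inbound evaluation. The `NaclRule` type and the sample rules are hypothetical; real rules would come from `describe_network_acls`.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class NaclRule:
    rule_number: int
    protocol: str   # "tcp" or "-1" (all protocols)
    port_from: int
    port_to: int
    cidr: str
    action: str     # "allow" or "deny"


def evaluate_nacl(rules: List[NaclRule], source_ip: str, port: int, protocol: str = "tcp") -> str:
    """Return the action a network ACL applies to one inbound packet.

    Rules are checked in ascending rule_number order and the first match wins.
    If nothing matches, the implicit catch-all rule (*) denies the packet.
    """
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.rule_number):
        proto_ok = rule.protocol == "-1" or rule.protocol == protocol
        port_ok = rule.protocol == "-1" or rule.port_from <= port <= rule.port_to
        if proto_ok and port_ok and addr in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"


inbound = [
    NaclRule(90, "tcp", 22, 22, "0.0.0.0/0", "deny"),
    NaclRule(100, "-1", 0, 65535, "0.0.0.0/0", "allow"),
]
print(evaluate_nacl(inbound, "203.0.113.10", 22))   # deny: rule 90 matches first
print(evaluate_nacl(inbound, "203.0.113.10", 443))  # allow: falls through to rule 100
```

A low-numbered deny rule like rule 90 above blocks SSH even though a broader allow rule exists. That ordering is easy to miss when skimming the console.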

When you say uninstalling 7777 is failing, can you share why it fails? Is it a CloudFormation stack deletion error? Understanding why the stack can't be torn down might point to something that went wrong during setup.

argutierrez00 commented 3 years ago

> Could you check that your IP address was successfully added to the bastion's Security Group? Go to CloudFormation > port7777 > Resources > ContainerSecurityGroupFor{VpcId}, where a link takes you to the EC2 Security Group details. Can you confirm that your IP address is present?

> Have you checked that your RDS is receiving a new Security Group attachment to allow the container to connect to it?

Type | Protocol | Port range | Source
All traffic | All | All | sg- / 7777-container-security-group-vpc-
MYSQL/Aurora | TCP | 3306 | sg- / app_prod

> Also, could you confirm that you don't have any NACL at the VPC level (route tables) REJECTING port 22 connections?

> When you say uninstalling 7777 is failing, can you share why it fails? Is it a CloudFormation stack deletion error? Understanding why the stack can't be torn down might point to something that went wrong during setup.

2021-04-16 20:07:31 UTC+0800 7777Cluster DELETE_FAILED Resource handler returned message: "Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Tasks are active. (Service: AmazonECS; Status Code: 400; Error Code: ClusterContainsTasksException; Request ID: ; Proxy: null)'." (RequestToken: , HandlerErrorCode: GeneralServiceException)

On the ECS side, the 7777ClusterUpdate cluster is still running with 4 tasks:

Task Definition: [INACTIVE] 7777-bastion:3 Last Status: Running

I usually stop those running tasks, then delete the cluster, and then delete the CloudFormation stack again to finish the cleanup.
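The manual cleanup above could be scripted with boto3's ECS client. That client provides `list_tasks`, `stop_task`, a `tasks_stopped` waiter, and `delete_cluster`.

This is a hedged sketch, not something 7777 ships:

- The function name is hypothetical.
- The cluster name is a placeholder for whatever the stack created.
- The client is passed in so the logic can be exercised without AWS credentials.

As the next reply notes, running `7777 stop` before `7777 uninstall` should make this unnecessary.

```python
from typing import List


def drain_and_delete_cluster(ecs, cluster: str) -> List[str]:
    """Stop all running tasks in an ECS cluster, wait until they stop, then delete it.

    `ecs` is assumed to be a boto3 ECS client (boto3.client("ecs")) or an
    object with the same methods. Returns the ARNs of the tasks that were stopped.
    """
    task_arns: List[str] = []
    kwargs = {"cluster": cluster, "desiredStatus": "RUNNING"}
    while True:  # list_tasks is paginated through nextToken
        page = ecs.list_tasks(**kwargs)
        task_arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    for arn in task_arns:
        ecs.stop_task(cluster=cluster, task=arn, reason="Manual 7777 cleanup")

    # The cluster cannot be deleted while tasks are active, so wait for STOPPED.
    # The waiter calls DescribeTasks, which accepts at most 100 tasks per call.
    waiter = ecs.get_waiter("tasks_stopped")
    for i in range(0, len(task_arns), 100):
        waiter.wait(cluster=cluster, tasks=task_arns[i:i + 100])

    ecs.delete_cluster(cluster=cluster)
    return task_arns


# Usage with real credentials (region and cluster name are placeholders):
# import boto3
# drain_and_delete_cluster(boto3.client("ecs", region_name="<region>"), "<cluster-name>")
```

Once the cluster is gone, deleting the CloudFormation stack again should no longer hit `ClusterContainsTasksException`.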

deleugpn commented 3 years ago

The uninstall failure can be avoided by running 7777 stop (https://github.com/whilenull/7777-support/blob/main/commands.md#7777-stop) before running 7777 uninstall. Unfortunately, that doesn't seem related to the connectivity issue.

You mentioned your RDS is running on a private subnet. Does your Route Table have a route for target local associated with all your private subnets? Does your RDS security group have any explicit rule denying access? Do your public subnets also have a Route Table with a route targeting local?

argutierrez00 commented 3 years ago

> You mentioned your RDS is running on a private subnet. Does your Route Table have a route for target local associated with all your private subnets?

> Does your RDS security group have any explicit rule denying access?

Inbound:
MYSQL/Aurora | TCP | 3306 | sg- / app_prod

Outbound:
All traffic | All | All | 0.0.0.0/0

> Do your public subnets also have a Route Table with a route targeting local?

deleugpn commented 3 years ago

Yeah, I also thought about NAT, but it indeed doesn't seem relevant at this point. Your computer (A) is supposed to connect to the bastion (B), and the bastion connects to the RDS instance (C). B is in a public subnet with an Internet Gateway, and C is in a private subnet but can communicate with B through the local route.
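The A-to-B-to-C path relies on how VPC route tables pick a route. The most specific matching CIDR (the longest prefix) wins, and every route table has a `local` route covering the VPC CIDR.

Here is a small sketch of that selection. It assumes a hypothetical VPC of 10.0.0.0/16 and an internet gateway called `igw-example`; neither comes from the thread.

```python
import ipaddress
from typing import Dict, Optional


def select_route(routes: Dict[str, str], destination: str) -> Optional[str]:
    """Return the target of the most specific route matching destination, or None."""
    addr = ipaddress.ip_address(destination)
    best_net = None
    best_target = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    return best_target


public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}  # bastion's subnet
private_rt = {"10.0.0.0/16": "local"}                             # RDS subnet

print(select_route(public_rt, "10.0.2.15"))      # local: bastion -> RDS
print(select_route(public_rt, "203.0.113.10"))   # igw-example: bastion -> your computer
print(select_route(private_rt, "10.0.1.20"))     # local: RDS -> bastion
print(select_route(private_rt, "203.0.113.10"))  # None: RDS has no internet route
```

The private subnet never needs a NAT or internet route in this setup, because the RDS instance only ever talks to the bastion.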

Everything seems to be set up correctly on the networking side. Short of asking for VPC Flow Logs, the only other thing I can think of is checking the task's CloudWatch Logs to see whether the sshd service is failing somehow.
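A quick TCP probe from your machine can separate the two failure modes discussed here:

- A connection that times out usually means packets are being silently dropped by a security group, a NACL, or a firewall.
- An immediate "connection refused" means the container answered but nothing is listening, which would point at sshd.

This sketch assumes, as the NACL question above suggests, that the tunnel uses port 22 on the container IP that `7777 --verbose` prints. It also assumes the bastion task is still running when you probe. The function name is hypothetical.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as "open", "refused", or "timeout".

    "timeout": no answer at all, typical of packets dropped by a security
    group, NACL, or firewall. "refused": the host replied with a reset, so it
    is reachable but nothing listens on the port. Other OSErrors (DNS failure,
    no route to host, etc.) are re-raised unchanged.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"


# Replace the placeholder with the container IP printed by `7777 --verbose`:
# print(probe_tcp("<container_ip>", 22))
```

A "timeout" result matches the error 7777 reports and keeps the investigation on the network path. A "refused" result would shift it to the container and its logs.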

deleugpn commented 3 years ago

Do you have a firewall on your local network that could be blocking the connection from being established?

argutierrez00 commented 3 years ago

Nope, no firewall on my current network.

Also, regarding the sshd logs on the task: the log seems empty too. I checked the Task > Logs tab and then the CloudWatch log group port7777-7777LogGroup-.

Also, this AWS account was created via AWS Organizations. Might not matter at all, just a hail mary of some sort. :D

eanselmi commented 1 year ago

I'm getting the same error, but only on GitHub Actions, not on my computer.


Run 7777 --region=us-east-1 --database staging-database-agosto &
The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group.

ctrlplusb commented 9 months ago

I'm another unfortunate victim of the above. I've checked everything as per the recommendations above, even the CloudWatch logs. At face value, everything appears to be configured and starting as it should. Eeek.

neoReuters commented 8 months ago

I've found the problem usually stems from networking issues. I experienced this too, but I realised my issue was that I was behind a private VPN which was obscuring the correct source IP.