whilenull / 7777-support

Documentation and support for 7777.
https://port7777.com

Can't reinstall: "Resource is not in the state stackCreateComplete" #59

robeberhardt closed this issue 1 month ago

robeberhardt commented 1 month ago

I was getting the error 'The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group' earlier today, so I deleted the stack and am now trying to reinstall.

Whenever I try to reinstall, I get this error:

AWS_PROFILE=<myprofile> 7777 --verbose
Using the AWS region us-east-1.
Validating the 7777 license.
Generating unique RSA keys for the SSH tunnel.
Checking if the port 7777 is available.
Port 7777 selected.
Listing databases.
Only one database found in this region, automatically selecting admin-dev-admindbinstance.
Retrieving the availability zone and subnet of your instance.
Using subnet subnet-<hash>
Checking if 7777 is set up in the AWS account.
Setting up 7777 in the AWS account, this usually takes 30 seconds.
    ResourceNotReady: Resource is not in the state stackCreateComplete
    Code: ResourceNotReady

In my CloudFormation events log, right after the "Resource creation initiated" message, I see: Resource handler returned message: "Resource of type 'AWS::ECS::Cluster' with identifier '7777Cluster' already exists." (RequestToken: <someToken>, HandlerErrorCode: AlreadyExists)

help! I hate using the RDS query interface ;)

thank you

robeberhardt commented 1 month ago

[screenshot]

robeberhardt commented 1 month ago

Before running 7777, aws rds describe-db-clusters --output json | jq '.DBClusters[].Endpoint' returns only the primary DB I expect: admin-dev-<somehashstuff>.us-east-1.rds.amazonaws.com

mnapoli commented 1 month ago

Hi! Judging by this error:

Resource of type 'AWS::ECS::Cluster' with identifier '7777Cluster' already exists.

It seems that for some reason the cluster still exists and was not deleted before. Can you try to manually delete it in the ECS console?

I hope that was the only resource left behind by the previous deletion. This usually happens when the stack is deleted directly in the CloudFormation console, something fails, and the UI asks "do you want to ignore these resources". If you answer "yes", those resources (here, the ECS cluster) are not deleted and CloudFormation stops tracking them.
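For anyone who prefers the API over the console, a minimal sketch of the same check follows. It assumes a boto3 ECS client and the cluster name 7777Cluster taken from the error message; the helper name cluster_is_active is hypothetical, not part of 7777.

```python
from typing import Any


def cluster_is_active(ecs: Any, name: str = "7777Cluster") -> bool:
    """Return True if an ECS cluster with this name still exists and is ACTIVE.

    `ecs` is assumed to be a boto3 ECS client (boto3.client("ecs")); any object
    with a compatible describe_clusters() method works.
    """
    response = ecs.describe_clusters(clusters=[name])
    # Deleted clusters may still be reported with status INACTIVE for a while,
    # or appear under "failures" instead, so only an ACTIVE entry counts.
    return any(
        cluster.get("clusterName") == name and cluster.get("status") == "ACTIVE"
        for cluster in response.get("clusters", [])
    )
```

If it returns True, the leftover cluster can be deleted in the ECS console as suggested above, or with ecs.delete_cluster(cluster="7777Cluster"); ECS may refuse the deletion while tasks are still running in the cluster.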

robeberhardt commented 1 month ago

@mnapoli that did the trick, thanks so much - I didn't think to check that because cloudformation was confidently telling me it deleted the cluster :\

robeberhardt commented 1 month ago

@mnapoli I spoke too soon :(

Retrieving the container security group.
Retrieving the computer's IP address.
Authorizing 66.31.65.118 on the security group.
Starting the Fargate container.
The IP address of the container is <ip>
Starting the SSH tunnel to <ip>:22.
SSH tunnel error: Timed out while waiting for handshake
The connection to the SSH tunnel timed out. Please reinstall 7777 or check your Security Group.

Back to (I guess) the original problem from earlier this morning. What's the best way to troubleshoot this?

UPDATE: for what it's worth, I'm seeing the same timeout issue today in my production account as well

mnapoli commented 1 month ago

Got it, in that case this is similar to https://github.com/whilenull/7777-support/issues/20

It might be the VPC security group that does not allow your IP address to connect to the SSH bastion.

Is there any reason your IP address might not be detected correctly, or might change (e.g. a VPN, firewall, Tor)?

robeberhardt commented 1 month ago

The security group named 7777-container-security-group-vpc- has an inbound rule for TCP port 22, sourced from my correct IP address, with the description: [7777] Authorizing IP Address of 7777 user

I don't see any other rules for port 5432
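To double-check a rule like this, a minimal sketch with Python's standard ipaddress module can confirm that the rule's source CIDR actually covers the address 7777 detected (66.31.65.118 in the log earlier in this thread). The function name rule_allows is hypothetical.

```python
import ipaddress


def rule_allows(source_cidr: str, client_ip: str) -> bool:
    """Return True if a security-group source CIDR covers the given address."""
    # strict=False tolerates host bits being set, e.g. "66.31.65.118/24".
    network = ipaddress.ip_network(source_cidr, strict=False)
    address = ipaddress.ip_address(client_ip)
    # Membership is simply False when the IP versions differ (IPv4 vs IPv6).
    return address in network
```

For example, rule_allows("66.31.65.118/32", "66.31.65.118") is True, while a /32 rule for any other address would not let that client in.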

robeberhardt commented 1 month ago

and yes, very similar to #20, including the difficulties uninstalling and reinstalling

mnapoli commented 1 month ago

Where are you running the 7777 command? (OS, terminal) Are you using a VPN or a firewall?

robeberhardt commented 1 month ago

No VPN, and the built-in macOS firewall is on. Setup: 7777/1.1.14 darwin-x64 node-v14.20.0, Warp terminal, macOS. Everything had been working great until yesterday. I did restart my Mac recently, but the IP that 7777 adds to the SG rule matches my external IP.

robeberhardt commented 1 month ago

in MacOS Firewall settings, 7777 has 'Allow incoming connections' set

mnapoli commented 1 month ago

Ok, thanks for all the details. macOS is the most common setup (that's what I use too). If you don't use a firewall like Little Snitch and it worked yesterday, I suspect that's not the cause.

At this point I'd try two things:

robeberhardt commented 1 month ago

this is what I'm seeing in cloudwatch logs after each attempt:

Starting the SSH daemon
SSH is starting, the container will terminate in 7200 seconds
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.
Timeout: terminating after 7200 seconds

robeberhardt commented 1 month ago

after the connection attempt times out and returns that error message about reinstallation, I see that the tasks are stacking up in the cluster, which is probably why I was having trouble deleting the CF stack
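A minimal sketch for clearing those leftover tasks before retrying the stack deletion, assuming a boto3 ECS client and the cluster name 7777Cluster. list_tasks and stop_task are ECS API calls; the helper names and the dry_run default are illustrative. Going by the CloudWatch log above, each container already terminates itself after 7200 seconds (2 hours), so this only speeds up the cleanup.

```python
from typing import Any, Dict, List


def running_task_arns(ecs: Any, cluster: str = "7777Cluster") -> List[str]:
    """Collect the ARNs of all RUNNING tasks in a cluster, following pagination."""
    arns: List[str] = []
    kwargs: Dict[str, str] = {"cluster": cluster, "desiredStatus": "RUNNING"}
    while True:
        page = ecs.list_tasks(**kwargs)
        arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token


def stop_leftover_tasks(ecs: Any, cluster: str = "7777Cluster", dry_run: bool = True) -> List[str]:
    """Stop every RUNNING task in the cluster; with dry_run=True, only report them."""
    arns = running_task_arns(ecs, cluster)
    if not dry_run:
        for arn in arns:
            ecs.stop_task(cluster=cluster, task=arn, reason="Cleaning up leftover 7777 tunnels")
    return arns
```

Running it once with the default dry_run=True shows what would be stopped before anything is touched.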

robeberhardt commented 1 month ago

@mnapoli does the CloudWatch log look correct to you, or does it point to an issue? If we still don't have any ideas, I guess I'll start trying to stand up a bastion stack.

deleugpn commented 1 month ago

Hey @robeberhardt can you share a few things with me, please:

robeberhardt commented 1 month ago

@deleugpn:

7777 Security Group:

Inbound: Type SSH, Protocol TCP, Port 22, Source: <my_correct_external_ip, starts with 10.>/32
Outbound: Type All Traffic, Protocol All, Port Range All, Destination 0.0.0.0/0

VPC NACL Rules:

inbound:

Route Table Attached to 7777Cluster subnet:

Route table attached to my RDS subnet:

Private Subnet 1:

I will spin up a container and try ssh now

Thanks for your help!

robeberhardt commented 1 month ago

@deleugpn ec2 ssh test:

ssh -i /Users/rob/Desktop/rob-dev-key-pair.pem ec2-user@ec2-54-210-95-246.compute-1.amazonaws.com -vvv
OpenSSH_9.6p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/rob/.ssh/config
debug3: /Users/rob/.ssh/config line 1: Including file /Users/rob/.orbstack/ssh/config depth 0
debug1: Reading configuration data /Users/rob/.orbstack/ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/rob/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/rob/.ssh/known_hosts2'
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug3: channel_clear_timeouts: clearing
debug1: Connecting to ec2-54-210-95-246.compute-1.amazonaws.com port 22.
ssh: connect to host ec2-54-210-95-246.compute-1.amazonaws.com port 22: Operation timed out
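The log never gets past the "Connecting to <host> port 22." step: the TCP connection itself times out, before any SSH handshake starts. A plain TCP probe separates the possible outcomes. This is a minimal sketch using only the Python standard library; probe_tcp is a hypothetical helper, not part of 7777.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify TCP reachability as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connection was actively rejected: the path works, but nothing accepts on the port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are most likely being dropped by a firewall
        # (security group, NACL, local or router firewall) somewhere on the path.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe_tcp("54.210.95.246") returning "timeout" from one network but "open" from another points at something on the first network path dropping the traffic.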
deleugpn commented 1 month ago

Does the 7777 container you just launched now have the IP 54.210.95.246? If so, can you try ssh -vvv root@54.210.95.246?

robeberhardt commented 1 month ago
ssh -vvv root@54.210.95.246
OpenSSH_9.6p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/rob/.ssh/config
debug3: /Users/rob/.ssh/config line 1: Including file /Users/rob/.orbstack/ssh/config depth 0
debug1: Reading configuration data /Users/rob/.orbstack/ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug2: resolve_canonicalize: hostname 54.210.95.246 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/rob/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/rob/.ssh/known_hosts2'
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to 54.210.95.246 [54.210.95.246] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug1: connect to address 54.210.95.246 port 22: Operation timed out
ssh: connect to host 54.210.95.246 port 22: Operation timed out

deleugpn commented 1 month ago

Could you go to the security group created by 7777, add a rule that allows every IP (0.0.0.0/0), and try the ssh command again, please?

robeberhardt commented 1 month ago

added this rule:

[screenshot of the new inbound rule]
ssh -vvv root@54.210.95.246
OpenSSH_9.6p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/rob/.ssh/config
debug3: /Users/rob/.ssh/config line 1: Including file /Users/rob/.orbstack/ssh/config depth 0
debug1: Reading configuration data /Users/rob/.orbstack/ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug2: resolve_canonicalize: hostname 54.210.95.246 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/rob/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/rob/.ssh/known_hosts2'
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to 54.210.95.246 [54.210.95.246] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug1: connect to address 54.210.95.246 port 22: Operation timed out
ssh: connect to host 54.210.95.246 port 22: Operation timed out

deleugpn commented 1 month ago

Could it be that the IP address is wrong? Can you check under AWS ECS -> 7777Cluster -> Running Tasks?
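Fargate tasks with an auto-assigned public IP get a new address each time a task starts, so an IP from an earlier run goes stale. A minimal sketch of looking up the current address programmatically, assuming boto3 ECS and EC2 clients and the cluster name 7777Cluster (the helper name is hypothetical): the network interface ID comes from describe_tasks, and the public IP from describe_network_interfaces.

```python
from typing import Any, List


def task_public_ips(ecs: Any, ec2: Any, cluster: str = "7777Cluster") -> List[str]:
    """Return the current public IPs of RUNNING Fargate tasks in the cluster.

    Assumes boto3 "ecs" and "ec2" clients; only the first page of tasks
    is inspected, which is enough for a single 7777 tunnel.
    """
    arns = ecs.list_tasks(cluster=cluster, desiredStatus="RUNNING").get("taskArns", [])
    if not arns:
        return []
    eni_ids = []
    for task in ecs.describe_tasks(cluster=cluster, tasks=arns).get("tasks", []):
        # awsvpc tasks carry their ENI ID in the attachment details.
        for attachment in task.get("attachments", []):
            for detail in attachment.get("details", []):
                if detail.get("name") == "networkInterfaceId":
                    eni_ids.append(detail["value"])
    if not eni_ids:
        return []
    interfaces = ec2.describe_network_interfaces(NetworkInterfaceIds=eni_ids)
    # Interfaces without a public IP have no "Association" entry.
    return [
        ni["Association"]["PublicIp"]
        for ni in interfaces.get("NetworkInterfaces", [])
        if "PublicIp" in ni.get("Association", {})
    ]
```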

robeberhardt commented 1 month ago
[screenshot of the running task in the 7777 cluster]

yes, looks like it changed, let me try the new public ip

robeberhardt commented 1 month ago
ssh -vvv root@3.221.159.213
OpenSSH_9.6p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/rob/.ssh/config
debug3: /Users/rob/.ssh/config line 1: Including file /Users/rob/.orbstack/ssh/config depth 0
debug1: Reading configuration data /Users/rob/.orbstack/ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug2: resolve_canonicalize: hostname 3.221.159.213 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/rob/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/rob/.ssh/known_hosts2'
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to 3.221.159.213 [3.221.159.213] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug1: connect to address 3.221.159.213 port 22: Operation timed out
ssh: connect to host 3.221.159.213 port 22: Operation timed out

deleugpn commented 1 month ago

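With this IP, I can establish a successful connection (it is only rejected at the authentication step, as expected, since I don't have the container's key):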

~/# ssh -v root@3.221.159.213
OpenSSH_9.6p1, LibreSSL 3.3.6
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to 3.221.159.213 [3.221.159.213] port 22.
debug1: Connection established.
debug1: identity file /Users/deleu/.ssh/id_rsa type -1
debug1: identity file /Users/deleu/.ssh/id_rsa-cert type -1
debug1: identity file /Users/deleu/.ssh/id_ecdsa type -1
debug1: identity file /Users/deleu/.ssh/id_ecdsa-cert type -1
debug1: identity file /Users/deleu/.ssh/id_ecdsa_sk type -1
debug1: identity file /Users/deleu/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /Users/deleu/.ssh/id_ed25519 type 3
debug1: identity file /Users/deleu/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/deleu/.ssh/id_ed25519_sk type -1
debug1: identity file /Users/deleu/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /Users/deleu/.ssh/id_xmss type -1
debug1: identity file /Users/deleu/.ssh/id_xmss-cert type -1
debug1: identity file /Users/deleu/.ssh/id_dsa type -1
debug1: identity file /Users/deleu/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.3
debug1: compat_banner: match: OpenSSH_9.3 pat OpenSSH* compat 0x04000000
debug1: Authenticating to 3.221.159.213:22 as 'root'
debug1: load_hostkeys: fopen /Users/deleu/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-rsa SHA256:ltJPV1UTx5LRzw8CXDbEij2Sm8EDS/RLoHyYXlp4nvQ
debug1: load_hostkeys: fopen /Users/deleu/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host '3.221.159.213' is known and matches the RSA host key.
debug1: Found key in /Users/deleu/.ssh/known_hosts:24
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_ext_info_client_parse: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_ext_info_check_ver: publickey-hostbound@openssh.com=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: Next authentication method: publickey
debug1: get_agent_identities: bound agent to hostkey
debug1: get_agent_identities: ssh_fetch_identitylist: agent contains no identities
debug1: Will attempt key: /Users/deleu/.ssh/id_rsa
debug1: Will attempt key: /Users/deleu/.ssh/id_ecdsa
debug1: Will attempt key: /Users/deleu/.ssh/id_ecdsa_sk
debug1: Will attempt key: /Users/deleu/.ssh/id_ed25519 ED25519 SHA256:G9DV05Z4I0SfMWhY1cmek8wpmnLzamSb3j24XYNB20g
debug1: Will attempt key: /Users/deleu/.ssh/id_ed25519_sk
debug1: Will attempt key: /Users/deleu/.ssh/id_xmss
debug1: Will attempt key: /Users/deleu/.ssh/id_dsa
debug1: Trying private key: /Users/deleu/.ssh/id_rsa
debug1: Trying private key: /Users/deleu/.ssh/id_ecdsa
debug1: Trying private key: /Users/deleu/.ssh/id_ecdsa_sk
debug1: Offering public key: /Users/deleu/.ssh/id_ed25519 ED25519 SHA256:G9DV05Z4I0SfMWhY1cmek8wpmnLzamSb3j24XYNB20g
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: Trying private key: /Users/deleu/.ssh/id_ed25519_sk
debug1: Trying private key: /Users/deleu/.ssh/id_xmss
debug1: Trying private key: /Users/deleu/.ssh/id_dsa
debug1: Next authentication method: keyboard-interactive
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: No more authentication methods to try.
root@3.221.159.213: Permission denied (publickey,keyboard-interactive).
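The two logs differ at a specific point: here the client reaches "Connection established." and is only rejected at authentication, while the earlier attempts stop at "Operation timed out" during the TCP connect. A rough heuristic for telling these apart from ssh -v output, written as a hypothetical Python helper; the matched strings are the OpenSSH messages seen in this thread, plus "Connection timed out", which is how Linux words the same timeout.

```python
def classify_ssh_debug(output: str) -> str:
    """Rough diagnosis from `ssh -v` output (a heuristic, not a full parser)."""
    if "Connection established." in output:
        if "Permission denied" in output:
            # TCP and the SSH handshake worked; only authentication failed.
            return "reachable, authentication refused"
        return "reachable"
    if "Operation timed out" in output or "Connection timed out" in output:
        # The TCP connect never completed: packets are dropped somewhere on the path.
        return "network timeout"
    if "Connection refused" in output:
        return "port closed or rejected"
    return "unknown"
```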
deleugpn commented 1 month ago

If you're still getting a connection timeout, something on your network or your computer must be "firewalling" the SSH connection.

robeberhardt commented 1 month ago

I have tried turning off my MacOS Firewall completely and it made no difference. I'm not on a VPN, and don't have Little Snitch installed. Can you think of anything else I should check?

deleugpn commented 1 month ago
robeberhardt commented 1 month ago

@deleugpn looks like my router had a firmware update that changed the default firewall settings. I'm back in business, thanks for taking the time to help me troubleshoot!

mnapoli commented 1 month ago

Awesome, thank you @deleugpn!