maticnetwork / terraform-polygon-supernets


Deployment Issue at step 6 #4

Closed aphexyuri closed 1 year ago

aphexyuri commented 1 year ago

Firstly, thanks for the work on the TF/Ansible deployment. I did however run in to an issue at step 6 with the following:

Hoping it's a simple fix or something I'm missing; perhaps some steps required to set up Prometheus. Help would be greatly appreciated.



The error appears to be in '/Users/myuser/Desktop/terraform-polygon-supernets/ansible/site.yml': line 36, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  roles:
    - prometheus.prometheus.node_exporter
      ^ here
tinom9 commented 1 year ago

Have you installed the Ansible requirements?

cd ansible
ansible-galaxy install -r requirements.yml
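
A role error like the one above can be checked before running the playbook by confirming that the collection is actually installed. The following is a minimal sketch, not part of the repo: `has_collection` is a hypothetical helper, and it assumes the usual column layout of `ansible-galaxy collection list` (collection name first, then version).

```shell
# Hypothetical helper: succeeds if the named collection appears in the first
# column of a collection listing read from stdin.
has_collection() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage, from the ansible/ directory:
#   ansible-galaxy collection list | has_collection prometheus.prometheus \
#       || ansible-galaxy install -r requirements.yml
```

The check compares the whole name, so a collection with a similar prefix does not count as a match.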
aphexyuri commented 1 year ago

@tinom9 that did the trick. However, it still can't seem to reach the nodes. Setting alias ansible='ansible --inventory inventory/aws_ec2.yml --vault-password-file=password.txt --extra-vars "@local-extra-vars.yml"' and then running ansible -m all ping gives me:

(base) ➜  ansible git:(main) ✗ alias ansible='ansible --inventory inventory/aws_ec2.yml --vault-password-file=password.txt --extra-vars "@local-extra-vars.yml"'
ansible -m all ping
[WARNING]: Could not match supplied host pattern, ignoring: ping
[WARNING]: No hosts matched, nothing to do

inventory looks okay:

(base) ➜  ansible git:(main) ✗ ansible-inventory --graph
@all:
  |--@ungrouped:
  |--@aws_ec2:
  |  |--i-07d77f8eeb0b41523
  |  |--i-084e52f1f7ed26860
  |  |--i-0a3c97f08afce6885
  |  |--i-0d6132a33909c748f
  |--@validator:
  |  |--i-07d77f8eeb0b41523
  |  |--i-084e52f1f7ed26860
  |  |--i-0a3c97f08afce6885
  |  |--i-0d6132a33909c748f
  |--@devnet01_edge_rg_private:
  |  |--i-07d77f8eeb0b41523
  |  |--i-084e52f1f7ed26860
  |  |--i-0a3c97f08afce6885
  |  |--i-0d6132a33909c748f
  |--@validator_001:
  |  |--i-07d77f8eeb0b41523
  |--@validator_004:
  |  |--i-084e52f1f7ed26860
  |--@validator_002:
  |  |--i-0a3c97f08afce6885
  |--@validator_003:
  |  |--i-0d6132a33909c748f
tinom9 commented 1 year ago

Try ansible all -m ping :)
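
The order matters because `-m` takes the next word as the module name, so `ansible -m all ping` asks for a module called `all` on a host pattern called `ping`, which matches the "Could not match supplied host pattern, ignoring: ping" warning. A shell function can stand in for the alias and also works inside scripts. This is only a sketch: `ansible_devnet` is a hypothetical name, and the flags are copied from the alias earlier in this thread.

```shell
# Hypothetical wrapper around the ansible CLI; the flags mirror the alias
# used earlier in this thread. Extra arguments are passed through unchanged.
ansible_devnet() {
    ansible --inventory inventory/aws_ec2.yml \
        --vault-password-file=password.txt \
        --extra-vars "@local-extra-vars.yml" "$@"
}

# Host pattern first, then the module:
#   ansible_devnet all -m ping
```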

aphexyuri commented 1 year ago

Great, ty! Ping runs but seems like the instances aren't reachable:

(base) ➜  ansible git:(main) ✗ ansible all -m ping
i-084e52f1f7ed26860 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535",
    "unreachable": true
}
i-07d77f8eeb0b41523 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535",
    "unreachable": true
}
i-0d6132a33909c748f | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535",
    "unreachable": true
}
i-0a3c97f08afce6885 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host\r\nConnection closed by UNKNOWN port 65535",
    "unreachable": true
}

I do see them all running in the AWS console though.

tinom9 commented 1 year ago

I'd say it's not getting the right ssh key. Make sure it's accessible and you've set it up properly.

You can always test it by connecting to a validator instance with the specified params:

ssh -i $SSH_KEY_FILE ubuntu@$VALIDATOR_01_INSTANCE_ID \
    -o IdentitiesOnly=yes \
    -o StrictHostKeyChecking=no \
    -o ProxyCommand="sh -c \"aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'\""
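
Since the ProxyCommand passes `%h` to `aws ssm start-session --target`, `$VALIDATOR_01_INSTANCE_ID` is presumably an EC2 instance ID, such as the one listed under `validator_001` in the `ansible-inventory --graph` output earlier in this thread. A minimal sketch for pulling it out of that output follows; `first_host_in_group` is a hypothetical helper and assumes the tree layout shown above.

```shell
# Hypothetical helper: print the first host listed under a group in
# `ansible-inventory --graph` output read from stdin.
first_host_in_group() {
    awk -v group="@$1:" '
        /@/ { hit = (index($0, group) > 0); next }   # group header line
        hit { sub(/^[ |-]*/, ""); print; exit }      # first host under it
    '
}

# Usage, from the ansible/ directory:
#   VALIDATOR_01_INSTANCE_ID=$(ansible-inventory --graph | first_host_in_group validator_001)
```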
aphexyuri commented 1 year ago

What should I use for VALIDATOR_01_INSTANCE_ID?

On a side note, step 7 is inconsistent about the private key location (~/.ssh/ vs ~/cert/). I used ~/.ssh/ throughout, including ansible_ssh_private_key_file: ~/.ssh/devnet_private.key in local-extra-vars.yml.

aphexyuri commented 1 year ago

@tinom9 can I ask what OS you're using? I was just trying out run.sh without success:

[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with auto plugin: Failed
to describe instances: An error occurred (UnauthorizedOperation) when calling the DescribeInstances operation: You are not authorized to
perform this operation.
[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with yaml plugin: Plugin
configuration YAML file, not YAML inventory
[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with ini plugin: Invalid
host pattern '---' supplied, '---' is normally a sign this is a YAML file.
[WARNING]: Unable to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
Starting galaxy collection install process
Nothing to do. All requested collections are already installed. If you want to reinstall them, consider using `--force`.
[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with auto plugin: Failed
to describe instances: An error occurred (UnauthorizedOperation) when calling the DescribeInstances operation: You are not authorized to
perform this operation.
[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with yaml plugin: Plugin
configuration YAML file, not YAML inventory
[WARNING]:  * Failed to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml with ini plugin: Invalid
host pattern '---' supplied, '---' is normally a sign this is a YAML file.
[WARNING]: Unable to parse /Users/yurivisser/Desktop/terraform-polygon-supernets/ansible/inventory/aws_ec2.yml as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
[WARNING]: Collection prometheus.prometheus does not support Ansible version 2.14.4

PLAY [all] *********************************************************************************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: devnet01_edge_polygon_private

PLAY [all:&devnet01_edge_polygon_private] **************************************************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: geth

PLAY [geth:&devnet01_edge_polygon_private] *************************************************************************************************
skipping: no hosts matched
[WARNING]: Could not match supplied host pattern, ignoring: fullnode
[WARNING]: Could not match supplied host pattern, ignoring: validator

PLAY [fullnode:validator:&devnet01_edge_polygon_private] ***********************************************************************************
skipping: no hosts matched

PLAY [fullnode:validator:&devnet01_edge_polygon_private] ***********************************************************************************
skipping: no hosts matched

PLAY [fullnode:validator:&devnet01_edge_polygon_private] ***********************************************************************************
skipping: no hosts matched

PLAY RECAP *********************************************************************************************************************************
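
The UnauthorizedOperation warnings suggest that the AWS credentials active in this shell are not allowed to call DescribeInstances, which the aws_ec2 inventory plugin relies on. With no inventory parsed, every play is skipped. A small sketch for spotting this in a long log follows; `denied_operations` is a hypothetical helper, and the way run.sh is invoked in the usage comment is an assumption.

```shell
# Hypothetical helper: list the AWS API operations that were rejected with
# UnauthorizedOperation in a log read from stdin, one per line.
denied_operations() {
    sed -n 's/.*(UnauthorizedOperation) when calling the \([A-Za-z]*\) operation.*/\1/p' \
        | sort -u
}

# Usage (assumes run.sh is executable and in the current directory):
#   ./run.sh 2>&1 | denied_operations
# To see which identity the AWS CLI is actually using:
#   aws sts get-caller-identity
```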
gatsbyz commented 1 year ago

are you still stuck on this? please confirm if ansible all -m ping is working

ajruizvargas commented 1 year ago

I am actually stuck at the same place. ansible all -m ping returns:

i-0cb0636a89d0d03cf | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname i-0cb0636a89d0d03cf: nodename nor servname provided, or not known",
    "unreachable": true
}
i-023e1176957aa6b8b | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname i-023e1176957aa6b8b: nodename nor servname provided, or not known",
    "unreachable": true
}
i-04f517e27dc91d8e7 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname i-04f517e27dc91d8e7: nodename nor servname provided, or not known",
    "unreachable": true
}
i-0c07fca0d42ae4147 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname i-0c07fca0d42ae4147: nodename nor servname provided, or not known",
    "unreachable": true
}
i-01766b8b0e3f5d461 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname i-01766b8b0e3f5d461: nodename nor servname provided, or not known",
    "unreachable": true
}

Any ideas?

imarkus8787 commented 1 year ago

I was running into the same issue and was able to make some progress. The main issue I found was an AWS permission problem: I had to add a dedicated policy to my AWS IAM user. It grants specific permissions for the 4 validators and the geth-001 node. I also had to install the Session Manager plugin, since I got another error along the way (https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:StartSession"
      ],
      "Resource": [
        "arn:aws:ec2:us-west-2:010531221017:instance/i-0c21d328536a23815",
        "arn:aws:ec2:us-west-2:010531221017:instance/i-029da648f72eca49e",
        "arn:aws:ec2:us-west-2:010531221017:instance/i-05c1dc647f0ab13ca",
        "arn:aws:ec2:us-west-2:010531221017:instance/i-07cc225ea7b1f8cfd",
        "arn:aws:ec2:us-west-2:010531221017:instance/i-0efe99c6be33073d4",
        "arn:aws:ssm:us-west-2::document/AWS-StartSSHSession"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:TerminateSession",
        "ssm:ResumeSession"
      ],
      "Resource": [
        "arn:aws:ssm:::session/${aws:username}-*"
      ]
    }
  ]
}
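
The Resource list in the policy above hard-codes one ARN per instance, so it has to be rebuilt whenever the instances are recreated. A sketch for generating those ARNs follows; `instance_arns` is a hypothetical helper, and the user name, policy name, and file name in the usage comment are placeholders rather than anything from the repo.

```shell
# Hypothetical helper: print one EC2 instance ARN per instance ID.
# Usage: instance_arns REGION ACCOUNT_ID INSTANCE_ID...
instance_arns() {
    region=$1
    account=$2
    shift 2
    for id in "$@"; do
        printf 'arn:aws:ec2:%s:%s:instance/%s\n' "$region" "$account" "$id"
    done
}

# Example:
#   instance_arns us-west-2 010531221017 i-0c21d328536a23815 i-029da648f72eca49e
# After pasting the ARNs into a policy file, it could be attached with:
#   aws iam put-user-policy --user-name "$IAM_USER" \
#       --policy-name ssm-ssh-devnet --policy-document file://ssm-policy.json
```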

praetoriansentry commented 1 year ago

It's going to be important to use the full command: ansible --inventory inventory/aws_ec2.yml --extra-vars "@local-extra-vars.yml" all -m ping

In local-extra-vars.yml there are some lines like this:

ansible_ssh_private_key_file: ~/devnet_private.key
ansible_ssh_common_args: >
  -o IdentitiesOnly=yes
  -o StrictHostKeyChecking=no
  -o ProxyCommand="sh -c \"aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'\""

These can be edited based on your needs, but basically it would tell Ansible where to find your ssh key and also configure Ansible to use SSH over SSM.
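
Because SSH over SSM depends on several local pieces (the AWS CLI, the Session Manager plugin, and a readable key file), a quick preflight check can narrow down which one is missing. This is a minimal sketch under those assumptions; `preflight` is a hypothetical helper, and the key path in the usage comment is taken from the local-extra-vars.yml lines above.

```shell
# Hypothetical preflight check.
# Usage: preflight KEY_FILE TOOL...
# Prints "ok" if every tool is on PATH and the key file is readable;
# otherwise prints the first problem found and returns non-zero.
preflight() {
    key_file=$1
    shift
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool"; return 1; }
    done
    [ -r "$key_file" ] || { echo "unreadable key: $key_file"; return 1; }
    echo "ok"
}

# Example:
#   preflight "$HOME/devnet_private.key" aws session-manager-plugin ansible
```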

aphexyuri commented 1 year ago

Finally got back to giving this a try, with the latest edge release and additions & changes to the docs. Turned out my aws ssm setup and setting vars (example.env step) was broken. Ty and great work on making the docs clearer and adding some verification commands along the way!

praetoriansentry commented 1 year ago

Awesome - thanks @aphexyuri - We'll be adding more documentation around tuning, load testing, and regular operations (e.g. looking at logs, etc). If you have any other thoughts on documentation that would be helpful, feel free to drop us a line.