kencho51 closed this PR 2 years ago
Hi @kencho51,
I have started testing this PR, and I was able to deploy functional staging and live instances in eu-west-3. All the tests are passing locally and on CI. I confirm that I can no longer SSH into the dockerhost, but I can SSH into the bastion. The documentation is clear on the process.
Something that would be useful to add to the documentation is a section on how to manually SSH to the dockerhost through the bastion, as that is a valid scenario when debugging remote environments.
Thanks @kencho51 for the update to the doc. Copying the SSH private keys to another machine is far from ideal (i.e. it's a security anti-pattern). We need to find a better way.
I can confirm that fail2ban is working on the bastion server. So when I configure fail2ban to ban dodgy IPs for 2 minutes after a maximum of 2 failed SSH attempts on the bastion, this happens:
```
# Test fail2ban is working on bastion by making repeated SSH login attempts from my dev machine
$ while true; do ssh centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com 2>&1; sleep 1; done
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
ssh: connect to host ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com port 22: Connection refused
ssh: connect to host ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com port 22: Connection refused

# Check public ip on dev machine provided by VPN
$ curl ipecho.net/plain ; echo
146.***.**.*
```
Meanwhile, on the bastion server:

```
$ sudo tail /var/log/fail2ban.log
2021-11-22 08:48:47,939 fail2ban.filter  [86172]: INFO maxRetry: 2
2021-11-22 08:48:47,939 fail2ban.filter  [86172]: INFO findtime: 300
2021-11-22 08:48:47,939 fail2ban.actions [86172]: INFO banTime: 120
2021-11-22 08:48:47,939 fail2ban.filter  [86172]: INFO encoding: UTF-8
2021-11-22 08:48:47,941 fail2ban.jail    [86172]: INFO Jail 'sshd' started
2021-11-22 08:49:24,617 fail2ban.filter  [86172]: INFO [sshd] Found 146.***.**.* - 2021-11-22 08:49:24
2021-11-22 08:49:25,596 fail2ban.filter  [86172]: INFO [sshd] Found 146.***.**.* - 2021-11-22 08:49:25
2021-11-22 08:49:25,985 fail2ban.actions [86172]: NOTICE [sshd] Ban 146.***.**.*
2021-11-22 08:51:25,466 fail2ban.actions [86172]: NOTICE [sshd] Unban 146.***.**.*
```
A `Connection refused` message is caused by fail2ban when it encounters a dodgy IP.
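For reference, the jail settings shown in the log above (maxRetry 2, findtime 300, banTime 120) would come from something like the following sketch of `/etc/fail2ban/jail.local`; the file path and exact layout are an assumption, not taken from this PR:

```ini
# Hypothetical jail.local matching the values logged above
[sshd]
enabled  = true
maxretry = 2      ; ban after 2 failed attempts...
findtime = 300    ; ...within a 5-minute window
bantime  = 120    ; ban lasts 2 minutes
```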
@kencho51 This article should help: http://web.archive.org/web/20170413054605/https://10mi2.wordpress.com/2015/01/14/using-ssh-through-a-bastion-host-transparently/
@kencho51 Using the config shared by @pli888, I was able to confirm that fail2ban is working on the bastion server:

```
2021-11-22 15:42:47,907 fail2ban.jail    [1365]: INFO Jail 'sshd' started
2021-11-22 15:43:49,754 fail2ban.filter  [1365]: INFO [sshd] Found 199.76.38.123 - 2021-11-22 15:43:49
2021-11-22 15:43:49,755 fail2ban.filter  [1365]: INFO [sshd] Found 199.76.38.123 - 2021-11-22 15:43:49
2021-11-22 15:43:49,974 fail2ban.actions [1365]: NOTICE [sshd] Ban 199.76.38.123
2021-11-22 15:45:49,809 fail2ban.actions [1365]: NOTICE [sshd] Unban 199.76.38.123
```
@kencho51, @pli888,
The following one-liner (based on @pli888's comment above, from which a bit is missing at the end) allowed me to connect to the private dockerhost through the bastion:

```
ssh -i ~/.ssh/<EC2 Key Pair private key>.pem -o ProxyCommand="ssh -W %h:%p -i ~/.ssh/<EC2 Key Pair private key>.pem centos@<bastion public IP>" centos@<dockerhost private IP>
```
Hi @kencho51,
I've noticed an issue whereby if for some reason the bastion EC2 instance is rebooted, the fail2ban service doesn't start after reboot, which is problematic.
The reason is that fail2ban is not set up to start automatically on system boot (it only works after provisioning because the Ansible `yum` task starts the service after installing the package).
Aside: I think we should actually use `dnf` instead of `yum` on CentOS 8.4.
The resolution for that issue is to ensure that there is a symlink in `/etc/systemd/system/multi-user.target.wants/` pointing to `/usr/lib/systemd/system/fail2ban.service`.
This should be doable by adding (after the install task) a task like the example below to the `ops/infrastructure/roles/fail2ban` Ansible role:

```yaml
- name: Enable fail2ban to start upon reboot
  file:
    src: /usr/lib/systemd/system/fail2ban.service
    dest: /etc/systemd/system/multi-user.target.wants/fail2ban.service
    state: link
```
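As an aside, the same result can likely be achieved more idiomatically with Ansible's `systemd` module, which lets systemd create the symlink itself (equivalent to `systemctl enable fail2ban`). A sketch, assuming the service name is `fail2ban`:

```yaml
- name: Enable and start fail2ban so it survives reboots
  systemd:
    name: fail2ban
    enabled: yes      # creates the multi-user.target.wants symlink
    state: started
```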
I think after this issue is resolved and the docs for manual SSH are updated, this PR will be OK to go.
This PR is for gigascience #745:

- Changes to the bastion_playbook
- Changes to the terraform and ansible init scripts
- Added `ansible_ssh_common_args` to `/inventories/hosts`

To test if `ProxyCommand` is working in a terminal, add the server details to `~/.ssh/config` (the real `HostName` and `IdentityFile` values have been masked for demonstration purposes), then run `w -i` to confirm the logged-in user's IP, which should be the private IP of the bastion server:

```
[centos@ip-10-99-0-89 ~]$ w -i
 07:03:51 up 1 day, 1:32, 1 user, load average: 0.00, 0.00, 0.00
USER     TTY      FROM                     LOGIN@   IDLE   JCPU   PCPU  WHAT
centos   pts/0    ec2_bastion_private_ip   07:03    3.00s  0.00s  0.00s w -i
```
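For completeness, the masked `~/.ssh/config` entries might look something like the sketch below; every host alias, IP, and key path here is a placeholder for illustration, not a real value from this deployment:

```
Host staging-bastion
    HostName <bastion public IP>
    User centos
    IdentityFile ~/.ssh/<EC2 Key Pair private key>.pem

Host staging-dockerhost
    HostName <dockerhost private IP>
    User centos
    IdentityFile ~/.ssh/<EC2 Key Pair private key>.pem
    ProxyCommand ssh -W %h:%p staging-bastion
```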
Go to the staging directory:

```
% cd /gigadb-website/ops/infrastructure/envs/staging
```
Copy terraform files to the staging environment:

```
% ../../../scripts/tf_init.sh --project gigascience/forks/kencho-gigadb-website --env staging
...
You need to specify the bastsion server name in ssh config file: staging-bastion
You need to specify an AWS region: ap-east-1
...
% terraform plan
% terraform apply
% terraform refresh
2021/11/10 16:37:54 [DEBUG] GET https://gitlab.com/api/v4/projects/gigascience%2Fforks%2Fkencho-gigadb-website/terraform/state/staging_infra
ec2_bastion_private_ip = "10.99.0.94"
ec2_bastion_public_ip = "xxx.xxx.xx.xxx"
ec2_private_ip = "10.99.0.161"
ec2_public_ip = "yyy.yyy.yyy.yyy"
rds_instance_address = "rds-server-staging-ken.abcdef.ap-east-1.rds.amazonaws.com"
```
Copy ansible files:

```
% ../../../scripts/ansible_init.sh --env staging
```
Provision with ansible. Supply `TF_KEY_NAME=private_ip` to the dockerhost playbook (on the first run only, you will be asked to confirm the host's SSH fingerprint):

```
% TF_KEY_NAME=private_ip ansible-playbook -i ../../inventories dockerhost_playbook.yml
...
TASK [Gathering Facts] *****
The authenticity of host '' can't be established.
ECDSA key fingerprint is SHA256:XKWz5ApMaEcX+9LdMbxI4RXB6Y+E/MFfIB+kMjcfyPM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
...
PLAY RECAP *****
10.99.0.161 : ok=62 changed=44 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
```
If `TF_KEY_NAME=private_ip` is not supplied, ansible will fail and show an error:

```
% ansible-playbook -i ../../inventories dockerhost_playbook.yml
TASK [Gathering Facts] *****
fatal: [yyy.yyy.yyy.yyy]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection timed out during banner exchange", "unreachable": true}
```
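For reference, the `ansible_ssh_common_args` mentioned in the description would route Ansible's SSH traffic through the bastion. A sketch of what the inventory entry might look like; the group name, IP, and paths are placeholders and the real `inventories/hosts` file may differ:

```ini
[dockerhost]
10.99.0.161

[dockerhost:vars]
ansible_ssh_common_args=-o ProxyCommand="ssh -W %h:%p -i ~/.ssh/<EC2 Key Pair private key>.pem centos@<bastion public IP>"
```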