rija / gigadb-website

Source code for running GigaDB
http://gigadb.org
GNU General Public License v3.0

Aws security test ssh audit #745 #206

Closed kencho51 closed 2 years ago

kencho51 commented 2 years ago

This PR is for gigascience #745.

Changes to the bastion_playbook

  1. Install fail2ban on the bastion server
    # check fail2ban status on bastion server
    [centos@ip-10-99-0-202 ~]$ systemctl list-units --type=service | grep fail2ban
    fail2ban.service                       loaded active running Fail2Ban Service                                                             
    [centos@ip-10-99-0-202 ~]$ systemctl list-units --type=service --state=active | grep fail2ban
    fail2ban.service                       loaded active running Fail2Ban Service 

Changes to the terraform and ansible init scripts

  1. Connect to the dockerhost through the bastion server by adding ansible_ssh_common_args to /inventories/hosts.
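
As a rough illustration of the kind of value ansible_ssh_common_args can hold, the sketch below builds a ProxyCommand option string in shell. The function name bastion_ssh_common_args is hypothetical, the -q flag is an addition to keep the proxy connection quiet, and the exact value used in this PR's inventory is not shown in this thread, so treat the whole thing as an assumption.

```shell
#!/bin/sh
# Build an SSH option string that tunnels connections through a bastion.
# $1: bastion alias as defined in ~/.ssh/config (e.g. staging-bastion)
# NOTE: hypothetical helper; the real inventory value in the PR may differ.
bastion_ssh_common_args() {
    bastion=$1
    # %h and %p are expanded by ssh itself to the target host and port,
    # so they are printed literally here.
    printf '%s\n' "-o ProxyCommand=\"ssh -W %h:%p -q $bastion\""
}
```

The printed string could then be pasted into the inventory as the ansible_ssh_common_args value, or passed on the command line with `ansible-playbook --ssh-common-args "$(bastion_ssh_common_args staging-bastion)"`.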

To test whether ProxyCommand works from a terminal

Add the server details to ~/.ssh/config. The real HostName and IdentityFile values have been masked for demonstration purposes, for example:

# staging_server is a user-defined alias
Host staging_server
    HostName ec2-yyy-yyy-yyy-yyy.ap-east-1.compute.amazonaws.com
    User centos
    IdentityFile ~/.ssh/test.pem
    ProxyCommand ssh -W %h:%p staging-bastion

# staging-bastion is a user-defined alias
Host staging-bastion
    HostName ec2-xxx-xxx-xx-xxx.ap-east-1.compute.amazonaws.com
    User centos
    IdentityFile ~/.ssh/test.pem

  1. Test the connection through the bastion from a terminal
    
    % ssh staging_server
    [centos@ip-10-99-0-89 ~]$

Confirm the IP address of the logged-in user, which should be the private IP of the bastion server:

[centos@ip-10-99-0-89 ~]$ w -i
 07:03:51 up 1 day,  1:32,  1 user,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM                    LOGIN@   IDLE   JCPU   PCPU WHAT
centos   pts/0    ec2_bastion_private_ip  07:03    3.00s  0.00s  0.00s w -i


Steps to test the updated provisioning

Go to the staging environment directory

% cd /gigadb-website/ops/infrastructure/envs/staging

Copy terraform files to staging environment

% ../../../scripts/tf_init.sh --project gigascience/forks/kencho-gigadb-website --env staging
...
You need to specify the bastsion server name in ssh config file: staging-bastion
You need to specify an AWS region: ap-east-1
ing ...
% terraform plan
% terraform apply
% terraform refresh
2021/11/10 16:37:54 [DEBUG] GET https://gitlab.com/api/v4/projects/gigascience%2Fforks%2Fkencho-gigadb-website/terraform/state/staging_infra
ec2_bastion_private_ip = "10.99.0.94"
ec2_bastion_public_ip = "xxx.xxx.xx.xxx"
ec2_private_ip = "10.99.0.161"
ec2_public_ip = "yyy.yyy.yyy.yyy"
rds_instance_address = "rds-server-staging-ken.abcdef.ap-east-1.rds.amazonaws.com"
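
When scripting against the outputs listed above, a single value can be picked out of the `key = "value"` lines. The following is a minimal sketch, assuming that line layout; tf_output_value is a hypothetical helper name and is not part of the repository's scripts.

```shell
#!/bin/sh
# Extract one value from `key = "value"` lines read on stdin.
# $1: output name, e.g. ec2_bastion_private_ip
# NOTE: hypothetical helper, assuming the single-space " = " layout shown above.
tf_output_value() {
    awk -v key="$1" -F' = ' '
        $1 == key {
            gsub(/"/, "", $2)   # strip the surrounding double quotes
            print $2
        }
    '
}
```

For example, `terraform output | tf_output_value ec2_bastion_private_ip` would print the bastion's private address. Where the installed Terraform supports it, `terraform output -raw ec2_bastion_private_ip` is likely a simpler route to the same value.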

Copy ansible files

% ../../../scripts/ansible_init.sh --env staging

Provision with ansible

Supply TF_KEY_NAME=private_ip to dockerhost playbook

% TF_KEY_NAME=private_ip ansible-playbook -i ../../inventories dockerhost_playbook.yml
...
TASK [Gathering Facts] *
(At first time only)
The authenticity of host '' can't be established.
ECDSA key fingerprint is SHA256:XKWz5ApMaEcX+9LdMbxI4RXB6Y+E/MFfIB+kMjcfyPM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
...
PLAY RECAP *****
10.99.0.161 : ok=62 changed=44 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

If TF_KEY_NAME=private_ip is not supplied, Ansible targets the public IP, fails to connect, and reports the host as unreachable:

% ansible-playbook -i ../../inventories dockerhost_playbook.yml
TASK [Gathering Facts] *****
fatal: [yyy.yyy.yyy.yyy]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection timed out during banner exchange", "unreachable": true}

rija commented 2 years ago

Hi @kencho51,

I have started testing this PR, and I was able to deploy staging and live instances in eu-west-3 that are functional. All the tests are passing locally and on CI. I confirm that I cannot ssh to the dockerhost anymore, but I can ssh to the bastion. The documentation is clear on the process.

Something that would be useful to add to the documentation is a section on how to manually ssh to the dockerhost through the bastion, as that is a valid scenario when debugging remote environments.

rija commented 2 years ago

Thanks @kencho51 for the update to the doc. Copying the SSH private keys to another machine is far from ideal (i.e., it's a security anti-pattern). We need to find a better way.

pli888 commented 2 years ago

I can confirm that fail2ban is working on the bastion server. When I configure fail2ban to ban dodgy IPs for 2 minutes after 2 failed attempts at SSHing into the bastion, this happens:

# Test fail2ban is working on bastion by making repeated SSH login attempts from my dev machine
$ while true; do ssh centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com 2>&1; sleep 1; done
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
centos@ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
ssh: connect to host ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com port 22: Connection refused
ssh: connect to host ec2-**-***-***-***.ap-northeast-1.compute.amazonaws.com port 22: Connection refused
# Check public ip on dev machine provided by VPN
$ curl ipecho.net/plain ; echo
146.***.**.*
# Meanwhile on bastion server:
$ sudo tail /var/log/fail2ban.log
2021-11-22 08:48:47,939 fail2ban.filter         [86172]: INFO      maxRetry: 2
2021-11-22 08:48:47,939 fail2ban.filter         [86172]: INFO      findtime: 300
2021-11-22 08:48:47,939 fail2ban.actions        [86172]: INFO      banTime: 120
2021-11-22 08:48:47,939 fail2ban.filter         [86172]: INFO      encoding: UTF-8
2021-11-22 08:48:47,941 fail2ban.jail           [86172]: INFO    Jail 'sshd' started
2021-11-22 08:49:24,617 fail2ban.filter         [86172]: INFO    [sshd] Found 146.***.**.* - 2021-11-22 08:49:24
2021-11-22 08:49:25,596 fail2ban.filter         [86172]: INFO    [sshd] Found 146.***.**.* - 2021-11-22 08:49:25
2021-11-22 08:49:25,985 fail2ban.actions        [86172]: NOTICE  [sshd] Ban 146.***.**.*
2021-11-22 08:51:25,466 fail2ban.actions        [86172]: NOTICE  [sshd] Unban 146.***.**.*

The Connection refused message appears once fail2ban has banned the dodgy IP, and it lasts until the Unban entry two minutes later.
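
For reference, the settings reported in the log above (maxRetry 2, findtime 300, banTime 120, jail sshd) correspond to an [sshd] jail stanza along the following lines. This is a sketch only: the configuration actually used in the test is not shown in this thread, print_sshd_jail is a hypothetical helper, and the target file name in the usage note is an assumption.

```shell
#!/bin/sh
# Print a fail2ban [sshd] jail stanza.
# $1: maxretry (failed attempts before a ban)
# $2: findtime in seconds (window in which the attempts are counted)
# $3: bantime in seconds (how long the ban lasts)
# NOTE: hypothetical helper; the role in this PR may template its config differently.
print_sshd_jail() {
    cat <<EOF
[sshd]
enabled = true
maxretry = $1
findtime = $2
bantime = $3
EOF
}
```

Something like `print_sshd_jail 2 300 120 | sudo tee /etc/fail2ban/jail.d/sshd.local` followed by `sudo systemctl restart fail2ban` should reproduce the 2-attempt, 2-minute ban behaviour seen in the log.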

rija commented 2 years ago

@kencho51 This article should help: http://web.archive.org/web/20170413054605/https://10mi2.wordpress.com/2015/01/14/using-ssh-through-a-bastion-host-transparently/

rija commented 2 years ago

@kencho51 Using the config shared by @pli888, I was able to confirm that fail2ban is working on bastion server:

2021-11-22 15:42:47,907 fail2ban.jail           [1365]: INFO    Jail 'sshd' started
2021-11-22 15:43:49,754 fail2ban.filter         [1365]: INFO    [sshd] Found 199.76.38.123 - 2021-11-22 15:43:49
2021-11-22 15:43:49,755 fail2ban.filter         [1365]: INFO    [sshd] Found 199.76.38.123 - 2021-11-22 15:43:49
2021-11-22 15:43:49,974 fail2ban.actions        [1365]: NOTICE  [sshd] Ban 199.76.38.123
2021-11-22 15:45:49,809 fail2ban.actions        [1365]: NOTICE  [sshd] Unban 199.76.38.123

rija commented 2 years ago

@kencho51, @pli888,

The following one-liner (based on @pli888's comment above, in which a bit is missing at the end) allowed me to connect to the private dockerhost through the bastion:

ssh -i ~/.ssh/<EC2 Key Pair private key>.pem -o ProxyCommand="ssh -W %h:%p -i ~/.ssh/<EC2 Key Pair private key>.pem centos@<bastion public IP>" centos@<dockerhost private IP>
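
A possible shorter form of the one-liner above uses ssh's -J (ProxyJump) option, available in OpenSSH 7.3 and later. The sketch below assumes the bastion is defined as an alias in ~/.ssh/config (such as staging-bastion, with its own IdentityFile), because -i on the command line generally applies to the destination host rather than the jump host. The function name ssh_via_bastion is hypothetical. As with ProxyCommand, the private key stays on the local machine and is never copied to the bastion.

```shell
#!/bin/sh
# Open an SSH session on a private host by jumping through a bastion.
# $1: bastion alias from ~/.ssh/config (e.g. staging-bastion)
# $2: private IP or hostname of the dockerhost
# $3: path to the private key used for the dockerhost
# NOTE: hypothetical helper, assuming the centos user as in the examples above.
ssh_via_bastion() {
    bastion=$1
    target=$2
    key=$3
    # -J routes the connection through the bastion; authentication to the
    # bastion itself is taken from its Host entry in ~/.ssh/config.
    ssh -i "$key" -J "$bastion" "centos@$target"
}
```

Usage would look like `ssh_via_bastion staging-bastion <dockerhost private IP> ~/.ssh/<EC2 Key Pair private key>.pem`.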

rija commented 2 years ago

Hi @kencho51,

I've noticed an issue whereby if for some reason the bastion EC2 instance is rebooted, the fail2ban service doesn't start after reboot, which is problematic.

The reason is that fail2ban is not set up to start automatically at system boot (it only works after provisioning because the Ansible yum command starts the service after installing the package).

Aside: I think we should actually use dnf instead of yum on CentOS 8.4

The resolution for that issue is to ensure that /etc/systemd/system/multi-user.target.wants/ contains a symlink to /usr/lib/systemd/system/fail2ban.service

This should be doable by adding a task like the example below (after the install task) to the ops/infrastructure/roles/fail2ban Ansible role:

- name: Enable fail2ban to start upon reboot
  file:
    src: /usr/lib/systemd/system/fail2ban.service
    dest: /etc/systemd/system/multi-user.target.wants/fail2ban.service
    state: link

I think after this issue is resolved and the docs for manual SSH are updated, this PR would be OK to go.
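
To verify the fix after a reboot, the presence of the symlink described above can be checked directly. The following is a minimal sketch; unit_starts_at_boot is a hypothetical helper name, and the default directory is the one named in the comment above.

```shell
#!/bin/sh
# Succeed if a unit is wired into multi-user.target via a "wants" symlink.
# $1: unit file name, e.g. fail2ban.service
# $2: wants directory (optional, defaults to the systemd path named above)
# NOTE: hypothetical helper, not part of the fail2ban role.
unit_starts_at_boot() {
    unit=$1
    wants_dir=${2:-/etc/systemd/system/multi-user.target.wants}
    # -L is true only when the path exists and is a symbolic link
    [ -L "$wants_dir/$unit" ]
}
```

On the bastion, `unit_starts_at_boot fail2ban.service && echo enabled` gives a quick answer, and `systemctl is-enabled fail2ban` reports the same state through systemd itself. Running `sudo systemctl enable fail2ban` creates the same symlink, and Ansible's service and systemd modules expose an `enabled` parameter that may be a more conventional alternative to managing the link with the file module.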