paseaf / ContainerSSH-honeypot

A high-interaction SSH honeypot built with ContainerSSH for GCP
MIT License

`dockerd` on Sacrificial VM stopped running after a while #51

Closed paseaf closed 2 years ago

paseaf commented 2 years ago

Current behavior: dockerd stopped responding

dockerd logs

2022-07-24 04:48:35.375843 I | http: Accept error: accept tcp [::]:2376: accept4: too many open files; retrying in 1s
2022-07-24 04:48:35.613364 I | http: Accept error: accept unix /var/run/docker.sock: accept4: too many open files; retrying in 1s
2022-07-24 04:48:36.376390 I | http: Accept error: accept tcp [::]:2376: accept4: too many open files; retrying in 1s
2022-07-24 04:48:36.613764 I | http: Accept error: accept unix /var/run/docker.sock: accept4: too many open files; retrying in 1s

ContainerSSH logs

"The backend has rejected the user after successful authentication. (failed to create container, giving up (Cannot connect to the Docker daemon at tcp://sacrificial-vm:2376. Is the docker daemon running?))"

Investigation

Possible reason: the default nofile (open-files) limit for dockerd is too low. Details: https://github.com/paseaf/ContainerSSH-honeypot/issues/51#issuecomment-1193271988
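
To check this hypothesis, the number of file descriptors a process holds can be compared with its soft limit. A minimal sketch, assuming a Linux `/proc` filesystem; `fd_usage` is a hypothetical helper name, not something from this repository:

```bash
# Print "<open fds> <soft nofile limit>" for a process ID (Linux /proc only).
# fd_usage is a hypothetical helper for illustration.
fd_usage() {
  pid="$1"
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  # Fourth field of the "Max open files" row is the soft limit.
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "$open $soft"
}

# Demo on the current shell; for dockerd, run as root:
#   fd_usage "$(pidof dockerd)"
fd_usage "$$"
```

If the first number is close to the second, new `accept4` calls will start failing with "too many open files", as in the dockerd logs above.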

Possible solution: start dockerd with a higher nofile limit. Details: https://github.com/paseaf/ContainerSSH-honeypot/issues/51#issuecomment-1193279149


paseaf commented 2 years ago

Possible reason: too low default `ulimit` value.

### Why we usually don't see the issue

`dockerd` is started by `systemd` by default.

- `systemd` has a config file for `dockerd`.
- The config file sets the `NOFILE` limit to `infinity`.

Config file:

```bash
systemctl cat docker.service
```

returns

```bash
# /lib/systemd/system/docker.service
[Unit]
# ...
[Service]
# ...
LimitNOFILE=infinity # HERE
LimitNPROC=infinity
LimitCORE=infinity
# ...
```

Actual limits:

root@gateway-vm:~# cat /proc/$(pidof dockerd)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15657                15657                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

### Why we see the issue now

Because we started `dockerd` directly, without the `systemd` configuration file, it ran with the default soft limit of 1024 open files.

#### How we started dockerd

```
killall dockerd
sudo dockerd \
  -H unix:///var/run/docker.sock \
  --tlsverify \
  --tlscacert=ca.pem \
  --tlscert=server-cert.pem \
  --tlskey=server-key.pem \
  -H=0.0.0.0:2376
```

(source: https://github.com/paseaf/ContainerSSH-honeypot/blob/main/terraform/scripts/restart_dockerd_with_tls.sh)

#### System limits:

root@sacrificial-vm:/etc/docker# cat /proc/$(pidof dockerd)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15657                15657                processes
Max open files            1024                 1048576              files
Max locked memory         514113536            514113536            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15657                15657                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
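
Resource limits are inherited by child processes, which is why a `dockerd` launched from a shell picks up that shell's soft limit. The untested sketch below shows the mechanism; `run_with_nofile` is a hypothetical helper, and whether the limit survives `sudo` depends on the PAM configuration, which is an assumption not checked here:

```bash
# Run a command with a given soft nofile limit without changing the calling shell.
# run_with_nofile is a hypothetical helper for illustration.
run_with_nofile() {
  limit="$1"
  shift
  (
    # Set the soft limit inside a subshell; fails if it exceeds the hard limit.
    ulimit -S -n "$limit" || exit 1
    # The command inherits the subshell's limits.
    exec "$@"
  )
}

# The child reports the limit its parent set:
run_with_nofile 256 sh -c 'ulimit -S -n'

# Hypothetical use as root, with the same flags as in the script above:
#   run_with_nofile 1048576 dockerd -H unix:///var/run/docker.sock ...
```

The soft limit can be raised up to the hard limit (1048576 in the output above) without extra privileges.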
paseaf commented 2 years ago

Possible solutions to start `dockerd` with a higher `ulimit`

### Option 1: Change `systemd` config (what we used)

1. Find config file location

   ```
   $ systemctl cat docker
   # /lib/systemd/system/docker.service
   ```

2. Edit config file at `/lib/systemd/system/docker.service`:

   ```ini
   ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock \
     --tlsverify --tlscacert=/home/deployer/ca.pem \
     --tlskey=/home/deployer/server-key.pem \
     --tlscert=/home/deployer/server-cert.pem \
     -H=0.0.0.0:2376
   ```

   Complete `docker.service` file

   ```ini
   [Unit]
   Description=Docker Application Container Engine
   Documentation=https://docs.docker.com
   After=network-online.target docker.socket firewalld.service containerd.service
   Wants=network-online.target
   Requires=docker.socket containerd.service

   [Service]
   Type=notify
   # the default is not to use systemd for cgroups because the delegate issues still
   # exists and systemd currently does not support the cgroup feature set required
   # for containers run by docker
   ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock \
     --tlsverify --tlscacert=/home/deployer/ca.pem \
     --tlskey=/home/deployer/server-key.pem \
     --tlscert=/home/deployer/server-cert.pem \
     -H=0.0.0.0:2376
   ExecReload=/bin/kill -s HUP $MAINPID
   TimeoutSec=0
   RestartSec=2
   Restart=always

   # Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
   # Both the old, and new location are accepted by systemd 229 and up, so using the old location
   # to make them work for either version of systemd.
   StartLimitBurst=3

   # Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
   # Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
   # this option work for either version of systemd.
   StartLimitInterval=60s

   # Having non-zero Limit*s causes performance problems due to accounting overhead
   # in the kernel. We recommend using cgroups to do container-local accounting.
   LimitNOFILE=infinity
   LimitNPROC=infinity
   LimitCORE=infinity

   # Comment TasksMax if your systemd version does not support it.
   # Only systemd 226 and above support this option.
   TasksMax=infinity

   # set delegate yes so that systemd does not reset the cgroups of docker containers
   Delegate=yes

   # kill only the docker process, not all processes in the cgroup
   KillMode=process
   OOMScoreAdjust=-500

   [Install]
   WantedBy=multi-user.target
   ```

3. Restart dockerd

   ```
   systemctl daemon-reload
   systemctl restart docker
   ```

4. Verify that the SSH honeypot works, and check the open files limit

   root@sacrificial-vm:~# cat /proc/$(pidof dockerd)/limits
   Limit                     Soft Limit           Hard Limit           Units
   Max cpu time              unlimited            unlimited            seconds
   Max file size             unlimited            unlimited            bytes
   Max data size             unlimited            unlimited            bytes
   Max stack size            8388608              unlimited            bytes
   Max core file size        unlimited            unlimited            bytes
   Max resident set          unlimited            unlimited            bytes
   Max processes             unlimited            unlimited            processes
   Max open files            1048576              1048576              files
   Max locked memory         65536                65536                bytes
   Max address space         unlimited            unlimited            bytes
   Max file locks            unlimited            unlimited            locks
   Max pending signals       15657                15657                signals
   Max msgqueue size         819200               819200               bytes
   Max nice priority         0                    0
   Max realtime priority     0                    0
   Max realtime timeout      unlimited            unlimited            us
   
### Option 2: Add a configuration file to `/etc/docker/daemon.json`

As suggested [here](https://docs.docker.com/config/daemon/systemd/#custom-docker-daemon-options):

> The recommended way is to use the platform-independent daemon.json file, which is located in /etc/docker/ on Linux by default.

We didn't use it because option 1 was easier to configure.
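
For reference, an untested sketch of what the `daemon.json` route might look like, moving the TLS flags from the command line into the file. `write_daemon_json` and its two parameters are hypothetical names for this sketch; the certificate paths are the ones from option 1. Two assumptions to verify before using it: the `hosts` key conflicts with a `-H` flag left in the unit's `ExecStart`, so that flag would have to be removed, and the open files limit of `dockerd` itself would still come from the `systemd` unit, not from this file.

```bash
# Write a daemon.json carrying the TLS settings from the dockerd command line.
# write_daemon_json and both of its arguments are hypothetical (illustration only).
write_daemon_json() {
  conf_dir="$1"   # target directory, e.g. /etc/docker
  cert_dir="$2"   # directory holding the certificates, e.g. /home/deployer
  mkdir -p "$conf_dir"
  cat > "$conf_dir/daemon.json" <<EOF
{
  "tlsverify": true,
  "tlscacert": "$cert_dir/ca.pem",
  "tlscert": "$cert_dir/server-cert.pem",
  "tlskey": "$cert_dir/server-key.pem",
  "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2376"]
}
EOF
}

# Dry run into a scratch directory instead of /etc/docker:
write_daemon_json "${TMPDIR:-/tmp}/daemon-json-sketch" /home/deployer
```

After writing the real file, `dockerd` would be restarted through `systemd` as in step 3 of option 1.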
paseaf commented 2 years ago

The problem is gone after raising the nofile limit. See https://github.com/paseaf/ContainerSSH-honeypot/issues/51#issuecomment-1193279149.

We now have 29 guest containers running at the same time.

SSH into the honeypot also works.