teleclimber / Dropserver

An application platform for your personal web services. https://dropserver.org
Apache License 2.0

sandbox with bubblewrap creates inotify watches and doesn't remove them #113

Closed: teleclimber closed this issue 3 months ago

teleclimber commented 1 year ago

Unsure what's causing this, but I'm experiencing a glitch that causes all appspaces to fail to start their sandboxes.

I get 503 and 504 errors on requests that hit the appspace sandbox. (Other requests get through fine).

ds-host responds fine.

What I'm seeing in the dropserver service status (systemctl status dropserver) is intriguing:

     CGroup: /system.slice/dropserver.service
             ├─host
             │ ├─   852 /usr/local/bin/ds-host -config=/etc/dropserver.json
             │ ├─464374 /deno run --unstable --no-check --import-map=/import-paths.json --allow-read=/sockets/,/app-fil>
             │ └─464396 /deno run --unstable --no-check --import-map=/import-paths.json --allow-read=/sockets/,/app-fil>
             └─sandboxes
               ├─sandbox-1025
               │ └─464373 bwrap --clearenv --setenv DENO_DIR /deno-dir/ --setenv NO_COLOR true --unshare-user-try --uns>
               ├─sandbox-1026
               │ └─464395 bwrap --clearenv --setenv DENO_DIR /deno-dir/ --setenv NO_COLOR true --unshare-user-try --uns>
               └─sandbox-1027
                 ├─464400 bwrap --clearenv --setenv DENO_DIR /deno-dir/ --setenv NO_COLOR true --unshare-user-try --uns>
                 └─464401 /deno run --unstable --no-check --import-map=/import-paths.json --allow-read=/sockets/,/app-f>

Definitely weird that two deno processes are sitting inside the host cgroup and were not moved to their own cgroups.
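
For context on what "moved to their own cgroup" means here: under cgroup v2 the move is just a write of the child PID into the sandbox's cgroup.procs file. A rough sketch of that step (not taken from the actual code; paths and IDs are assumed from the status output above):

    // Rough sketch (not Dropserver's actual code) of moving a freshly started
    // sandbox process into its own cgroup under cgroup v2: create the child
    // cgroup and write the PID into its cgroup.procs file.
    package main

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
        "strconv"
    )

    func moveToSandboxCgroup(sandboxID, pid int) error {
        cg := filepath.Join("/sys/fs/cgroup/system.slice/dropserver.service/sandboxes",
            fmt.Sprintf("sandbox-%d", sandboxID))
        if err := os.MkdirAll(cg, 0o755); err != nil {
            return err
        }
        // Writing a single PID moves just that process into the cgroup.
        return os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(strconv.Itoa(pid)), 0o644)
    }

    func main() {
        // Example values taken from the status output above.
        if err := moveToSandboxCgroup(1027, 464401); err != nil {
            log.Fatal(err)
        }
    }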

The logs point to a similar area. I see this error:

sandbox/bwrapjsonstatus_linux.go:45 (BwrapJsonStatus, b.follow()) Error: bad file descriptor

I see this error three times.

Did I run out of inotify watches? It does not seem like it.
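
For reference, the per-user limits live under /proc/sys/fs/inotify/, and the instances and watches held by a single process can be counted from /proc: an inotify fd shows up in /proc/<pid>/fd as a symlink to "anon_inode:inotify", and each watch is an "inotify wd:" line in the matching /proc/<pid>/fdinfo file. A hypothetical little helper (not in the repo) along these lines:

    // Hypothetical helper: count the inotify instances and watches held by
    // one process by reading /proc/<pid>/fd and /proc/<pid>/fdinfo.
    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "path/filepath"
        "strings"
    )

    func main() {
        if len(os.Args) != 2 {
            log.Fatal("usage: inotify-count <pid>")
        }
        pid := os.Args[1]
        fdDir := filepath.Join("/proc", pid, "fd")
        entries, err := os.ReadDir(fdDir)
        if err != nil {
            log.Fatal(err)
        }
        instances, watches := 0, 0
        for _, e := range entries {
            target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
            if err != nil || target != "anon_inode:inotify" {
                continue // not an inotify fd
            }
            instances++
            f, err := os.Open(filepath.Join("/proc", pid, "fdinfo", e.Name()))
            if err != nil {
                continue
            }
            sc := bufio.NewScanner(f)
            for sc.Scan() {
                // One "inotify wd:" line per watch on this instance.
                if strings.HasPrefix(sc.Text(), "inotify wd:") {
                    watches++
                }
            }
            f.Close()
        }
        fmt.Printf("pid %s: %d inotify instances, %d watches\n", pid, instances, watches)
    }

Note the limits are per user (fs.inotify.max_user_instances / max_user_watches), so all of ds-host's and the sandboxes' watches count against the same budget.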

Running systemctl kill dropserver took a long time. The log shows that a very large number of processes were being killed. So maybe that explains something? Some processes are not shutting down correctly (bwrap?). Investigate more.

teleclimber commented 6 months ago

I'm convinced this is due to not removing the inotify watches on the bwrap JSON status file after we're done with it. This has happened a second time. I ran inotify-info and get (after a reboot):

[dropserver@localhost inotify-info]$ inotify-info ds-host
------------------------------------------------------------------------------
INotify Limits:
  max_queued_events    16,384
  max_user_instances   1,024
  max_user_watches     524,288
------------------------------------------------------------------------------
       Pid Uid        App         Watches  Instances
       269 1000       ds-host           1          1
[8.0]: 277540
------------------------------------------------------------------------------
Total inotify Watches:   1
Total inotify Instances: 1
------------------------------------------------------------------------------

Searching '/' for listed inodes... (1 threads)
   277540 [8:0] /home/dropserver/ds-deno-sockets/bwrap-json-status-3923834122

8,047 dirs scanned (0.13 seconds)

So I need to add unwatch functionality.
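
Something like the following is what I have in mind, assuming an fsnotify-style watcher (the actual code in sandbox/bwrapjsonstatus_linux.go may use raw inotify instead); the missing piece is a stop/Close that releases the watch once we're done with the status file:

    // Hypothetical sketch, not the actual implementation: follow the bwrap
    // JSON status file with fsnotify, and make sure the watcher is closed
    // (releasing its inotify instance and watches) when the sandbox is done.
    package main

    import (
        "log"

        "github.com/fsnotify/fsnotify"
    )

    type bwrapJSONStatus struct {
        path    string
        watcher *fsnotify.Watcher
    }

    func (b *bwrapJSONStatus) follow() error {
        w, err := fsnotify.NewWatcher()
        if err != nil {
            return err
        }
        if err := w.Add(b.path); err != nil {
            w.Close() // don't leak the inotify instance if Add fails
            return err
        }
        b.watcher = w
        go func() {
            for range w.Events {
                // read and parse the status file on each event...
            }
        }()
        return nil
    }

    // stop is the missing "unwatch" step: Close releases the inotify
    // instance and all of its watches.
    func (b *bwrapJSONStatus) stop() {
        if b.watcher != nil {
            b.watcher.Close()
            b.watcher = nil
        }
    }

    func main() {
        b := &bwrapJSONStatus{path: "/home/dropserver/ds-deno-sockets/bwrap-json-status-3923834122"}
        if err := b.follow(); err != nil {
            log.Fatal(err)
        }
        defer b.stop()
    }

The stop call needs to happen on every exit path (normal sandbox shutdown, sandbox error, host shutdown), otherwise the watches pile up like above.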