roehling / postsrsd

Postfix Sender Rewriting Scheme daemon

100% CPU usage without further logs or ports opened #122

Closed: agross closed this issue 1 year ago

agross commented 2 years ago

Describe the bug

I have a server where postsrsd runs as part of docker-mailserver. On this instance, the main postsrsd process takes 100% of the CPU cycles and logs nothing, even when started manually on the command line (without -D). Neither of the ports (10001, 10002) is opened, either.

Relevant log output

Nothing.

System Configuration

polarathene commented 1 year ago

I also experienced this (see issue referencing this one above).

It's unclear what causes that behaviour, but after close to 10 minutes (on a 2 vCPU VM guest), postsrsd did eventually initialize properly and bind the ports. From then on, mail was processed without errors.

polarathene commented 1 year ago

TL;DR: (I've collapsed the original content to focus on where the problem is)


Earlier information / investigation

The `docker-mailserver` project runs a Debian Bullseye (11) base image Docker container with the `postsrsd` package installed (`v1.10`), and I reproduced this in a VM guest running Fedora (as the referenced issue mentions). Ignoring any init scripts from the installed package and running the binary directly with `/usr/sbin/postsrsd -d example.test` reproduces the failure.

---

If I do not give the command the `-d` option, it fails immediately. According to the `-h` output, the same should happen when `-s` is absent, yet instead the command stalls. I see no reason for it to believe it had been provided a secret, so I assume the command never reaches this point:

https://github.com/roehling/postsrsd/blob/6e701fa51f26bb344bc0230cdfb13ae1e14afb8d/postsrsd.c#L518-L533

It does, however, reach the earlier `-d` check:

https://github.com/roehling/postsrsd/blob/6e701fa51f26bb344bc0230cdfb13ae1e14afb8d/postsrsd.c#L483-L487

That would mean the stall happens somewhere in between. (_**Confirmed:** after the long delay, the `-s` check fails the command_):

https://github.com/roehling/postsrsd/blob/6e701fa51f26bb344bc0230cdfb13ae1e14afb8d/postsrsd.c#L489-L517

The most likely culprit would then seem to be:

https://github.com/roehling/postsrsd/blob/6e701fa51f26bb344bc0230cdfb13ae1e14afb8d/postsrsd.c#L500-L506
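Paraphrased (not a verbatim quote of the linked lines), that cleanup code has roughly this shape:

```c
#include <unistd.h>

/* Paraphrased shape of the linked cleanup code: close every inherited
 * descriptor above stderr, up to whatever sysconf() reports as the
 * per-process limit. */
static void close_inherited_fds(void)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    for (long fd = 3; fd < maxfd; fd++)
        close(fd);
}
```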

```
# Docker container (Debian 11 Bullseye base image)
$ getconf -a | grep OPEN_MAX
OPEN_MAX                           1073741816
_POSIX_OPEN_MAX                    1073741816

# VM guest Fedora 36 (Docker host)
$ getconf -a | grep OPEN_MAX
OPEN_MAX                           1024
_POSIX_OPEN_MAX                    1024

# NOTE: `ulimit -n` and `sysctl fs.nr_open` also report the same values
```

So the for loop is calling close() roughly a billion times?

Confirmation of issue and resolution - various workarounds

**UPDATE:** Yes, this seems to be the problem. Others have [experienced this issue before](https://github.com/jeroen/sys/issues/34#issuecomment-662521074) with Docker, noting that it sets this massively larger value for the container, but only for the root user (fairly common). I have confirmed that `su docker -c ''` (_we have an unprivileged user named `docker`_) works: it outputs `OPEN_MAX=1024`, and `postsrsd` starts effectively instantly.

**UPDATE 2 (alternative workarounds):**

- ~~You can still sort of run a command as root with a reduced FD limit via `ulimit -n 2048 && ` (limit set to 2048), but afterwards the container refuses further `ulimit` calls like that (even as root) with _"bash: ulimit: open files: cannot modify limit: Operation not permitted"_. `ulimit -n` and `getconf -a` will both show the reduced limit, while `sysctl fs.nr_open` remains unchanged.~~ (This doesn't seem like good advice, but it was [someone's solution for `Dockerfile` builds](https://github.com/PowerDNS/pdns-builder/pull/39#issuecomment-550729353).)
- Docker containers can use the [**`--ulimit` option**](https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container---ulimit) for per-container limits: `docker run --rm -it --ulimit 'nofile=1024' alpine ash -c 'ulimit -n'`. **Works well as a workaround** in the meantime.
- There's also a [Docker daemon config](https://github.com/moby/moby/issues/38814) approach, when viable, that would enforce the limit across all containers. AFAIK that's the official upstream Docker issue about software hitting these performance problems in containers.
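To verify which limit a process actually sees inside a container, a small stand-alone C program can query `RLIMIT_NOFILE` directly (my own sketch, not part of `postsrsd`):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    /* As root in a default Docker container this should report the huge
     * ~1073741816 value shown above; on the Fedora host, 1024. */
    printf("soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}
```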

Suggested Fix

Original suggestion

The issue I referenced, about another user's experience with the problem, also mentions a fix that sounds reasonable. I am not familiar with the reasoning behind the logic in your code, but that user's similar code made this change (_with a slightly more helpful comment about the purpose_):

```c
// close all file descriptors before exit, otherwise they can segfault
for (int i = 3; i < sysconf(_SC_OPEN_MAX); i++) {
    if (i != failure[w]) {
        int err = close(i);
        if (i > 200 && err < 0)
            break;
    }
}
```

They iterate over the first 200 FDs (_I'm familiar with FD 200 being common in `flock()` examples_), and then continue until `close()` returns an error. I assume that means an FD of 202 with no 201 open (`close(201)` should fail) would never get closed; see the sketch below. You probably know better how problematic that is.

If that's not a viable solution, perhaps add to the README (and maybe the `-h` output) that Docker containers running as the root user will have this problem (and link to this issue for more details). Additionally, consider an option that allows setting the `maxfd` limit (_although as a user, I don't know which scenarios with `postsrsd` would lead to the failure the loop is trying to prevent_).
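To make the gap concern concrete, here's a contrived, runnable sketch (FD numbers chosen purely for illustration; the `failure[w]` check from the original is omitted) showing how the break-on-error heuristic leaves FD 202 open when 201 is not:

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Create a gap: FD 202 is open, FD 201 is not. */
    dup2(STDIN_FILENO, 202);

    /* The suggested heuristic: past FD 200, stop at the first
     * close() that fails. */
    for (int i = 3; i < sysconf(_SC_OPEN_MAX); i++) {
        int err = close(i);
        if (i > 200 && err < 0)
            break; /* close(201) fails with EBADF, so 202 is never reached */
    }

    /* If the heuristic skipped FD 202, this close() succeeds (prints 0). */
    printf("close(202) = %d\n", close(202));
    return 0;
}
```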

Other alternatives I saw: `close_range()`, which closes a whole range of descriptors in one syscall (added in a later edit; see the discussion below).

roehling commented 1 year ago

Thank you for that excellent investigation. The loop you found was added by #65; to be honest, I always found it a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.

File descriptors are assigned by the kernel in ascending order (lowest available number first), so it's unlikely to hit an FD greater than 200 unless whatever process spawns PostSRSd has 200 files open.

And while I was writing this, I saw you added close_range(). I did not know about that function yet, but it seems to be the best alternative. The manual page even has close_range(3, ~0U, ...) as a use-case.
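For reference, a minimal sketch of that use case (assuming glibc 2.34+ and Linux 5.9+, where the close_range() wrapper is available):

```c
#define _GNU_SOURCE
#include <unistd.h>

/* Close every descriptor from 3 upward in one syscall, per the
 * close_range(3, ~0U, ...) example from the manual page. */
static void close_inherited_fds(void)
{
    close_range(3, ~0U, 0);
}
```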

polarathene commented 1 year ago

The file descriptors are assigned by the kernel in a somewhat ascending order, so it's unlikely to hit a FD greater than 200 unless 200 files have been opened by whatever process spawns PostSRSd.

I was of the understanding that you could specify an arbitrary FD number for example:

```
(
    flock -s 200

    # ... commands executed under lock ...

) 200</tmp/config-file
```

Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.


And while I was writing this, I saw you added close_range().

Done with my editing :sweat_smile:

Whatever makes most sense to you is fine by me :+1:

I was just confused about why a test in our CI was working fine while I was hitting issues with postsrsd when running the same tests against our container locally. I assume GitHub configures the Docker daemon with more sane limits.

Documented here for the benefit of others who stumble upon it :)

I always found this a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.

From what I've read, Docker / containerd needs this to do its thing across many containers, but the containers themselves don't. I was surprised at the staggering difference myself :smile:

roehling commented 1 year ago

close_range() seems to be relatively new (I have it on my Debian unstable, but not on my Ubuntu 20.04), but it is so nice that I decided to use it anyway and add some fallback code for older systems.
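One possible shape for that kind of fallback (a sketch under my own assumptions, not necessarily the code that was committed; `HAVE_CLOSE_RANGE` stands in for a build-system feature probe, e.g. a CMake check_symbol_exists() test):

```c
#define _GNU_SOURCE
#include <unistd.h>

static void close_inherited_fds(void)
{
#ifdef HAVE_CLOSE_RANGE
    /* One syscall, regardless of how high RLIMIT_NOFILE is. */
    if (close_range(3, ~0U, 0) == 0)
        return;
#endif
    /* Fallback for older systems: the original loop, but capped so a
     * huge limit (e.g. Docker's ~1 billion) cannot stall startup. */
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0 || maxfd > 65536)
        maxfd = 65536; /* arbitrary cap; my choice, not from postsrsd */
    for (long fd = 3; fd < maxfd; fd++)
        close(fd);
}
```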

polarathene commented 1 year ago

Awesome thanks for the quick fix! :heart:

roehling commented 1 year ago

I was of the understanding that you could specify an arbitrary FD number for example:

```
(
    flock -s 200

    # ... commands executed under lock ...

) 200</tmp/config-file
```

Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.

Sure, you can do that in the shell, but in regular programs with open() calls, file descriptors won't be assigned arbitrarily. Besides, no file descriptors have specific semantics apart from the first three (stdin, stdout, stderr); I suspect the idea with 200 was to go high so you don't conflict with existing open files, and it ended up as cargo cult.
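That behaviour is easy to observe, since POSIX guarantees that open() returns the lowest-numbered unused descriptor:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* With 0, 1, 2 already taken, open() hands out 3, then 4. */
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    printf("a=%d b=%d\n", a, b); /* a=3 b=4 */

    /* After closing the lowest one, the next open() reuses it. */
    close(a);
    int c = open("/dev/null", O_RDONLY);
    printf("c=%d\n", c); /* c=3 again */

    close(b);
    close(c);
    return 0;
}
```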

Also, the general rule is: you open it, you close it. So I'm just being nice by closing all the inherited FDs, and I got a bug in the code as my reward...