Closed — agross closed this issue 1 year ago
I also experienced this (see the issue referencing this one above).
It's unclear what causes that behaviour, but after close to 10 minutes (on a 2 vCPU VM guest), postsrsd did eventually initialize properly and bind its ports. From then on, mails were processed without errors.
TL;DR: (I've collapsed the original content to focus on where the problem is)

- The `main()` method takes approx. 10 minutes to iterate through a billion `close()` calls, and this is most likely to be encountered via Docker containers running as the root user.
- A workaround is to lower the limit via the `--ulimit` option on a container.
- A better fix for postsrsd is to iterate through `/proc/self/fd` instead.

The most likely culprit then would perhaps be:
```shell
# Docker container (Debian 11 Bullseye base image)
$ getconf -a | grep OPEN_MAX
OPEN_MAX         1073741816
_POSIX_OPEN_MAX  1073741816

# VM guest Fedora 36 (Docker host)
$ getconf -a | grep OPEN_MAX
OPEN_MAX         1024
_POSIX_OPEN_MAX  1024
# NOTE: `ulimit -n` and `sysctl fs.nr_open` also output the same value
```
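For context, the `--ulimit` workaround mentioned above maps onto the shell's `ulimit` builtin. A small sketch (the `docker run` flag is only shown in a comment; the runnable part uses plain `ulimit` in a subshell):

```shell
# With Docker, the per-container fd limit can be lowered at run time, e.g.:
#   docker run --ulimit nofile=1024:1024 <image> ...
# The same mechanism in a plain shell, for illustration: lower the soft
# limit in a subshell so only the child commands are affected.
( ulimit -S -n 1024; echo "nofile limit: $(ulimit -n)" )
```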
So the for loop is doing `close()` 1 billion times?
Other alternatives I saw:

- Iterating `/proc/self/fd` was found to be faster than anything with a limit higher than 60, and should ensure you have access to all relevant FDs? (_Related RedHat bug report for the `rpm` project that chose this solution_)
- There is also a `close_range()` method you could use instead of many separate `close()` calls.

Thank you for that excellent investigation. The loop you found was added by #65; to be honest, I always found this a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.
The file descriptors are assigned by the kernel in a somewhat ascending order, so it's unlikely to hit a FD greater than 200 unless 200 files have been opened by whatever process spawns PostSRSd.
And while I was writing this, I saw you added `close_range()`. I did not know about that function yet, but it seems to be the best alternative. The manual page even has `close_range(3, ~0U, ...)` as a use case.
> The file descriptors are assigned by the kernel in a somewhat ascending order, so it's unlikely to hit a FD greater than 200 unless 200 files have been opened by whatever process spawns PostSRSd.
I was of the understanding that you could specify an arbitrary FD number, for example:

```shell
(
  flock -s 200
  # ... commands executed under lock ...
) 200< /tmp/config-file
```

Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.
> And while I was writing this, I saw you added `close_range()`.
Done with my editing :sweat_smile:
Whatever makes most sense to you is fine by me :+1:
I was just confused why a test we run in our CI was working fine, but postsrsd was having issues when I ran the tests on our container locally. I assume GitHub configures the Docker daemon with more sane limits.
Documented here for the benefit of others who stumble upon it :)
> I always found this a bit iffy, but I failed to realize that the file descriptor limit can be this insanely high.
From what I've read, Docker / containerd needs this to do its thing across many containers, but the containers themselves don't. I was surprised at the staggering difference myself :smile:
`close_range()` seems to be relatively new (I have it on my Debian unstable, but not my Ubuntu 20.04), but it is so nice that I decided to use it anyway and add some fallback code for older systems.
Awesome thanks for the quick fix! :heart:
> I was of the understanding that you could specify an arbitrary FD number, for example:
>
> ```shell
> (
>   flock -s 200
>   # ... commands executed under lock ...
> ) 200< /tmp/config-file
> ```
>
> Is that not FD 200? I am not that knowledgeable in this area, so I could be misunderstanding.
Sure, you can do that in the shell; in regular programs with `open()` calls, file descriptors typically won't be assigned arbitrarily.
Besides, it's not like any file descriptors have specific semantics beyond the first 3 (stdin, stdout, stderr); I suspect the idea with 200 was to go high so you don't conflict with existing open files, which ended up as cargo cult.
Also, the general rule is: you open it, you close it. So I'm just being nice by closing all the inherited FDs, and it got me a bug in the code as a reward...
Describe the bug

I have a server where postsrsd runs as part of docker-mailserver. On this instance, the main postsrsd process takes 100% of the CPU cycles and logs nothing, even when started manually on the command line (without `-D`). None of the ports (10001, 10002) are opened, either.

Relevant log output

Nothing.

System Configuration