matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.79k stars 2.13k forks source link

Dæmonised synapse does not close file descriptors #9099

Open andrewshadura opened 3 years ago

andrewshadura commented 3 years ago

Axel Beckert (@xtaran) writes in a bug report about Synapse hanging on start:

 9500 root       20   0  7684  2608  1352 S  0.0  0.3 25:57.98 ├─ SCREEN -RdU
 9501 root       20   0  7264  2512  1772 S  0.0  0.3  0:00.65 │  ├─ bash
 2658 root       20   0  178M  125M 49080 S  0.0 13.6  1:24.26 │  │  └─ aptitude -u
 4501 root       20   0 12616  7496  1684 S  0.0  0.8  0:00.40 │  │     ├─ dpkg --status-fd 66 --configure --pending
 4970 root       20   0 19836 13500  3604 S  0.0  1.4  0:01.32 │  │     │  ├─ perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/matrix-synapse.postinst configure
 4502 root       20   0  1468   380   332 S  0.0  0.0  0:00.00 │  │     │  └─ sh -c (test -x /usr/lib/needrestart/dpkg-status && /usr/lib/needrestart/dpkg-status || cat > /dev/null)
 4503 root       20   0  1468    68     0 S  0.0  0.0  0:00.00 │  │     │     └─ sh -c (test -x /usr/lib/needrestart/dpkg-status && /usr/lib/needrestart/dpkg-status || cat > /dev/null)
 4504 root       20   0  1468   380   332 S  0.0  0.0  0:00.02 │  │     │        └─ sh /usr/lib/needrestart/dpkg-status
 3499 root       20   0  178M  125M 49080 S  0.0 13.6  0:00.00 │  │     └─ aptitude -u
[...]
 5066 matrix-sy  20   0 69700 46988  6964 S  0.0  5.0  0:19.14 ├─ python3 -m synapse.app.homeserver --config-path /etc/matrix-synapse/homeserver.yaml --config-path /etc/matrix-synapse/conf.d/ --daemonize
 5067 matrix-sy  20   0 69700 46988  6964 S  0.0  5.0  0:02.13 │  └─ python3 -m synapse.app.homeserver --config-path /etc/matrix-synapse/homeserver.yaml --config-path /etc/matrix-synapse/conf.d/ --daemonize

So synapse has already forked, the init script seems to have exited already and even the postinst script has exited already. The question is what causes debconf to wait?

<…>

I suspect — based on the name part "matrix-synapse." that this zombie was either the matrix-synapse.postinst script or the matrix-synapse.config script.

Looking with lsof what's still opened by the debconf frontend process (pid 4970), I noticed these two pipes:

<…>

Grepping through the whole process space, I noticed that at least one of these pipes is connected to the python3 process running matrix-synapse which looks odd to me:

<…>

So it seems that matrix-synapse's --daemonize doesn't properly close file descriptors upon forking — which is likely an upstream issue.

richvdh commented 3 years ago

Thanks. The code in question is https://github.com/matrix-org/synapse/blob/master/synapse/util/daemonize.py#L89-L93. As you can see it closes FDs 0-2. Is some script running synapse with another FD open? If so, why, and can it be made to not do so?

andrewshadura commented 3 years ago

I don’t think the code in question closes them; on the contrary, it seem to be reopening /dev/null to be available as FDs 0—2.

richvdh commented 3 years ago

sorry, yes, but the act of doing so means that whatever was previously on FDs 0-2 will be closed.

From dup(2):

If the file descriptor newfd was previously open, it is silently closed before being reused.

andrewshadura commented 3 years ago

FYI, I found this: https://github.com/thesharp/daemonize/blob/master/daemonize.py#L136

richvdh commented 3 years ago

yeah, it's not hard to close the FDs if required. I'm just curious as to the reason it's needed.

we used to use that daemonize library, but it has various bugs and is somewhat overcomplex. We never used that particular part of it, though the reasons are lost in the mists of time (https://github.com/matrix-org/synapse/commit/4f475c7697722e946e39e42f38f3dd03a95d8765#diff-6e21d6a61b2f6b6f2d4ce961991ba7f27e83605f927eaa4b19d2c46a975a96c1R161)

andrewshadura commented 3 years ago

Well, in this particular case, those FDs happen to be around from debconf which ran in the same postinst script which later starts Synapse — and since it’s all non-systemd, at no point do extra FDs get closed, since that’s something synapse is I understand expected to do.

This does not, as I understand it, happen after synapse has been installed, only during the initial installation and package upgrades.

andrewshadura commented 3 years ago

FWIW I have worked it around by not using --daemonize as Alex suggested.

richvdh commented 3 years ago

Well, in this particular case, those FDs happen to be around from debconf

so debconf opens extra FDs, and then runs synapse with the extra FDs? Or debconf is somehow changing the FDs in its parent, which are then inherited when that parent later starts synapse?

andrewshadura commented 3 years ago

Debconf consists of a dæmon doing the UI and a shell script (sourced into postinst) talking to it. So Synapse inherits the open FDs from the shell script.

richvdh commented 3 years ago

@andrewshadura do you have a link to the postinst script, ooi?

richvdh commented 3 years ago

right, the postinst in question is https://salsa.debian.org/matrix-team/matrix-synapse/-/blob/debian/unstable/debian/postinst; the . /usr/share/debconf/confmodule at the start opens an FD 3, and the #DEBHELPER# at the end of it is expanded into commands which include invoke-rc.d --skip-systemd-native matrix-synapse start - so that's where the extra FD is coming from.

We also have the same situation in the matrix-synapse-py3 packages distributed by matrix.org.

As debconf-devel(7) says:

  • If your postinst launches a daemon, make sure you tell debconf to STOP at the end, since debconf can become a little confused about when your postinst is done otherwise.

so arguably the right fix is to add a db_stop before the #DEBHELPER#.

That said, I think it's probably correct that synapse should close all its open FDs when it is daemonized, and that will solve it once and for all.