xiph / Icecast-Server

Icecast streaming media server (Mirror) - Please report bugs at https://gitlab.xiph.org/xiph/icecast-server/issues
https://icecast.org
GNU General Public License v2.0
465 stars 127 forks source link

Fix: Postpone signal handling to avoid deadlocks #64

Closed Jookia closed 1 year ago

Jookia commented 1 year ago

Currently the signal handler will do two things:

Unfortunately these both require locks (though _sig_die doesn't currently lock global variables, this is fixed too), and as signal handlers can run at any time it's possible to deadlock the server.

Solve this entirely by moving signal handling code out of signal handlers and having the slave handle logging and flagging variables.

This new code changes semantics slightly: Multiple signals are treated as one signal. In the case of telling the server to reload or shutdown this isn't as important as avoiding deadlocking.


Note: I've posted this here as the GitLab instance doesn't support merge requests. Have fun.

Jookia commented 1 year ago

I posted this patch and debugging details to https://gitlab.xiph.org/xiph/icecast-server/-/issues/2472 which is the correct place.

Jookia commented 2 months ago

For anyone interested, the upstream devel branch has rejected my change and instead pushed a broken fix for this. It makes the following mistakes:

I discussed these problems with them at the time, but it seems they disagree that these are problems. I tried to find the long thread where we discussed this but a quick search came up empty. Maybe it was removed or pruned? Who knows. The issue is still open.

I have posted a branch that reverts and re-instates this fix here: https://github.com/Jookia/Icecast-Server/tree/devel-phschafft-signals

If you get value out of this, please contact the developers and ask them to fix their code. Maybe they'll listen to you!

petterreinholdtsen commented 2 months ago

[Jookia]

For anyone interested, the upstream devel branch has rejected my change and instead pushed a broken fix for this. It makes the following mistakes:

  • It doesn't lock the global data before accessing it
  • It accesses data that isn't even marked sig_atomic_t
  • It puts a sig_atomic_t in a struct, which seems fishy to me

If I understand your argument, the lock around 'global' would be needed to handle code replacing the global struct at runtime. Will this ever happen?

The data in global being access seem to be 'int' or sig_atomic_t (if available on platform). I guess this might be a problem if the program is multithreaded, allowing the signal handler and the main program to run on different threads / cores, where one core set it to zero and another try to set it to something else in parallell. Is the program multithreaded?

Not quite sure what is fishy aout having sig_atom_t in a struct. Can you elaborate?

I discussed these problems with them at the time, but it seems they disagree that these are problems. I tried to find the long thread where we discussed this but a quick search came up empty. Maybe it was removed or pruned? Who knows. The issue is still open.

Perhaps the discussion was attached to a commit or code fragment? Those threads are notoriasly hard to track down. Perhaps you got an old email with links?

I have posted a branch that reverts and re-instates this fix here: https://github.com/Jookia/Icecast-Server/tree/devel-phschafft-signals

It is a bit hard to track down exactly what you changed. Perhaps rebase on top of the current master and squash the commit into one single change?

-- Happy hacking Petter Reinholdtsen

Jookia commented 2 months ago

Hi,

On Tue, Apr 30, 2024 at 02:22:04AM -0700, petterreinholdtsen wrote:

If I understand your argument, the lock around 'global' would be needed to handle code replacing the global struct at runtime. Will this ever happen?

The data in global being access seem to be 'int' or sig_atomic_t (if available on platform). I guess this might be a problem if the program is multithreaded, allowing the signal handler and the main program to run on different threads / cores, where one core set it to zero and another try to set it to something else in parallell. Is the program multithreaded?

Yes, this is the case for this program: It runs with multiple threads and throughout the code uses a lock to access global everywhere else, just not the signal handler.

Not quite sure what is fishy aout having sig_atom_t in a struct. Can you elaborate?

The only type you allowed to access in a signal handler is a sig_atomic_t. The devel code access the struct global AND the sig_atomic_t.

This is all moot anyway because you still need to lock the globals struct before accessing it.

Perhaps the discussion was attached to a commit or code fragment? Those threads are notoriasly hard to track down. Perhaps you got an old email with links?

Possibly. I don't think it's worth digging up, but the disagreement was over what the code says to do versus what it does:

If you write have this code:

struct global_data {
    int my_data;
    int other_data;
}
global_data globals;

Then run globals.my_data = 1234 in a signal handler, what happens?

The C standard says something along the lines of accessing two pieces of data: the globals structure, and the my_data structure. How this access happens, how things are laid out in memory, this is up to the compiler. The C standard provides a set of specific instructions on how to write code in a signal handler that will run correctly on any platform. This is the perspective my code comes from.

On the other hand, there is the Icecast developer perspective: If you integer assume writes are atomic, globals doesn't move in memory, and you don't need instant updates across threads, you can just set the variable. In the devel code, the worst that happens is the configuration gets re-read twice. So the signal handler will probably be fine.

That said, the C standard is just a standard. It doesn't actually do anything. What matters are the compilers, and maybe the developer just doesn't want to change the code too much in case it breaks weird platforms. Who knows.

I have posted a branch that reverts and re-instates this fix here: https://github.com/Jookia/Icecast-Server/tree/devel-phschafft-signals

It is a bit hard to track down exactly what you changed. Perhaps rebase on top of the current master and squash the commit into one single change?

This is on top of the latest develop branch by the Icecast author here:

https://gitlab.xiph.org/xiph/icecast-server/-/tree/devel-phschafft

If you look at my branch you can tell which commits I've changed as they're just the reverts of the dev's changes, re-application of this patch and then me fixing this patch by marking the sig_atomic_t as volatile.

-- Happy hacking Petter Reinholdtsen

Jookia.