irontec / sngrep

Ncurses SIP Messages flow viewer
GNU General Public License v3.0
1k stars 189 forks source link

sngrep should not use SIGHUP for dump file rotation #458

Closed chschnell closed 10 months ago

chschnell commented 10 months ago

When I run sngrep through a SSH console and close that console without closing sngrep first, sngrep keeps running with 100% load (probably because stdio is broken at this point, which should be another reason to exit).

It seems SIGHUP is ignored by sngrep. I ran sngrep in one SSH console, opened another console and tried:

# no effect:
sudo kill -SIGHUP <sngrep-pid>

# works: closes sngrep by killing it
sudo kill -SIGKILL <sngrep-pid>

The problem is, I had a SSH session with sngrep running in a minimized window and had forgotten about it. At night my DSL connection was reset by my provider, killing that SSH session. Next morning I noticed a hightened load on the server and that is how I found out about this bug.

Expected behaviour for POSIX signal SIGHUP from Wikipedia:

The SIGHUP signal is sent to a process when its controlling terminal is closed. It was originally designed to notify the process of a serial line drop (a hangup). In modern systems, this signal usually means that the controlling pseudo or virtual terminal has been closed. Many daemons (who have no controlling terminal) interpret receipt of this signal as a request to reload their configuration files and flush/reopen their logfiles instead of exiting. nohup is a command to make a command ignore the signal.

chschnell commented 10 months ago

Closing, I just noticed that this issue was fixed in e1d2b32. I was using sngrep 1.6.0 from Debian bullseye, now building from source.

chschnell commented 10 months ago

As it turned out, commit e1d2b32 did not fix this issue for me, but I believe to have found the root cause of the problem.

In commit a2b88f9 from May 31, 2022, SIGHUP was introduced in src/capture.c as the signal to trigger pcap dump file rotation. The signal is caught and its handling ends there, causing sngrep to keep running after the (SSH) terminal session ends.

What fixed the issue for me was to replace SIGHUP with SIGUSR1 in src/capture.c. This simply lets SIGHUP propagate normally and sngrep is exiting reliably when I kill the terminal session in which it was executed.

SIGHUP is often used in daemons for log file rotation, but not in applications which are bound to a tty. The proposed fix is very simple, I can submit a pull request if you like, however I prefer to discuss it here first because:

For reference, here's the expected behaviour for POSIX signals SIGCONT and SIGUSR1 from Wikipedia:

SIGCONT
The SIGCONT signal instructs the operating system to continue (restart) a process previously paused by the SIGSTOP or SIGTSTP signal. One important use of this signal is in job control in the Unix shell.

[...]

SIGUSR1 and SIGUSR2
The SIGUSR1 and SIGUSR2 signals are sent to a process to indicate user-defined conditions.

Wikipedia also has some more in-depth information about SIGHUP, see SIGHUP: Modern usage.

chschnell commented 10 months ago
  • As far as I'm aware, letting SIGHUP terminate sngrep directly does not do any of the cleanup in main(), this might be a cause of leakage I'm unaware of.

This is resolved by extending the proposed fix to:

  1. replace SIGHUP with SIGUSR1 in src/capture.c, and
  2. replace SIGCONT with SIGHUP in src/util.c.

When sngrep receives a SIGHUP signal it now exits properly, meaning when was_sigterm_received() returns true. I tested it by creating a file at the very end of main(), and indeed this file was created when I killed the SSH terminal.

chschnell commented 10 months ago

Of course, the downside of this fix is that it would break compatibility with existing code that uses SIGHUP to trigger pcap dump file rotations.

One way could be to just document the change properly, another by adding a new option in /etc/sngreprc to explicitly enable this new behaviour, for example:

set capture.rotatesignal (off | HUP | USR1)

with a default of HUP in order to stay backwards-compatible.

Kaian commented 10 months ago

Hi!

The SIGHUP code comes from a PR https://github.com/irontec/sngrep/pull/401

The SIGCONT implementation for SSH terminal disconnection was based on experimentation (displaying what signal was being received), but maybe the previous PR was handling SIGHUP and masking it from the logs.

Most probably the SIGHUP rotation feature is only actually being used by @g-v-egidy and doesn't mind to change the signal with USR1 to avoid extra setting in configuration. If he agrees, I don't mind breaking some backwards compatibility with this feature.

Thanks for the contribution!

Regards

g-v-egidy commented 10 months ago

Thanks for notifying me about this issue.

Back when I implemented #401 I used SIGHUP because it is the common signal for reloading the log file in daemons. I wasn't aware of the issues this could cause for a process with a controlling terminal. While I did some tests sending SIGHUP to sngrep in interactive mode, I didn't try closing the terminal.

I don't mind changing this to SIGUSR1 to fix this.

But since the use of SIGUSR1 is not the default for log rotation of daemons, this should probably be documented somewhere.

chschnell commented 10 months ago

I don't mind changing this to SIGUSR1 to fix this.

Glad to hear that, and thank you for your feedback.

But since the use of SIGUSR1 is not the default for log rotation of daemons, this should probably be documented somewhere.

My pull request includes a note on SIGUSR1 in the man page (note that SIGHUP was never documented). The release notes should clearly mention this change as breaking compatibility with earlier versions of sngrep.

A hint about the recommended way to run sngrep in the background could be added, I think this should do it:

# Start sngrep in the background
# - nohup detaches sngrep from its process group, and its stdio from the tty
# - redirects output of nohup to /dev/null
# - this syntax is neccessary to make sudo and nohup play together
#
sudo sh -c 'nohup sngrep -qNO /var/log/sngrep.pcap > /dev/null &'

By the way, in logrotate the (SIG)HUP signal is fired in the postrotate config section, see the config file examples. Only -HUP would need to be replaced with -USR1.

g-v-egidy commented 10 months ago

A hint about the recommended way to run sngrep in the background could be added, I think this should do it:

$ sudo nohup sngrep --no-interface --quiet -O output.pcap > /dev/null &

hmm, when talking about running it in the background I think the user will very often have some kind of regular & reliable use in mind. So why not directly suggest a systemd unit:

[Unit]
Description=pcap tracing for SIP traffic
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/sngrep --no-interface --quiet --limit 1 --rotate --output /var/log/sip/sip.pcap.gz 
Restart=on-failure
User=sngrep
Group=sngrep

[Install]
WantedBy=multi-user.target
chschnell commented 10 months ago

Yes, I was planning to suggest a systemd service file myself. I think both are usefull, one for ad-hoc background recordings and one for permanent background recordings (systemd).

I updated my earlier comment, the shell command was incorrect.

Here's a logrotate config file adapted to your systemd service (I tested it overnight):

# Create logrotate control file for sngrep
# - rotate daily, keep up to 5 pcap files
# - use signal USR1 to notify sngrep about pcap file rotation
#
cat <<EOF | sudo tee /etc/logrotate.d/sngrep > /dev/null
/var/log/sip/sip.pcap.gz {
    daily
    rotate 5
    postrotate
        /usr/bin/killall -USR1 sngrep
    endscript
}
EOF

Maybe we can add a page to the Wiki with "Tipps for running sngrep in the background" and put these snippets there.

Does sngrep compress with gzip when the dump file ends in .gz?

chschnell commented 10 months ago

Hi!

The SIGHUP code comes from a PR #401

The SIGCONT implementation for SSH terminal disconnection was based on experimentation (displaying what signal was being received), but maybe the previous PR was handling SIGHUP and masking it from the logs.

Most probably the SIGHUP rotation feature is only actually being used by @g-v-egidy and doesn't mind to change the signal with USR1 to avoid extra setting in configuration. If he agrees, I don't mind breaking some backwards compatibility with this feature.

Thanks for the contribution!

Regards

Thank you for your feedback!

I think we all agree that it's ok to break backwards compatibility in this case.

I tested my changes for three days now, it all works as expected.

Kaian commented 10 months ago

Maybe we can add a page to the Wiki with "Tipps for running sngrep in the background" and put these snippets there.

I have always discouraged the usage of sngrep in background. It's a terminal tool and may have memory leaks that affect during long runs. It's designed to be used as SIP flow display tool rather than a background capturing tool. We use captagent, heplify or tcpdump for such tasks, their performance is awesome and it will never miss a packet from the wire because it's busy parsing a SIP message.

Does sngrep compress with gzip when the dump file ends in .gz?

IIRC, it reads gziped data, but I don't think it stores gziped dumps.

Regards

g-v-egidy commented 10 months ago

We use captagent, heplify or tcpdump for such tasks, their performance is awesome and it will never miss a packet from the wire because it's busy parsing a SIP message.

I am not aware of any other tool that can ingest HEP and write PCAP out of it. I do not want to run a huge and complicated homer setup, I just need compressed pcap files.

Does sngrep compress with gzip when the dump file ends in .gz?

IIRC, it reads gziped data, but I don't think it stores gziped dumps.

I implemented reading and writing of gzip compressed pcaps in #407 . Detection if gzip compression should be used is done based on the ".gz" suffix of the filename given in "--output".

chschnell commented 10 months ago

I have always discouraged the usage of sngrep in background.

I wasn't aware of this, I assumed the opposite, so you can disregard what I wrote earlier.

My whole point is about using sngrep interactively.