rfjakob / earlyoom

earlyoom - Early OOM Daemon for Linux
MIT License

[FR] Unix mail on kill #237

Closed by protist 3 years ago

protist commented 3 years ago

Hi, I was wondering if it were possible to send (unix) mail if processes were killed and/or there were other issues or errors.

I understand that there can be GUI notifications, but having an option for mail would be better for (a) servers and (b) persistence if the GUI notification were missed. Thank you!

ceremcem commented 3 years ago

Since sending mail is a kind of task that has lots of possible ways to perform and requires many many options, I don't think this tool should support such a feature.

I suggest you implement this feature by yourself. Sending mail is an easy task: https://github.com/ceremcem/monitor-btrfs-disk/blob/9a1479a1c4c41a112ef7115fd684940890b94ee0/send-email.sh and after #243, there should be no reason to create a simple polling script that sends the mails.

protist commented 3 years ago

@ceremcem I'm not sure we are talking about the same kind of mail. From your link, I think you are talking about SMTP/networked mail. I'm talking about Unix mail. It doesn't really require any options; it's as simple as echo body | mail -s "subject" root.

I'm running earlyoom as systemd service. Do you mean something like creating a cron/systemd job to poll journalctl/systemctl output? If so, I think this is slightly problematic for a few reasons.

  1. Users would have to poll regularly, which is inefficient compared to earlyoom actively triggering the mail.
  2. Users would have to somehow filter out old Killing process messages. Not insurmountable, but much more easily implemented if earlyoom were to send mail itself with (presumably) a few lines of code.

EDIT: and numerous other tools, e.g. cron, mail on errors.
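
For comparison, the polling approach discussed above could look roughly like the sketch below. It is an illustration under assumptions, not something proposed in this thread: the function names and the cursor file path are hypothetical, the grep pattern is modelled on the "sending SIGTERM to process ..." log line that appears later in this thread, and journalctl's --cursor-file option needs a reasonably recent systemd. The cursor file is what takes care of filtering out old messages.

```shell
#!/bin/sh
# Sketch of a polling job (hypothetical names and paths throughout).

# Keep only the lines where earlyoom reports sending a signal.
# The pattern is an assumption based on a single observed log line.
filter_kills() {
    grep -E 'sending SIG[A-Z]+ to process'
}

# Print kill messages logged since the previous run.
# $1: file in which journalctl stores its position (cursor), so that
#     entries already reported are skipped on the next invocation.
poll_once() {
    journalctl -q -u earlyoom --cursor-file="$1" | filter_kills
}

# Example use from cron or a systemd timer, run as a user who is
# allowed to read the journal. Mail is only sent if grep matched:
#   kills=$(poll_once /var/lib/earlyoom-mail/cursor) &&
#       printf '%s\n' "$kills" | mail -s "earlyoom" root
```

This still has the drawbacks listed above: it runs on a schedule rather than at the moment of the kill, and it depends on the wording of the log messages.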

ceremcem commented 3 years ago

Ah, yes, we are talking about different aspects. Please ignore my previous post.

rfjakob commented 3 years ago

This is now again possible using the -N option (see https://github.com/rfjakob/earlyoom/pull/256). Thanks

rfjakob commented 3 years ago

PS: https://github.com/rfjakob/earlyoom/blob/master/MANPAGE.md#-n-pathtoscript

protist commented 3 years ago

That's great! Thank you!

The formatting looks great, but perhaps a little fiddly, so I wouldn't mind testing a bit. Is there a way to manually trigger a test notification? Otherwise, I guess I could create a test process, but that's potentially a bit dangerous.

protist commented 3 years ago

Hmm… I definitely need to troubleshoot a bit. I'm running KDE Plasma, which is great because the memory leak in plasmashell triggers earlyoom every few days 😭

I tested the following script

printf '%s\n%s\n%s\n%s\n\n' 'earlyoom triggered' "PID: $EARLYOOM_PID" "NAME: $EARLYOOM_NAME" "UID: $EARLYOOM_UID"
journalctl -n -u earlyoom
(printf '%s\n%s\n%s\n%s\n\n' 'earlyoom triggered' "PID: $EARLYOOM_PID" "NAME: $EARLYOOM_NAME" "UID: $EARLYOOM_UID" ; journalctl -n -u earlyoom) | mail -s "earlyoom" root

The first command printed out to journalctl fine, so it looks like the -N option is working well in a general sense.

The second line was there because the variables reported by -N don't include VmRSS, which I thought might be useful. VmRSS normally appears in the journalctl output, so I thought I could just tail that instead. However, this failed due to permissions, as shown below.

The third line was to actually mail the details. This also failed as per below.

$ sudo journalctl -fu earlyoom
...
Aug 12 14:17:40 hostname earlyoom[616]: mem avail:  3166 of 32043 MiB ( 9.88%), swap free:    0 of    0 MiB ( 0.00%)
Aug 12 14:17:40 hostname earlyoom[616]: low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
Aug 12 14:17:40 hostname earlyoom[616]: sending SIGTERM to process 1025 uid 1000 "plasmashell": badness 1270, VmRSS 14570 MiB
Aug 12 14:17:40 hostname earlyoom[616]: process exited after 0.0 seconds
Aug 12 14:17:40 hostname earlyoom[440889]: earlyoom triggered
Aug 12 14:17:40 hostname earlyoom[440889]: PID: 1025
Aug 12 14:17:40 hostname earlyoom[440889]: NAME: plasmashell
Aug 12 14:17:40 hostname earlyoom[440889]: UID: 1000
Aug 12 14:17:40 hostname journalctl[440890]: Hint: You are currently not seeing messages from other users and the system.
Aug 12 14:17:40 hostname journalctl[440890]:       Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Aug 12 14:17:40 hostname journalctl[440890]:       Pass -q to turn off this notice.
Aug 12 14:17:40 hostname journalctl[440890]: No journal files were opened due to insufficient permissions.
Aug 12 14:17:40 hostname journalctl[440893]: Hint: You are currently not seeing messages from other users and the system.
Aug 12 14:17:40 hostname journalctl[440893]:       Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Aug 12 14:17:40 hostname journalctl[440893]:       Pass -q to turn off this notice.
Aug 12 14:17:40 hostname journalctl[440893]: No journal files were opened due to insufficient permissions.
Aug 12 14:17:40 hostname earlyoom[440894]: 2021-08-12 14:17:40 Warning: purging the environment.
Aug 12 14:17:40 hostname earlyoom[440894]:  Suggested action: use keep_environment.
Aug 12 14:17:40 hostname earlyoom[440892]: mail: Cannot save to $DEAD: Read-only filesystem
Aug 12 14:17:40 hostname earlyoom[440892]: mail: ... message not sent
Aug 12 14:17:40 hostname exim[440894]: 2021-08-12 14:17:40 1mE2A0-001qhC-AK failed to write to main log: length=127 result=-1 errno=9 (Bad file descriptor)
Aug 12 14:17:40 hostname exim[440894]: write failed on panic log: length=117 result=-1 errno=9 (Bad file descriptor)

rfjakob commented 3 years ago

Oh, looks like the systemd sandbox blocks the mail

rfjakob commented 3 years ago

Is there a way to manually trigger a test notification? Otherwise, I guess I could create a test process, but that's potentially a bit dangerous.

Best way I see is something like this:

./earlyoom --dryrun -m 99 -s 100 -N $PWD/contrib/earlyoom-notify-example.sh

Note: better hit Ctrl-C quickly, it will send 10 notifications per second ;)
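
A complementary check that does not involve earlyoom at all is to run the notify logic by hand with the variables set to made-up values. The sketch below assumes the script only reads EARLYOOM_PID, EARLYOOM_NAME and EARLYOOM_UID, as in the test script earlier in this thread. The function name build_body and the sample values are hypothetical.

```shell
#!/bin/sh
# build_body: format the notification text from the variables that
# earlyoom exports to the -N script (same fields as the script above).
build_body() {
    printf '%s\n%s\n%s\n%s\n\n' 'earlyoom triggered' \
        "PID: $EARLYOOM_PID" "NAME: $EARLYOOM_NAME" "UID: $EARLYOOM_UID"
}

# Dry test with fake values; the subshell keeps them out of the
# calling environment. Nothing is killed and no mail is sent.
(
    EARLYOOM_PID=12345
    EARLYOOM_NAME=testproc
    EARLYOOM_UID=1000
    build_body
)

# Once the text looks right, the same call can be piped to mail:
#   build_body | mail -s "earlyoom" root
```

Note that this only checks the formatting. It runs outside the systemd sandbox, so it will not reproduce the permission errors seen in the log above.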

protist commented 3 years ago

Best way I see is something like this: [...] --dryrun

Thanks @rfjakob. That works well!

Oh, looks like the systemd sandbox blocks the mail

Hmm… so I spent an hour or so troubleshooting this, and I can get it to send mail if I modify the systemd service as follows:

  1. DynamicUser=true -> Group=systemd-journal
  2. Comment out ProtectSystem=strict
  3. Comment out ProtectHome=true

I needed all three changes to get mail to work. The other advantage is that as per the script above, the service can now access journalctl. Presumably these changes aren't great from a security perspective, but at least I can tell specifically what is blocking mail. Would you have any suggestions as to how to get it working otherwise?
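
The three changes above can also be written as a systemd drop-in instead of editing the packaged unit file. This is a minimal sketch of that workaround, not a recommended configuration: the helper name write_override is hypothetical, and the settings are simply the explicit form of the edits listed above (ProtectSystem and ProtectHome default to "no" when absent). With DynamicUser disabled and no User= set, the service runs as root, so the security concern raised above applies here too.

```shell
#!/bin/sh
# write_override DIR: write a drop-in that relaxes the sandbox of
# earlyoom.service into DIR (hypothetical helper, for illustration).
write_override() {
    mkdir -p "$1" &&
    cat > "$1/override.conf" <<'EOF'
[Service]
# Replaces DynamicUser=true; the service no longer runs as a dynamic user.
DynamicUser=no
Group=systemd-journal
# Same effect as commenting the two lines out: both default to "no".
ProtectSystem=no
ProtectHome=no
EOF
}

# Example use, as root:
#   write_override /etc/systemd/system/earlyoom.service.d &&
#       systemctl daemon-reload && systemctl restart earlyoom
```

Removing the override.conf file and reloading restores the original sandbox settings.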

protist commented 3 years ago

Sorry, I thought I'd leave this for a while, but from my perspective this isn't resolved, since there is no clean way of sending mail for me. If this is a "won't fix", that's perfectly fine, but if possible I'd love some feedback, or even perhaps reopen this issue? Thank you in advance.

ceremcem commented 3 years ago

@protist Would #254 achieve the same thing, or would you still need this feature?

protist commented 3 years ago

@ceremcem Sorry, that issue was already linked above (via #256), but as above when run as a systemd job the sandbox appears to block the mail. I tried some troubleshooting above, but this presumably has some bad security implications. I'm not sure if there might be some secure programmatic way to sidestep this sandbox?

ceremcem commented 3 years ago

Ah, sorry, I must have missed the security part.

protist commented 3 years ago

No worries @ceremcem