trifectatechfoundation / sudo-rs

A memory safe implementation of sudo and su.

Sudo high CPU usage after terminal is closed #841

Closed blazp7 closed 2 months ago

blazp7 commented 4 months ago

Describe the bug Frequently the sudo process is left running at 100% on one CPU core after the terminal is already closed. When I hear the CPU fans, I know there is at least one sudo process that I need to kill manually.

To Reproduce Steps to reproduce the behavior:

Expected behavior The sudo process doesn't torture my CPU.

Environment (please complete the following information):

Additional context I use the Hyprland window manager; maybe it has something to do with how it closes apps? Or maybe nix shell causes it.

pvdrz commented 4 months ago

@blazp7 does this happen with OG sudo as well?

Edit: I can't reproduce this while downloading a large file and killing alacritty on gnome.

squell commented 4 months ago

I remember we fixed this related issue during the Berlin Meeting: https://github.com/memorysafety/sudo-rs/issues/263. We could also have a look at the relevant code where the potential loop might be.

Of course, @pvdrz, if we fix this we are getting really close to your favorite scenario.

squell commented 4 months ago

Note: the example you cite is sudo su; is that our su implementation or the "regular" version? Does it also occur with, say, sudo sleep 600 or sudo yes > /dev/null?

pvdrz commented 4 months ago

@squell the only place where something similar could happen would be here: https://github.com/memorysafety/sudo-rs/blob/9e7a4c4fa48ec6e411e5cb3b2ae0a4258227a856/src/exec/event.rs#L222-L243

But that would mean that there aren't any file descriptors being polled or that the polling is failing consistently, which I find very unlikely as all the signal handling is done via sockets that are polled there.
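For illustration, here is a minimal sketch of the failure mode under discussion; it assumes nothing about sudo-rs's actual event loop (which lives in src/exec/event.rs) beyond the poll(2) pattern. If a registered descriptor reaches EOF or hang-up, poll() reports it ready on every iteration, and unless the handler deregisters it the loop never blocks again:

```rust
// Sketch of a spin-prone poll(2) loop; names are illustrative,
// not taken from sudo-rs.
fn dispatch_loop(fds: &mut [libc::pollfd]) {
    loop {
        // Wait indefinitely until at least one descriptor is ready.
        let n = unsafe { libc::poll(fds.as_mut_ptr(), fds.len() as libc::nfds_t, -1) };
        if n < 0 {
            continue; // real code would inspect errno here
        }
        for pfd in fds.iter() {
            if (pfd.revents & (libc::POLLIN | libc::POLLHUP | libc::POLLERR)) != 0 {
                let mut buf = [0u8; 8192];
                let read = unsafe { libc::read(pfd.fd, buf.as_mut_ptr().cast(), buf.len()) };
                // BUG: read == 0 means EOF. If the fd is not deregistered
                // here, the next poll() returns immediately with the same
                // revents set, and the loop burns a full core.
                if read > 0 {
                    // ...dispatch the bytes...
                }
            }
        }
    }
}
```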

blazp7 commented 4 months ago

It seems that just exiting a terminal where sudo su is active causes this issue. I have tried multiple terminal emulators, same issue. I have tried to exit the terminal app with killall instead of a hotkey, same issue. So it probably isn't Hyprland or Alacritty at fault. If you can't reproduce this problem, I would say it has something to do with how NixOS packages the binary and wraps it in a module. Or just how NixOS is.

recording.webm

pvdrz commented 4 months ago

I don't seem to be able to reproduce it on my system:

Screenshot from 2024-04-24 08-42-52

blazp7 commented 4 months ago

> Note: the example you cite is sudo su; is that our su implementation or the "regular" version? Does it also occur with, say, sudo sleep 600 or sudo yes > /dev/null?

Only sudo-rs, not the regular version. And only sudo su followed by killing the terminal; I tried the other commands you posted. Edit: NixOS uses this su implementation from here

Might or might not be relevant; I don't know if this is the default behaviour, but someone from the NixOS Matrix chat said:

squell commented 4 months ago

(Yes, sudo -s or sudo -i is generally what you should do instead of sudo su, but that shouldn't cause this issue.)

Does it happen with sudo sudo -s as well?

blazp7 commented 4 months ago

> Does it happen with sudo sudo -s as well?

No. After I close the terminal that has an active sudo sudo -s or sudo bash, every sudo process is properly closed as expected. This points to su being the culprit? However, when I do su myuser there are no issues.

squell commented 4 months ago

I'm suspecting an interaction between ~util-linux~ NixOS's su and sudo-rs based on your answers; even though su seems to trigger it, I'd say the problem lies with sudo-rs (since the child process shouldn't be able to influence it in this way).

blazp7 commented 4 months ago

Alright, well I learned that I should use sudo -s and sudo -i, but if you're going to troubleshoot this interaction, just remember that NixOS uses this su implementation -> https://github.com/shadow-maint/shadow/blob/master/src/su.c

squell commented 4 months ago

That's very useful info (I didn't know shadow-utils also packaged a su implementation). @pvdrz

AppleSheeple commented 3 months ago

I have commands showing the same symptoms. They look like:

sudo -- su root -c ip netns exec SOME_NS su someuser -c "'mate-terminal' '--disable-factory'"

Self-built sudo-rs 0.2.2 on Arch. su from util-linux.

strace shows these two lines repeating:

poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1 ([{fd=4, revents=POLLIN|POLLERR|POLLHUP}])
read(4, "", 8192)                       = 0

Never seen this behavior before installing sudo-rs. But it's worth noting that I updated many system packages at the same time, so other variables could be relevant, but I highly doubt it.

Is that the same issue?


I know Rust, in case you think fiddling with/debugging a specific part of the code would help.

pvdrz commented 3 months ago

@AppleSheeple thanks for the report!

It seems one of the sockets we're using is being reported as ready constantly (file descriptor 4 in your strace output) when being polled.

I have a hunch this has to do with a socket on the other "side" being closed (we always use socketpair here), and that polling socket number 4 returns immediately because it is actually returning an error.

We may be handling this error improperly. I'll try to take a look at this next week.
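The hypothesized failure mode is easy to demonstrate in isolation. A sketch under pvdrz's socketpair assumption (a later comment finds that fd 4 is in fact /dev/tty, but the poll/read pattern is the same):

```rust
// Once the peer end of a socketpair is closed, poll() on the surviving
// end returns immediately and read() yields 0 (EOF) every time.
fn main() {
    let mut pair = [0i32; 2];
    unsafe {
        libc::socketpair(libc::AF_UNIX, libc::SOCK_STREAM, 0, pair.as_mut_ptr());
        libc::close(pair[1]); // the other "side" goes away

        let mut pfd = libc::pollfd {
            fd: pair[0],
            events: libc::POLLIN,
            revents: 0,
        };
        // Returns immediately instead of blocking.
        libc::poll(&mut pfd, 1, -1);
        let mut buf = [0u8; 16];
        let n = libc::read(pair[0], buf.as_mut_ptr().cast(), buf.len());
        // n == 0: end-of-file; polling again would spin, mirroring the
        // strace output above.
        println!("revents = {:#x}, read = {}", pfd.revents, n);
    }
}
```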

AppleSheeple commented 3 months ago

I had some time for a proper initial investigation (didn't get to the code yet).

What's happening is that I run this stuff from a script in a terminal. But I background and detach this run.

So:

./script_with_a_sudo_call_inside.sh &!

fd 4 is not a socket; it's actually /dev/tty. Now, if I don't close the terminal from which this script is running, the issue is not triggered. But once I do (the /dev/pts/<n> associated with stdin/stdout/stderr is deleted), the issue is triggered. So, in the dev logs, you get endless lines of:

src/exec/use_pty/parent.rs:353:24: event Tty(Readable) is ready

AppleSheeple commented 3 months ago

I gave the code a look, and confirmed that polling here does NOT return an error: https://github.com/memorysafety/sudo-rs/blob/90e2385b866dde529837da9e059ec0f1e9cd83c6/src/exec/event.rs#L182

So, even if that error wasn't ignored, the issue would still be triggered.

I don't think I will have the time to chase this further.

AppleSheeple commented 3 months ago

Alright. Had some time to look at this again.

This change fixes the issue for me:

diff --git a/src/exec/use_pty/pipe/mod.rs b/src/exec/use_pty/pipe/mod.rs
index 7f02f06..b6046c7 100644
--- a/src/exec/use_pty/pipe/mod.rs
+++ b/src/exec/use_pty/pipe/mod.rs
@@ -7,6 +7,7 @@ use std::{
 };

 use crate::exec::event::{EventHandle, EventRegistry, PollEvent, Process};
+use crate::log::dev_warn;

 use self::ring_buffer::RingBuffer;

@@ -58,6 +59,19 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         &self.right
     }

+    pub(super) fn tty_gone<T: Process>(&mut self, registry: &mut EventRegistry<T>) -> bool {
+        let gone = unsafe {
+            // Check if tty which existed is now gone.
+            // NOTE: errno set is EIO which is not a documented possibility.
+            libc::tcgetsid(self.left.as_raw_fd()) == -1
+        };
+        if gone {
+            dev_warn!("tty gone (closed/detached), ignoring future events");
+            self.ignore_events(registry);
+        }
+        gone
+    }
+
     /// Stop the poll events of this pipe.
     pub(super) fn ignore_events<T: Process>(&mut self, registry: &mut EventRegistry<T>) {
         self.buffer_lr.read_handle.ignore(registry);
@@ -80,6 +94,9 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         poll_event: PollEvent,
         registry: &mut EventRegistry<T>,
     ) -> io::Result<()> {
+        if self.tty_gone(registry) {
+            return Ok(());
+        }
         match poll_event {
             PollEvent::Readable => self.buffer_lr.read(&mut self.left, registry),
             PollEvent::Writable => self.buffer_rl.write(&mut self.left, registry),
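For context on why this works: tcgetsid(3) returns the session ID of the session associated with the terminal behind the descriptor, and fails once that terminal is gone (the patch observes errno EIO, which POSIX does not document for this call). A standalone sketch of the same check, with a hypothetical helper name:

```rust
// Standalone illustration of the detection trick from the patch above.
// `tty_is_gone` is a hypothetical helper, not sudo-rs code.
use std::os::unix::io::AsRawFd;

fn tty_is_gone(tty: &impl AsRawFd) -> bool {
    // tcgetsid(3) returns -1 once the terminal behind the descriptor
    // no longer exists; EIO was the errno observed in this issue.
    unsafe { libc::tcgetsid(tty.as_raw_fd()) == -1 }
}

fn main() -> std::io::Result<()> {
    // Works when run from a live terminal; reproducing the pty case
    // from the issue requires a detached session.
    let tty = std::fs::File::open("/dev/tty")?;
    println!("tty gone: {}", tty_is_gone(&tty));
    Ok(())
}
```

Checking the terminal on each poll event also sidesteps having to classify the undocumented EIO errno in the read path.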

AppleSheeple commented 3 months ago

Just ran into this on another device where I forgot to install a patched version. Quite annoying.