sudo-project / sudo

Utility to execute a command as another user
https://www.sudo.ws
Other
1.16k stars 208 forks source link

Pass SUDO_EVLOOP_ONCE to del_io_events to prevent sudo hanging at exit on non-BSD systems. #247

Closed Dzejrou closed 1 year ago

Dzejrou commented 1 year ago

In 994f86b40 the flag SUDO_EVLOOP_ONCE was passed to del_io_events when called from from pty_close to avoid potential hangs on non-BSD systems. Later, in 8f694cc22, the flag argument was removed from del_io_events in favor of a boolean argument determining whether the function should block (which was the other flag being passed to it besides SUDO_EVLOOP_ONCE). This change lead to the original hang fix to be lost and the same - or similar - hang is now reproducible in sudo.

This commit reintroduces the flag argument to del_io_events with both the SUDO_EVLOOP_ONCE and SUDO_EVLOOP_NONBLOCK values.

This bug was reported to me as occurring when sudo was (repeatedly) invoked from PuTTY terminal connected to a linux-based system (SLES with sudo-1.9.5p2, but I also reproduced it on openSUSE Tumbleweed with sudo-1.9.12p2) when it receives input (letters + return). For the reporter (windows) it took 2-3 minutes to reproduce the issue, but I managed to do that on a linux-based system under wine after 20+ minutes of non-stop typing, so I assume this is some kind of a race related to buffer flushing.

After debugging this issue I found out that restoring the logic used in 994f86b40 made the issue go away both for me (longer reproduction time, so less reliable as I've seen reproduction times over 1 hour [*]) and the reporter (reliably reproduced in 2-3 minutes, so their feedback made me think this was the same issue as the one fixed by 994f86b40).

Some extra data in case you'd want to choose a different approach and these might be of use to you:

[*] Report from when I finally managed to reproduce on Tumbleweed under wine (just to demonstrate the timings):

germ245:~ # ./watch.sh Starting watching at: 22:14:36. 22:28:22: root 27017 0.0 0.0 0 0 pts/2 Z+ 22:28 0:00 [sudo] \ 22:43:30: root 7969 0.0 0.0 0 0 pts/2 Z+ 22:43 0:00 [sudo] \ 22:58:45: root 21380 0.0 0.0 0 0 pts/2 Z+ 22:58 0:00 [sudo] \ 23:16:54: root 25169 0.0 0.0 0 0 pts/2 Z+ 23:16 0:00 [sudo] \ 23:18:24: root 3956 0.0 0.0 0 0 pts/2 Z+ 23:18 0:00 [sudo] \ 00:19:33: root 28592 0.0 0.0 0 0 pts/2 Z+ 00:19 0:00 [sudo] \

The stacktrace of a hanged sudo process:

0 0x00007fd54f202299 in ppoll () from /lib64/libc.so.6

1 0x00007fd54ff68128 in ppoll (ss=0x0, timeout=, nfds=, fds=) at /usr/include/bits/poll2.h:77

2 sudo_ev_poll (timo=, nfds=, fds=) at ./event_poll.c:158

3 sudo_ev_scan_impl (base=base@entry=0x563d68b6b6c0, flags=flags@entry=0) at ./event_poll.c:194

4 0x00007fd54ff5dd3d in sudo_ev_loop_v1 (base=base@entry=0x563d68b6b6c0, flags=flags@entry=0) at ./event.c:656

5 0x00007fd54ff5dfb7 in sudo_ev_dispatch_v1 (base=base@entry=0x563d68b6b6c0) at ./event.c:622

6 0x0000563d688304e9 in del_io_events (nonblocking=nonblocking@entry=false) at ./exec_pty.c:1795

7 0x0000563d68832d47 in pty_finish (cstat=0x7ffe2492c830) at ./exec_pty.c:867

8 exec_pty (details=details@entry=0x563d6884eba0 , cstat=cstat@entry=0x7ffe2492c830) at ./exec_pty.c:1668

My reproducer (requires X window system, wine, putty.exe and xdotool):


#!/bin/sh

PUTTY_SESSION=germ245
HOST=germ245.REDACTED
MIN=10
MAX=500
LOOP_TIME=60
COUNTER=0
DO_SLEEP=0

while true
do
    # Spawn the guest.
    wine downloads/putty.exe -load $PUTTY_SESSION -l root -pw REDACTED &
    sleep 3
    PID=`pgrep putty`
    WINDOW=`xdotool search --name "$HOST - PuTTY" | sort -r | head -n 1`

    # Start the loop on the guest.
    xdotool windowactivate $WINDOW
    xdotool type --window $WINDOW "while true; do sudo echo -n test; done"
    xdotool key --window $WINDOW Return
    sleep 1

    printf "Started PuTTY with PID=%d and WINDOW=%d.\n" $PID $WINDOW
    printf "Starting loop %d at %s.\n" $COUNTER `date +%H:%M:%S`

    # Send key input to the PuTTY window at randomized intervals or instantly
        # depending on the value of $DO_SLEEP (1 results in randomized sleeps between
        # sending input controlled by the values of $MIN and $MAX).
    SECONDS=0
    while (( $SECONDS < $LOOP_TIME))
    do
        xdotool key --window $WINDOW o
        xdotool key --window $WINDOW Return

        # In case we want to tweak the rate at which we send
        # input to the PuTTY window.
        if [ $DO_SLEEP -eq 1 ]
        then
            sleep 0.$[ ( $RANDOM % $MAX + $MIN) ]s
        fi
    done

    printf "Loop %d ended at %s.\n" $COUNTER `date +%H:%M:%S`

    COUNTER=$((COUNTER+1))

    # Without recreation of the PuTTY window, it would freeze
    # until the process is terminated at times, so we recreate
    # it every $LOOP_TIME seconds.
    printf "Killing putty process PID=%d.\n" $PID
    kill -9 $PID
done

What it does is that it repeatedly opens PuTTY terminals (the terminal would freeze after several minutes of keyboard input due to how fast it was sent to it, depending on the delay between simulated key presses - ranging from 2 minutes to 20 minutes approximately with different values I tried -- I assume this is ssh/X related) and executes sudo in a loop while sending simulated key presses of a character key and return. This was then watched on the target guest for defunct sudo instances.

millert commented 1 year ago

Thank you for your excellent and detailed analysis. I was able to easily reproduce the problem using putty from a Windows laptop to a Linux system. After spending some time in the debugger I found that the root cause of the problem is that the write callback is re-enabling the read event for /dev/tty. Even though del_io_events() avoids enabling read events itself, if there is buffered data destined for /dev/tty, once that data is written the read event would be enabled and the event loop never completed.

I believe the best fix is to just avoid enabling the read events in the write callback for /dev/tty if the command is no longer running. That will solve the problem at hand while and still completely flush the buffers, even if that takes more than one write. It may also prevent sudo from reading tty input when there is no reason for it to do so. I am currently testing a diff to do this and so far I am unable to reproduce the hang, which previously I could trigger after less than a minute.

millert commented 1 year ago

Closing in favor of 19a6606, but thank you for all your work. It made a big difference.

Dzejrou commented 1 year ago

Closing in favor of 19a6606, but thank you for all your work. It made a big difference.

Happy to help, thanks for the fix.