jart / cosmopolitan

build-once run-anywhere c library
ISC License
18.32k stars 628 forks source link

Potential false positives in `CheckForFileLeaks()` depending on caller fds #915

Open jtru opened 1 year ago

jtru commented 1 year ago

I've been testing the latest 3.0 release on my Arch Linux/KDE Plasma 5 desktop system, and stumbled over a curious test failure in test/libc/thread/mu_semaphore_sem_test.com, which looks like this:

$ make
[...]
mu_semaphore_sem_test.com: FILE DESCRIPTOR LEAKS: 24
total 0
dr-x------ 2 colo colo  4 Oct 24 19:20 .
dr-xr-xr-x 9 colo colo  0 Oct 24 19:20 ..
lr-x------ 1 colo colo 64 Oct 24 19:20 0 -> /dev/null
l-wx------ 1 colo colo 64 Oct 24 19:20 1 -> pipe:[14948116]
l-wx------ 1 colo colo 64 Oct 24 19:20 2 -> pipe:[14948116]
lr-x------ 1 colo colo 64 Oct 24 19:20 24 -> anon_inode:inotify
test/libc/thread/mu_semaphore_sem_test.c:57: test failed
        EXITS(0)
          want WEXITSTATUS(0)
           got WEXITSTATUS(1)
        ryzealot

`make MODE= -j12 o//test/libc/thread/mu_semaphore_sem_test.com.runs` exited with 1:
o//test/libc/thread/mu_semaphore_sem_test.com
consumed 29,488µs wall time
ballooned to 2,688kb in size
needed 2,680us cpu (0% kernel)
caused 244 page faults (100% memcpy)
24 context switches (87% consensual)

make: *** [build/rules.mk:129: o//test/libc/thread/mu_semaphore_sem_test.com.runs] Error 1

Unable to reproduce the issue in other environments I have access to, I dug a little deeper, and found this oddity that makes my desktop system differ from the others (which are mostly Debian-based headless servers):

$ ls -l /proc/$$/fd/24
lr-x------ 1 colo colo 64 24. Okt 19:11 /proc/1643169/fd/24 -> anon_inode:inotify

Closing that fd in the shell invoking the test fixes the observed problem:

$ o//test/libc/thread/mu_semaphore_sem_test.com
mu_semaphore_sem_test.com: FILE DESCRIPTOR LEAKS: 24
total 0
dr-x------ 2 colo colo  4 24. Okt 19:29 .
dr-xr-xr-x 9 colo colo  0 24. Okt 19:29 ..
lrwx------ 1 colo colo 64 24. Okt 19:29 0 -> /dev/pts/2
lrwx------ 1 colo colo 64 24. Okt 19:29 1 -> /dev/pts/2
lrwx------ 1 colo colo 64 24. Okt 19:29 2 -> /dev/pts/2
lr-x------ 1 colo colo 64 24. Okt 19:29 24 -> anon_inode:inotify
test/libc/thread/mu_semaphore_sem_test.c:57: test failed
        EXITS(0)
          want WEXITSTATUS(0)
           got WEXITSTATUS(1)
        ryzealot
$ exec 24>&-
$ ls -l /proc/$$/fd/
total 0
lrwx------ 1 colo colo 64 24. Okt 19:11 0 -> /dev/pts/2
lrwx------ 1 colo colo 64 24. Okt 19:11 1 -> /dev/pts/2
lrwx------ 1 colo colo 64 24. Okt 19:11 2 -> /dev/pts/2
lrwx------ 1 colo colo 64 24. Okt 19:11 255 -> /dev/pts/2
$ o//test/libc/thread/mu_semaphore_sem_test.com
$ echo $?
0

So, something in my env makes all the interactive shells I run as children of my graphical login session inherit an inotify-related fd for reasons I do not yet understand. For posterity, the process tree at this particular moment in time looked like this:

$ pstree -s -p $$
systemd(1)───systemd(614)───plasmashell(751)───konsole(1643154)───bash(1643169)───pstree(1643836)

... and it seems like plasmashell is responsible for the culprit (fd 24) that leaks into its descendants:

$ sudo ls -l /proc/614/fd/24
lrwx------ 1 colo colo 64 24. Okt 19:13 /proc/614/fd/24 -> 'socket:[16064]'
$ sudo ls -l /proc/751/fd/24
lr-x------ 1 colo colo 64 24. Okt 19:12 /proc/751/fd/24 -> anon_inode:inotify

Maybe this is intentional by KDE Plasma, maybe whatever causes this missed to apply FD_CLOEXEC to the fd after having created it, or maybe something else is the reason for the observed behavior... I am not sure yet, but will try to look into it.

Whatever the real reason, I guess there could be other environments where this is a problem, too, and cannot be fixed as trivially as in my case. Having run into the problem this way, I feel unsure whether the implemented testing logic is what really is required/wanted...

Looking at test/libc/thread/mu_semaphore_sem_test.c, that calls CheckForFileLeaks(), defined in libc/stdio/fleaks.c, where all open fds less than a magic constant called MIN_CLANDESTINE_FD (afaiui arbitrarily defined as 100) and greater than 2 (STDERR) will cause the test to blow up.

I think that might be a bit overzealous, and the caller maybe should allow for a list of known-and-OK-to-be-open fds (which the failing test should gather before it starts it pthread-related mutex frenzy) to be supplied to prevent the explosive exit(1).

What do you think - does this make sense, or is my current env, with that lone and curently unexplained open fd, considered too broken to be supported? :)

jart commented 1 year ago

Whatever you're using to run make is probably leaking file descriptors into the process. Try changing this to close file descriptors up to 200.

https://github.com/jart/cosmopolitan/blob/062b2d776e8b18aea444fe81e186c1c39a465806/libc/testlib/testmain.c#L98-L100

jtru commented 1 year ago

Ah! I was not aware of that piece of code that should make open fds meet expectations in the first place :)

I can confirm making that loop go up to fd < 200 (or any other number that makes it also traverse and also close() fd 24) fixes the failing test. I guess cosmopolitan/libc/testlib/testmain.c should be adapted to close at least the range of fds that CheckForFileLeaks() patrols to check for leakage.

And thanks very much for the astonishingly quick reply, btw! ;)