martinpitt / fatrace

report system wide file access events
GNU General Public License v3.0
167 stars 12 forks source link

open_by_handle_at: Too many levels of symbolic links #2

Closed xse closed 3 years ago

xse commented 3 years ago

Hi, i can reliably trigger this error just by launching systemctl list-unit-files on the following system:

model name      : ARMv6-compatible processor rev 7 (v6l)
Hardware        : BCM2835
Model           : Raspberry Pi Model B Plus Rev 1.2
xse@buttplug ~ % uname -a
Linux buttplug 5.10.14-1-ARCH #1 SMP Mon Feb 8 16:05:41 UTC 2021 armv6l GNU/Linux

I don't really know how to figure out what's going on. Here is an example error output:

systemd(1): O /
systemd(1): O /etc
systemd(1): CO /
systemd(1): CO /etc/systemd
systemd(1): CO /etc
systemd(1): CO /etc/systemd/system
systemd(1): CO /usr
systemd(1): CO /usr/local
systemd(1): CO /usr/local/lib
systemd(1): CO /usr/lib
systemd(1): CO /usr/lib/systemd
systemd(1): CO /usr/lib/systemd/system
systemd(1): CO /
systemd(1): O /etc/systemd/system
systemd(1): RCO /etc/systemd/system
systemd(1): CO /
systemd(1): CO /etc
systemd(1): CO /etc/systemd
fatrace: open_by_handle_at: Too many levels of symbolic links

What seem to happens is that systemd opens random files with openat and get the ELOOP from strace -e openat -f -p 1 -o openat.log:

root@buttplug ~ # cat openat.log | grep ELOOP
1     openat(AT_FDCWD, "/etc/systemd/system/dbus-org.freedesktop.network1.service", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links)
1     openat(AT_FDCWD, "/etc/systemd/system/dbus-org.freedesktop.timesync1.service", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links)
1     openat(AT_FDCWD, "/etc/systemd/system/dbus-org.freedesktop.resolve1.service", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links)
1     openat(AT_FDCWD, "/usr/lib/systemd/system/autovt@.service", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links)

Fanotify then gives those file handles to fatrace that tries to open em with open_by_handle_at and can't do so. I assume that's what happens tbh i don't know how to link the handles i get from an strace of fatrace to files. Another strange thing is that systemctl list-unit-files fails with "Connection timed out" while being traced, tho i assume that's unrelated and due to smth on systemd's side. I wonder if anyone can replicate on x86 i haven't been able to try, however I've had confirmation it fails on other arm machines.

I'll try to play with audit/bpf and will update this post if i figure out a way to see which file causes that issue. Have a good day o/

martinpitt commented 3 years ago

I tried to reproduce this with a naïve symlink loop:

--- tests/fatrace-currentmount
+++ tests/fatrace-currentmount
@@ -25,6 +25,12 @@ mv "${TEST_FILE}.2" dest/"${TEST_FILE}.2"
 rm dest/"${TEST_FILE}.2"
 rmdir dest

+echo "robustness against ELOOP"
+ln -s link1 link2
+ln -s link2 link1
+ls -lR link1 || true
+rm link1 link2
+

but that works fine. I suppose for system it's not really a symlink loop, but opening with O_NOFOLLOW, and these files ought to be links. Unfortunately I think O_NOFOLLOW can't be done in shell.

martinpitt commented 3 years ago

... but Python to the rescue! I added a test which should model this situation, and it works.

So that's not a reproducer yet, and something more intricate is going on. That said, I think fatrace does not need to be that strict, and it'd be safe to ignore such errors.

martinpitt commented 3 years ago

Nevertheless, I made this non-fatal now. Do you have a chance to test current master and letting me know if it's still broken? (I'll also do a new upstream release soon).

xse commented 3 years ago

Do you have a chance to test current master and letting me know if it's still broken? (I'll also do a new upstream release soon).

Yup, it now just warns about it!

2021-03-13 15_30_49-~

Thanks a lot!