pwmt / zathura

Document viewer
https://pwmt.org/projects/zathura
zlib License

strict sandbox mode blocks required syscalls #368

Closed sebastinas closed 2 months ago

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Dec 31, 2022, 21:13


Strict sandbox mode disallows the syscalls gettimeofday and clock_gettime, which seem to be necessary during startup (doing :set sandbox strict after starting works fine), at least on my end. This also causes the AUR package zathura-git to fail installation as ninja test fails at the sandbox unit test.

Here is the relevant ninja test log bit:

$ ninja test
[0/1] Running all tests.
1/6 validate-desktop        OK                0.02s
2/6 utils                   OK                0.02s
3/6 document                OK                0.03s
4/6 types                   OK                0.02s
5/6 sandbox                 FAIL              0.21s   exit status 1
>>> MALLOC_PERTURB_=176 /home/lambda/src/repos/zathura/build/tests/test_sandbox
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stdout:
Running suite(s): Sandbox
0%: Checks: 1, Failures: 0, Errors: 1
../tests/test_sandbox.c:11:E:basic:test_create:0: (after this point) Received signal 31 (Bad system call)
[...]

And the relevant strace logs (obtained by using set sandbox strict in zathurarc; the second one was obtained after unblocking the first syscall):

$ strace ./zathura -c .
[...]
seccomp(SECCOMP_SET_MODE_FILTER, 0, 0x55b1a37840f0) = 0
futex(0x7f16b014b0b8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f16b014b0b8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7f16b014b0b8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
gettimeofday(0x7ffc57ab2020, NULL)      = 96
+++ killed by SIGSYS +++
fish: Job 1, 'strace ./zathura -c .' terminated by signal SIGSYS (Bad system call)
$ strace ./zathura -c .
[...]
futex(0x7f71259b30b8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x55588d69f340, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f71256f9ba0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f71256f9ba0, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, 0x7ffe498ba660) = 228
+++ killed by SIGSYS +++
fish: Job 1, 'strace ./zathura -c .' terminated by signal SIGSYS (Bad system call)

I'll submit a pull request allowing the two syscalls shortly.
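To make the failure mode concrete, here is a minimal sketch of a seccomp filter that kills on a raw clock_gettime syscall, run in a forked child so the parent can observe the SIGSYS. This is not zathura's actual filter; the function names are made up for illustration, and a real filter would also validate the architecture field.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: install a filter that kills the caller on the
 * clock_gettime syscall and allows every other syscall.
 * Note: the seccomp_data.arch check is omitted for brevity. */
static int install_demo_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it is clock_gettime, fall through to KILL, else skip to ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_clock_gettime, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical helper: returns the signal that terminated a child which
 * issued the raw syscall under the filter, 0 if the child exited normally,
 * or -1 on error. */
int signal_after_blocked_clock_gettime(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct timespec ts;
        if (install_demo_filter() != 0)
            _exit(2);
        /* Bypass the vDSO on purpose: this is what happens on my system. */
        syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a system where the syscall really is issued, the child should die the same way zathura does in the strace logs.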

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 1, 2023, 01:14


Could you please provide some more details about your environment and what version of zathura you experience this issue with? The zathura-git package in AUR is almost 3 years old.

The current version on arch 0.5.1 as well as the most recent version 0.5.2 seem to work fine with linux 6.0 and wayland on several of my arch linux test systems. What desktop environment are you using and what version of glibc, glib2 and check do you have installed? Which zathura plugins and backends are you using and what version?

Please also provide your zathurarc configuration.

If possible, please test the current version 0.5.2 found here: (which is currently the same as the git version) https://pwmt.org/projects/zathura/download/

after extraction, please run: meson build && cd build && ninja test

edit: If possible, please also provide a test file that triggers the issue.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 1, 2023, 01:41


Here are the updated PKGBUILD files for zathura, girara and the pdf plugins with the current release versions. You will need to install girara first since the latest version is a dependency for zathura 0.5.2.

pkgbuild.tar.xz

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 1, 2023, 16:04


Those logs were acquired using the binary obtained by cloning and building the develop branch of this repo manually (same procedure you told me to do with the most recent release), so I used no plugins alongside it. I'm running Hyprland (on Wayland), glibc 2.36-6, glib2 2.74.4-1, and check 0.15.2-2. The zathurarc I used to obtain the strace logs only had set sandbox strict in it.

However, this issue happens with my normal installation of zathura as well, version 0.5.1-1 of the official Arch package. In this case the only plugin I have is zathura-pdf-mupdf 0.3.9-1. My zathurarc only has set selection-clipboard clipboard in it and adding set sandbox strict causes the issue to happen.

Building 0.5.2 and appending set sandbox strict to zathurarc also yields the same issue (no plugins).

tests/test_sandbox.c from the repo should trigger this issue.

Also, backtracing the clock_gettime syscall on gdb seems to indicate the syscall happens due to some GTK thing. Perhaps on your version this syscall doesn't happen?

(gdb) catch syscall clock_gettime
Catchpoint 1 (syscall 'clock_gettime' [228])
(gdb) r
Starting program: /home/lambda/src/repos/zathura-0.5.2/build/zathura 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Catchpoint 1 (call to syscall clock_gettime), 0x00007ffff7fc8a1d in clock_gettime ()
(gdb) backtrace
#0  0x00007ffff7fc8a1d in clock_gettime ()
#1  0x00007ffff6dff6bd in clock_gettime () at /usr/lib/libc.so.6
#2  0x00007ffff715ca13 in  () at /usr/lib/libglib-2.0.so.0
#3  0x00007ffff715cd13 in g_thread_new () at /usr/lib/libglib-2.0.so.0
#4  0x00007ffff712c33c in  () at /usr/lib/libglib-2.0.so.0
#5  0x00007ffff72bd670 in  () at /usr/lib/libgio-2.0.so.0
#6  0x00007ffff72bd725 in g_task_get_type () at /usr/lib/libgio-2.0.so.0
#7  0x00007ffff7327163 in  () at /usr/lib/libgio-2.0.so.0
#8  0x00007ffff7327952 in g_dbus_proxy_new_for_bus () at /usr/lib/libgio-2.0.so.0
#9  0x00007ffff7570d54 in  () at /usr/lib/libgdk-3.so.0
#10 0x00007ffff7539c37 in gdk_display_manager_open_display () at /usr/lib/libgdk-3.so.0
#11 0x00007ffff77df4a1 in gtk_init_check () at /usr/lib/libgtk-3.so.0
#12 0x00007ffff77df4de in gtk_init () at /usr/lib/libgtk-3.so.0
#13 0x000055555556324b in main (argc=1, argv=0x7fffffffded8) at ../zathura/main.c:279

edit: something I just realized is that running zathura under gdb (again with the set sandbox strict option) avoids the SIGSYS. Not sure why, or whether that is relevant. I forgot to use --args -_-

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 1, 2023, 19:20


Thank you for the report, I will look into it.

If you don't mind, could you provide your environment variables as well?

Also, please try to reproduce this with the following environment set:

export DBUS_SESSION_BUS_ADDRESS=""
export NO_AT_BRIDGE=1
export GTK_A11Y=none
meson build && cd build && ninja test

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 1, 2023, 22:08


Environment variables here: https://pastebin.com/2Z8yfRd0.

Also, same issue occurs after setting those environment variables.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 2, 2023, 02:37


Just to make sure: The affected system is running arch linux x86_64 with the vanilla kernel and no testing packages or other modifications? Are there any additional modules, especially security modules configured on the system?

This looks like something is trying to influence or measure the process (possibly intended) and since the issue does not trigger when run inside the debugger, this is another indicator as well.

I have just rerun all tests on my VMs and I could not find any system where zathura used any of these system calls nor did I find any other gtk application using this.

My guess is that the use of these syscalls is triggered by an external application or change to a dependency, possibly some debugging tool in your $PATH

There are a few more things you could try to help find the cause of this:

If none of this helps, you could provide a list of all your installed packages and I will try to rebuild the environment to test this directly.

pacman -Qqe > packagelist

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 2, 2023, 19:13


Yes, I'm running Arch Linux x86_64 with the vanilla kernel (though not the most recent), no testing packages. There are a few packages from unofficial repositories (chaotic-aur and archlinuxcn) but they are all simply binaries from packages otherwise available in the AUR, I believe. Maybe one or two of them aren't, at most.

Attached are the requested strace logs. Something I had not paid attention to earlier but noticed now is that the gettimeofday and clock_gettime syscalls happen a lot, and it's only after many calls that the SIGSYS happens, not after the first. Whatever causes that, ALLOW_RULEing those syscalls fixes it. Also, it does crash on gdb, my bad. I was dumb and had forgotten to pass --args to gdb, so it wasn't passing the zathurarc directory to zathura.

Replacing the ALLOW_RULEs with ERRNO_RULEs causes an issue as well:

$ ./zathura -c .
error: could not open plugin directory: /usr/local/lib/zathura
error: Found no plugins. Please install at least one plugin.

(org.pwmt.zathura:32877): GLib-ERROR **: 17:20:52.755: GLib requires working CLOCK_MONOTONIC
fish: Job 1, './zathura -c .' terminated by signal SIGTRAP (Trace or breakpoint trap)
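The GLib error above suggests that GLib treats a failing monotonic clock as fatal, so returning an errno from the filter just trades the SIGSYS for an abort. A minimal sketch of that kind of probe, assuming nothing beyond POSIX clock_gettime (the function name is made up, and this is not GLib's code):

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: returns 1 if clock_gettime(CLOCK_MONOTONIC) succeeds
 * and 0 if it fails, e.g. because a seccomp rule made the syscall return
 * an errno. */
int monotonic_clock_works(void)
{
    struct timespec ts;
    return clock_gettime(CLOCK_MONOTONIC, &ts) == 0;
}
```

Outside the sandbox this should return 1; under the ERRNO_RULE variant on my system it would presumably return 0.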

Attached is my package list as well. Some of them are from archlinuxcn and chaotic-aur, so you'll need those repositories as well.

log1

log2

packagelist

edit: Forgot to mention, creating a new user did not work either.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 2, 2023, 20:48


This does indeed look like something is trying to measure your zathura process. Before adding these syscalls to the whitelist, we should try to find the source of this to make sure it is a valid use case, especially since it's not required by anyone else.

You could try once more with this to make sure there isn't anything on $PATH causing this:

export PATH=/usr/bin
meson build && cd build && ninja test

Another possible hint could be in the output of

sudo lsof -p ZATHURAPID > lsoflog

(needs sudo because the sandbox prevents unprivileged processes from viewing the process stats)

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 3, 2023, 01:28


Changing $PATH did not change anything.

For lsof, I ran zathura without setting the sandbox option, as setting it crashes it immediately and I can't get the pid and run lsof. Shouldn't change anything, I believe, since the syscalls still happen when zathura is run normally.

lsoflog

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 3, 2023, 04:04


/home/lambda/.themes/oomox-red/gtk-3.20/gtk.gresource

This could possibly cause issues, or at least it's for an older gtk version. It may be worth checking if moving this theme somewhere else temporarily will change anything, though I doubt it.

One other obvious potential issue is the large number of fontconfig cache files. Font parsing is a topic of its own and actually a common attack vector as well, which makes this a bit suspicious in my somewhat paranoid eyes.

To check this, please move the system fontconfig directory as well as the user fontconfig directory to a temporary backup

mkdir ~/fontbackup
mv /var/cache/fontconfig ~/fontbackup/systemfontconfig
mv ~/.cache/fontconfig ~/fontbackup/userfontconfig

If none of this helps, you could also try to run the official zathura arch package (0.5.1) in this container environment to see if it avoids the issue: https://github.com/valoq/bwscripts/blob/master/profiles/zathura

The root of the issue is with the glibc functions clock_gettime() and gettimeofday() which determine the actual syscalls used. Both libc functions are used on other systems as well but for some reason they trigger these syscalls only on your system.

My best guess for why this is would still be the kernel, therefore it may still help to look at kernel modules as well to make sure. Changes to libc would also explain this

uname -a >> infolist
lsmod >> infolist
pacman -Qi glibc >> infolist
sha256sum /usr/lib/libc.so.6 >> infolist
ps -ef >> infolist

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 3, 2023, 19:32


Regarding the theme: I just changed to a different one and yes, no difference.

Moving the fontconfig caches just causes zathura to delay start up a bit, and then it opens a bunch of fontconfig cache files again anyway (and setting the sandbox to strict, as usual, causes a SIGSYS). I'm not particularly well-versed in security, so I don't really know how to investigate this much further, if there is some sort of attack involved with the font cache.

Running zathura in that container environment also leads to the same issue; however, stracing it leads to different results. The syscalls in question do still happen a handful of times though. Here are the logs.

log1

log2

As well as the info list:

infolist

Apologies for all the work, by the way 😅. I didn't expect there'd be something special going on with my system, so I ended up jumping the gun with the MR right after opening the issue. Thanks for taking the time to walk through this thoroughly with me.

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 3, 2023, 20:29


Actually, I believe I'm onto something. However, I'm not actually very proficient at C, so tell me if I'm Dunning-Kruger'ing myself into oblivion.

From the gdb snippet above, we can see that the call comes after g_thread_new(), which is related to thread creation. Indeed gdb tells us that zathura creates and exits a bunch of threads before crashing, and the crash happens after all the threads exit:

$ gdb --args ./zathura -c .
[...]
Reading symbols from ./zathura...
(gdb) r
Starting program: /home/lambda/src/repos/zathura-0.5.2/build/zathura -c .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff2fff6c0 (LWP 25396)]
[New Thread 0x7ffff27fe6c0 (LWP 25397)]
[New Thread 0x7ffff17fc6c0 (LWP 25399)]
[New Thread 0x7ffff1ffd6c0 (LWP 25398)]
[New Thread 0x7ffff0ffb6c0 (LWP 25400)]
[Thread 0x7ffff0ffb6c0 (LWP 25400) exited]
[New Thread 0x7ffff0ffb6c0 (LWP 25401)]
[New Thread 0x7fffdb5c76c0 (LWP 25402)]
error: could not open plugin directory: /usr/local/lib/zathura
error: Found no plugins. Please install at least one plugin.
[Thread 0x7ffff0ffb6c0 (LWP 25401) exited]
[Thread 0x7fffdb5c76c0 (LWP 25402) exited]
[Thread 0x7ffff17fc6c0 (LWP 25399) exited]
[Thread 0x7ffff5b68980 (LWP 25390) exited]
[Thread 0x7ffff27fe6c0 (LWP 25397) exited]
[Thread 0x7ffff2fff6c0 (LWP 25396) exited]
[Thread 0x7ffff1ffd6c0 (LWP 25398) exited]
[New process 25390]

Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.

And if you go look at the source for g_thread_new() and go down the rabbit hole of C macros and libraries, you'll end up at this file, at the function sysprof_clock_get_current_time(), which does call clock_gettime().

If you're curious about the rabbit hole, g_thread_new() (in this file) calls g_thread_new_internal(), which calls something that uses the G_TRACE_CURRENT_TIME macro, which can be found in this file, and it's defined after the SYSPROF_CAPTURE_CURRENT_TIME macro, from another library, which eventually leads to this file, leading to the function mentioned above.

I imagine it's a similar case for gettimeofday, though I haven't found the will to follow the rabbit hole for that one. Either way, it seems something weird happens when the threads exit, and it leads to a bad syscall which the SECCOMP filter doesn't like.

Maybe the threading is not happening on your end, so these syscalls haven't happened. This is just a wild guess though, again I'm not very proficient. If I'm right, though, it seems like either the sandbox mode has to disable multi-threading or enable these syscalls.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 3, 2023, 20:47


Could you provide the fontcache file? Just compress the fontbackup into an encrypted 7z archive (password=sus) before upload.

Even with indicators of compromise present it's a false positive 99% of the time, so it's probably nothing to worry about and most likely just something wrong with the fontconfig setup.

I would recommend you to also remove some of the font packages since they make a giant collection anyway

adobe-source-han-sans-otc-fonts
texlive-local-fontawesome5
ttf-google-fonts-git
ttf-meslo-nerd-font-powerlevel10k
ttf-nerd-fonts-symbols-2048-em

Then please clean both ~/.cache/fontconfig and /var/cache/fontconfig and run fc-cache -f -v. The output could be interesting as well.

And don't worry about my time spent on this, it's an interesting issue and I would not have implemented a seccomp filter here if I wasn't prepared for some work ;)

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 3, 2023, 20:51


It may be worth tracing the clock_gettime function call in gdb.

Instead of catching the syscall, use break clock_gettime and do a backtrace. Stepping into the function may be interesting as well.

In my case (and all test systems) it will show:

0x00007ffff6dee6bd in __GI___clock_gettime (clock_id=<optimized out>, tp=<optimized out>) at ../sysdeps/unix/sysv/linux/clock_gettime.c:42

Which points to (in glibc)

 37 #ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
 38   int (*vdso_time64) (clockid_t clock_id, struct __timespec64 *tp)
 39     = GLRO(dl_vdso_clock_gettime64);
 40   if (vdso_time64 != NULL)
 41     {
 42       r = INTERNAL_VSYSCALL_CALL (vdso_time64, 2, clock_id, tp);
 43       if (r == 0)
 44         return 0;
 45       return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
 46     }
 47 #endif

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 3, 2023, 20:55


You could also test this container, which will isolate all font files and throw a fontconfig error which can be ignored here

test

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 3, 2023, 21:37


Yeah. Some of those I had to install to fix issues that don't need fixing anymore - just never got around to removing them. Here's the fontconfig.

fontconfig.7z

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 3, 2023, 22:02


I see. I did Dunning-Kruger myself, I had not understood what the underlying issue was, even after you spelled it out :p. Both of our systems call those functions, but on your end you get no syscall

Apologies for the newbie question, but: how do you get the library's source to be there for it to point to? Here it just points to the shared library, which isn't very illuminating.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 3, 2023, 22:22


The source code can be found here: https://ftp.gnu.org/gnu/glibc/

gdb will just point to the affected file and line.

Most system calls are never used by the userspace application code directly but rather through libraries like glibc. The question here is why glibc chooses a different execution path on your system.

It might have something to do with missing time services on your system but that's just a guess. You could try to run systemd-timesyncd since there isn't any time service in your process list anyway.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 4, 2023, 01:51


After investigating the glibc function and vdso/vsyscalls, it appears that your system fails to execute dl_vdso_clock_gettime64 and falls back to the real system call.

Here are some more details:

https://man7.org/linux/man-pages/man7/vdso.7.html

https://stackoverflow.com/questions/19938324/what-are-vdso-and-vsyscall

While we could add a check to the syscall filter and allow the required syscalls when the vdso function is unavailable, it does not seem to be something that should be required on any modern system and falling back to the real syscalls is a severe performance penalty, indicating a more severe issue with the kernel.

Since the vdso objects are initialized during boot, the kernel parameters could provide an indicator: cat /proc/cmdline

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 5, 2023, 01:33


Sorry for the delay; had a busy day.

Ah, that seems to be the case. I think this is on the right track, but I don't think the answer's in the kernel parameters:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=d309ca4b-2f30-4959-808b-215150d464ad rw loglevel=3 quiet

However! After some research, I have found this post, which states that certain clocks don't have vDSO support, specifically HPET and ACPI. And what do you know, on my system I get this:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
hpet

The post is kind of old, but if HPET and ACPI clocks still don't have support for vDSO, this could be the answer?
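If that holds, a runtime check could read the same sysfs file. Here is a rough sketch, assuming the post is right that hpet and acpi_pm are the clocksources without vDSO support; the function names are made up, and the list of unsupported clocksources is an assumption taken from that post, not from the kernel documentation.

```c
#include <stdio.h>
#include <string.h>

/* Path taken from the shell session above. */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Assumption (from the linked post): these clocksources have no vDSO
 * fast path, so clock_gettime ends up as a real syscall. */
int clocksource_lacks_vdso(const char *name)
{
    return strcmp(name, "hpet") == 0 || strcmp(name, "acpi_pm") == 0;
}

/* Reads the first line of `path` into `buf` and strips the newline.
 * Returns 0 on success, -1 if the file cannot be read. */
int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the time syscalls are probably needed, 0 if probably not,
 * and -1 if the clocksource could not be determined. */
int time_syscalls_needed(const char *path)
{
    char name[64];

    if (read_clocksource(path, name, sizeof name) != 0)
        return -1;
    return clocksource_lacks_vdso(name);
}
```

Calling time_syscalls_needed(CLOCKSOURCE_PATH) would presumably give 1 on my machine (hpet) and 0 on one using tsc. The sandbox would have to treat -1 as "unknown" and pick a safe default.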

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 5, 2023, 06:07


When I check the setting on my test systems, it will return tsc instead. This seems to be the cause of the fallback to the actual syscall in your case then.

We can probably add a check to see if the vDSO support is available and allow the syscall only when it is not. Since 99.9% won't use it, that would still avoid allowing it for most users.

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 5, 2023, 15:28


I see. How would that be done? A compile-time check for __vdso_clock_gettime seems to work here, but I don't know if that symbol is supposed to be accessible anyway.

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 5, 2023, 22:09


Er, looks like the MR was merged before I could add the check. If you want, I can send another one with the check, though I still don't know how to check for vDSO support.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 5, 2023, 22:24


Yes, the required syscalls are whitelisted for everyone for now.

I would prefer a check to avoid the syscall permission for most users that don't need them and I will look into how to do a runtime check on the availability of the vDSO function.

I would appreciate it if you could help with some tests once I have found a viable solution.

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 5, 2023, 22:45


Yep, I'll help with tests.

I'll also look into a solution, but I don't expect I'll be able to figure it out. However, does it need to be a run-time check? The X11 vs. Wayland check seems to be done at compile time, and compared to that, not having vDSO support sounds like it's much more rare, so it's probably fine to make it a compile time check as well (if a run-time check is too much a hassle).

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 6, 2023, 01:53


A compile time check would only work on the system the code is compiled on, like that of the arch package manager. If the system supports vdso, it will not allow the syscall and when you run the binary, it will fail again. So it needs to be a runtime check to make sure it works everywhere and even when the support of vdso functions might change on the same system. The X11 vs Wayland checks are also done at runtime.

As for the check, there might even be something in glib to verify if vdso is available.

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 7, 2023, 24:15


I couldn't find a glib functionality for this, but dlfcn.h seems to do the trick. What is the output of this on your end?

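/* RTLD_DEFAULT is a GNU extension in glibc's <dlfcn.h>, so define this first. */
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Look the symbol up in the global scope of the running process. */
    void* rv = dlsym(RTLD_DEFAULT, "__vdso_clock_gettime");

    if (rv == NULL)
        printf("rip\n");  /* symbol not found */
    else
        printf("rop\n");  /* symbol found */

    return 0;
}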

For me it's "rip," which makes sense as __vdso_clock_gettime should not be defined here. Here's the dlsym manpage.
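Another probe that might be worth comparing is the auxiliary vector, which is where the kernel passes the vDSO base address to the process. This is only a sketch, assuming glibc's getauxval is available; the function name is made up. Note that a mapped vDSO does not by itself mean clock_gettime avoids the syscall: with a clocksource such as hpet the vDSO routine may still fall back to the kernel, so this is probably not sufficient on its own.

```c
#include <elf.h>       /* AT_SYSINFO_EHDR */
#include <sys/auxv.h>  /* getauxval */

/* Hypothetical helper: returns 1 if the kernel mapped a vDSO into this
 * process (AT_SYSINFO_EHDR holds its base address), 0 otherwise. */
int vdso_is_mapped(void)
{
    return getauxval(AT_SYSINFO_EHDR) != 0;
}
```

Unlike the dlsym lookup, this does not depend on the vDSO symbols being visible in the global lookup scope.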

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 7, 2023, 21:15


I have not looked into it in detail yet, but your example also prints out "rip" for me.

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 18, 2023, 11:47


While a simple check for the availability of the vDSO functions would be nice to have in order to avoid allowing the syscalls when they are not needed, this has a very minor use case and unless there is a very simple solution, it won't be worth it to look into it any further.

I think you can close this unless you want to look into it any further @lambdcalculus

sebastinas commented 1 year ago

On GitLab by @lambdcalculus on Jan 20, 2023, 02:36


I see. Closing then. Sorry again for the trouble 😅

sebastinas commented 1 year ago

On GitLab by @valoq on Jan 20, 2023, 10:23


It was an interesting issue to work on, keep them coming if you find more