Closed sebastinas closed 2 months ago
On GitLab by @valoq on Jan 1, 2023, 01:14
Could you please provide some more details about your environment and what version of zathura you experience this issue with? The zathura-git package in AUR is almost 3 years old.
The current version on arch 0.5.1 as well as the most recent version 0.5.2 seem to work fine with linux 6.0 and wayland on several of my arch linux test systems. What desktop environment are you using and what version of glibc, glib2 and check do you have installed? Which zathura plugins and backends are you using and what version?
Please also provide your zathurarc configuration.
If possible, please test the current version 0.5.2 (which is currently the same as the git version), found here: https://pwmt.org/projects/zathura/download/
after extraction, please run:
meson build && cd build && ninja test
edit: If possible, please also provide a test file that triggers the issue.
On GitLab by @valoq on Jan 1, 2023, 01:41
Here are the updated PKGBUILD files for zathura, girara and the pdf plugins with the current release versions. You will need to install girara first, since the latest version is a dependency for zathura 0.5.2: pkgbuild.tar.xz
On GitLab by @lambdcalculus on Jan 1, 2023, 16:04
Those logs were acquired using the binary obtained by cloning and building the develop branch of this repo manually (same procedure you told me to do with the most recent release), so I used no plugins alongside it. I'm running Hyprland (on Wayland), glibc 2.36-6, glib2 2.74.4-1, and check 0.15.2-2. The zathurarc I used to obtain the strace logs only had set sandbox strict in it.
However, this issue happens with my normal installation of zathura as well, version 0.5.1-1 of the official Arch package. In this case the only plugin I have is zathura-pdf-mupdf 0.3.9-1. My zathurarc only has set selection-clipboard clipboard in it, and adding set sandbox strict causes the issue to happen.
Building 0.5.2 and appending set sandbox strict to zathurarc also yields the same issue (no plugins).
tests/test_sandbox.c from the repo should trigger this issue.
Also, backtracing the clock_gettime syscall in gdb seems to indicate the syscall happens due to some GTK thing. Perhaps on your version this syscall doesn't happen?
(gdb) catch syscall clock_gettime
Catchpoint 1 (syscall 'clock_gettime' [228])
(gdb) r
Starting program: /home/lambda/src/repos/zathura-0.5.2/build/zathura
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Catchpoint 1 (call to syscall clock_gettime), 0x00007ffff7fc8a1d in clock_gettime ()
(gdb) backtrace
#0 0x00007ffff7fc8a1d in clock_gettime ()
#1 0x00007ffff6dff6bd in clock_gettime () at /usr/lib/libc.so.6
#2 0x00007ffff715ca13 in () at /usr/lib/libglib-2.0.so.0
#3 0x00007ffff715cd13 in g_thread_new () at /usr/lib/libglib-2.0.so.0
#4 0x00007ffff712c33c in () at /usr/lib/libglib-2.0.so.0
#5 0x00007ffff72bd670 in () at /usr/lib/libgio-2.0.so.0
#6 0x00007ffff72bd725 in g_task_get_type () at /usr/lib/libgio-2.0.so.0
#7 0x00007ffff7327163 in () at /usr/lib/libgio-2.0.so.0
#8 0x00007ffff7327952 in g_dbus_proxy_new_for_bus () at /usr/lib/libgio-2.0.so.0
#9 0x00007ffff7570d54 in () at /usr/lib/libgdk-3.so.0
#10 0x00007ffff7539c37 in gdk_display_manager_open_display () at /usr/lib/libgdk-3.so.0
#11 0x00007ffff77df4a1 in gtk_init_check () at /usr/lib/libgtk-3.so.0
#12 0x00007ffff77df4de in gtk_init () at /usr/lib/libgtk-3.so.0
#13 0x000055555556324b in main (argc=1, argv=0x7fffffffded8) at ../zathura/main.c:279
edit: something I just realized is that running zathura under gdb (again with the set sandbox strict option) avoids the SIGSYS. Not sure why, or whether that is relevant.
I forgot to use --args -_-
On GitLab by @valoq on Jan 1, 2023, 19:20
Thank you for the report, I will look into it.
If you don't mind, could you provide your environment variables as well?
Also, please try to reproduce this with the following environment set:
export DBUS_SESSION_BUS_ADDRESS=""
export NO_AT_BRIDGE=1
export GTK_A11Y=none
meson build && cd build && ninja test
On GitLab by @lambdcalculus on Jan 1, 2023, 22:08
Environment variables here: https://pastebin.com/2Z8yfRd0.
Also, same issue occurs after setting those environment variables.
On GitLab by @valoq on Jan 2, 2023, 02:37
Just to make sure: The affected system is running arch linux x86_64 with the vanilla kernel and no testing packages or other modifications? Are there any additional modules, especially security modules configured on the system?
This looks like something is trying to influence or measure the process (possibly intended) and since the issue does not trigger when run inside the debugger, this is another indicator as well.
I have just rerun all tests on my VMs and I could not find any system where zathura used any of these system calls nor did I find any other gtk application using this.
My guess is that the use of these syscalls is triggered by an external application or a change to a dependency, possibly some debugging tool in your $PATH.
There are a few more things you could try to help find the cause of this:
Log all system calls currently used on your patched version and provide the log files:
strace -o log1 -f ./zathura
strace -o log2 -qfc ./zathura
Create a new user and build zathura in that new home environment to make sure there are no custom modifications triggering this issue.
Replace the allow rules in your patch with errno rules and see if this works as well:
ERRNO_RULE(clock_gettime);
ERRNO_RULE(gettimeofday);
If none of this helps, you could provide a list of all your installed packages and I will try to rebuild the environment to test this directly.
pacman -Qqe > packagelist
On GitLab by @lambdcalculus on Jan 2, 2023, 19:13
Yes, I'm running Arch Linux x86_64 with the vanilla kernel (though not the most recent), no testing packages. There are a few packages from unofficial repositories (chaotic-aur and archlinuxcn), but I believe they are all simply binaries of packages otherwise available in the AUR. Maybe one or two of them aren't, at most.
Attached are the requested strace logs. Something I had not paid attention to earlier but noticed now is that the gettimeofday and clock_gettime syscalls happen a lot, and it's only after many calls that the SIGSYS happens, not after the first. Whatever causes that, ALLOW_RULEing those syscalls fixes it. Also, it does crash under gdb, my bad. I had forgotten to pass --args to gdb, so it wasn't passing the zathurarc directory to zathura.
Replacing the ALLOW_RULEs with ERRNO_RULEs causes an issue as well:
$ ./zathura -c .
error: could not open plugin directory: /usr/local/lib/zathura
error: Found no plugins. Please install at least one plugin.
(org.pwmt.zathura:32877): GLib-ERROR **: 17:20:52.755: GLib requires working CLOCK_MONOTONIC
fish: Job 1, './zathura -c .' terminated by signal SIGTRAP (Trace or breakpoint trap)
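The abort above is consistent with a caller that treats a failing CLOCK_MONOTONIC as fatal: an errno rule makes the filtered syscall return an error instead of killing the process, and on a system where clock_gettime() really reaches the kernel, that error surfaces in the caller. A minimal sketch of such a check follows. It is not GLib's actual code, and the function name monotonic_clock_usable is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns 1 if CLOCK_MONOTONIC can be read, 0 otherwise.
 * On failure, *err_out receives errno, i.e. whatever error code the seccomp
 * errno rule injected for the rejected syscall. */
int monotonic_clock_usable(int *err_out)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        if (err_out != NULL)
            *err_out = errno;
        return 0;
    }
    if (err_out != NULL)
        *err_out = 0;
    return 1;
}
```

A library that cannot work without a monotonic clock would abort when this returns 0, which is roughly what the GLib-ERROR line reports.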
Attached is my package list as well. Some of them are from archlinuxcn and chaotic-aur, so you'll need those repositories as well.
edit: Forgot to mention, creating a new user did not work either.
On GitLab by @valoq on Jan 2, 2023, 20:48
This does indeed look like something is trying to measure your zathura process. Before adding these syscalls to the whitelist, we should try to find the source of this to make sure it is a valid use case, especially since it's not required by anyone else.
You could try once more with this to make sure there isn't anything on $PATH causing this:
export PATH=/usr/bin
meson build && cd build && ninja test
Another possible hint could be in the output of
sudo lsof -p ZATHURAPID > lsoflog
(needs sudo because the sandbox prevents unprivileged processes from viewing the process stats)
On GitLab by @lambdcalculus on Jan 3, 2023, 01:28
Changing $PATH did not change anything.
For lsof, I ran zathura without setting the sandbox option, as setting it crashes it immediately and I can't get the pid and run lsof. Shouldn't change anything, I believe, since the syscalls still happen when zathura is run normally.
On GitLab by @valoq on Jan 3, 2023, 04:04
/home/lambda/.themes/oomox-red/gtk-3.20/gtk.gresource
This could possibly cause issues, or at least it's for an older gtk version. It may be worth checking if moving this theme somewhere else temporarily will change anything, though I doubt it.
One other obvious potential issue is the large number of fontconfig cache files. Font parsing is a topic of its own and actually a common attack vector as well, which makes this a bit suspicious in my somewhat paranoid eyes.
To check this, please move the system fontconfig directory as well as the user fontconfig directory to a temporary backup:
mkdir ~/fontbackup
mv /var/cache/fontconfig ~/fontbackup/systemfontconfig
mv ~/.cache/fontconfig ~/fontbackup/userfontconfig
If none of this helps, you could also try to run the official zathura arch package (0.5.1) in this container environment to see if it avoids the issue: https://github.com/valoq/bwscripts/blob/master/profiles/zathura
The root of the issue is with the glibc functions clock_gettime() and gettimeofday() which determine the actual syscalls used. Both libc functions are used on other systems as well but for some reason they trigger these syscalls only on your system.
My best guess for why this happens would still be the kernel, so it may still help to look at kernel modules as well to make sure. Changes to libc would also explain this.
uname -a >> infolist
lsmod >> infolist
pacman -Qi glibc >> infolist
sha256sum /usr/lib/libc.so.6 >> infolist
ps -ef >> infolist
On GitLab by @lambdcalculus on Jan 3, 2023, 19:32
Regarding the theme: I just changed to a different one and yes, no difference.
Moving the fontconfig caches just causes zathura to delay start up a bit, and then it opens a bunch of fontconfig cache files again anyway (and setting the sandbox to strict, as usual, causes a SIGSYS). I'm not particularly well-versed in security, so I don't really know how to investigate this much further, if there is some sort of attack involved with the font cache.
Running zathura in that container environment also leads to the same issue; however, stracing it leads to different results. The syscalls in question do still happen a handful of times, though. Here are the logs.
As well as the info list:
Apologies for all the work, by the way 😅. I didn't expect there'd be something special going on with my system, so I ended up jumping the gun with the MR right after opening the issue. Thanks for taking the time to walk through this thoroughly with me.
On GitLab by @lambdcalculus on Jan 3, 2023, 20:29
Actually, I believe I'm onto something. However, I'm not actually very proficient at C, so tell me if I'm Dunning-Kruger'ing myself into oblivion.
From the gdb snippet above, we can see that the call comes after g_thread_new(), which is related to thread creation. Indeed, gdb tells us that zathura creates and exits a bunch of threads before crashing, and the crash happens after all the threads exit:
$ gdb --args ./zathura -c .
[...]
Reading symbols from ./zathura...
(gdb) r
Starting program: /home/lambda/src/repos/zathura-0.5.2/build/zathura -c .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff2fff6c0 (LWP 25396)]
[New Thread 0x7ffff27fe6c0 (LWP 25397)]
[New Thread 0x7ffff17fc6c0 (LWP 25399)]
[New Thread 0x7ffff1ffd6c0 (LWP 25398)]
[New Thread 0x7ffff0ffb6c0 (LWP 25400)]
[Thread 0x7ffff0ffb6c0 (LWP 25400) exited]
[New Thread 0x7ffff0ffb6c0 (LWP 25401)]
[New Thread 0x7fffdb5c76c0 (LWP 25402)]
error: could not open plugin directory: /usr/local/lib/zathura
error: Found no plugins. Please install at least one plugin.
[Thread 0x7ffff0ffb6c0 (LWP 25401) exited]
[Thread 0x7fffdb5c76c0 (LWP 25402) exited]
[Thread 0x7ffff17fc6c0 (LWP 25399) exited]
[Thread 0x7ffff5b68980 (LWP 25390) exited]
[Thread 0x7ffff27fe6c0 (LWP 25397) exited]
[Thread 0x7ffff2fff6c0 (LWP 25396) exited]
[Thread 0x7ffff1ffd6c0 (LWP 25398) exited]
[New process 25390]
Program terminated with signal SIGSYS, Bad system call.
The program no longer exists.
And if you go look at the source for g_thread_new() and go down the rabbit hole of C macros and libraries, you'll end up at this file, at the function sysprof_clock_get_current_time(), which does call clock_gettime().
If you're curious about the rabbit hole, g_thread_new() (in this file) calls g_thread_new_internal(), which calls something that uses the G_TRACE_CURRENT_TIME macro, which can be found in this file. That macro is defined in terms of the SYSPROF_CAPTURE_CURRENT_TIME macro from another library, which eventually leads to this file and the function mentioned above.
I imagine it's a similar case for gettimeofday, though I haven't found the will to follow the rabbit hole for that one. Either way, it seems something weird happens when the threads exit, and it leads to a bad syscall which the SECCOMP filter doesn't like.
Maybe the threading is not happening on your end, so these syscalls haven't happened. This is just a wild guess though, again I'm not very proficient. If I'm right, though, it seems like either the sandbox mode has to disable multi-threading or enable these syscalls.
On GitLab by @valoq on Jan 3, 2023, 20:47
Could you provide the fontcache file? Just compress the fontbackup into an encrypted 7z archive (password=sus) before upload.
Even with indicators of compromise present, it's a false positive 99% of the time, so it's probably nothing to worry about and most likely just something wrong with the fontconfig setup.
I would also recommend removing some of the font packages, since they make a giant collection anyway:
adobe-source-han-sans-otc-fonts
texlive-local-fontawesome5
ttf-google-fonts-git
ttf-meslo-nerd-font-powerlevel10k
ttf-nerd-fonts-symbols-2048-em
Then please clean both ~/.cache/fontconfig and /var/cache/fontconfig and run
fc-cache -f -v
The output could be interesting as well.
And don't worry about my time spent on this; it's an interesting issue and I would not have implemented a seccomp filter here if I wasn't prepared for some work ;)
On GitLab by @valoq on Jan 3, 2023, 20:51
It may be worth tracing the clock_gettime function call in gdb: instead of catching the syscall, use break clock_gettime and do a backtrace.
Stepping into the function may be interesting as well.
In my case (and all test systems) it will show:
0x00007ffff6dee6bd in __GI___clock_gettime (clock_id=<optimized out>, tp=<optimized out>) at ../sysdeps/unix/sysv/linux/clock_gettime.c:42
Which points to (in glibc)
37 #ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
38 int (*vdso_time64) (clockid_t clock_id, struct __timespec64 *tp)
39 = GLRO(dl_vdso_clock_gettime64);
40 if (vdso_time64 != NULL)
41 {
42 r = INTERNAL_VSYSCALL_CALL (vdso_time64, 2, clock_id, tp);
43 if (r == 0)
44 return 0;
45 return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
46 }
47 #endif
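To make the distinction between the function call and the syscall concrete, the libc wrapper can be compared against a forced kernel entry. This is a minimal sketch, assuming Linux x86_64 with glibc; both function names are hypothetical. The wrapper may be served by the vDSO without entering the kernel, whereas syscall() always issues the real clock_gettime syscall, which is the event a seccomp filter acts on.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Reads CLOCK_MONOTONIC through the libc wrapper. With a vDSO-capable
 * clocksource this usually completes in user space. Returns 0 on success. */
int read_clock_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Reads CLOCK_MONOTONIC by forcing the real syscall, bypassing the vDSO.
 * Under a strict seccomp filter that does not allow clock_gettime, this is
 * the path that would raise SIGSYS. Returns 0 on success. */
int read_clock_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Running a small program that calls each function under strace should show a clock_gettime entry for the second one, and for the first one only on systems where the vDSO path is not taken.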
On GitLab by @valoq on Jan 3, 2023, 20:55
You could also test this container, which will isolate all font files and throw a fontconfig error which can be ignored here
On GitLab by @lambdcalculus on Jan 3, 2023, 21:37
Yeah. Some of those I had to install to fix issues that don't need fixing anymore - just never got around to removing them. Here's the fontconfig.
On GitLab by @lambdcalculus on Jan 3, 2023, 22:02
I see. I did Dunning-Kruger myself; I had not understood what the underlying issue was, even after you spelled it out :p. Both of our systems call those functions, but on your end you get no syscall.
Apologies for the newbie question, but: how do you get the library's source to be there for it to point to? Here it just points to the shared library, which isn't very illuminating.
On GitLab by @valoq on Jan 3, 2023, 22:22
The source code can be found here: https://ftp.gnu.org/gnu/glibc/
gdb will just point to the affected file and line.
Most system calls are never used by the userspace application code directly but rather through libraries like glibc. The question here is why glibc chooses a different execution path on your system.
It might have something to do with missing time services on your system, but that's just a guess. You could try to run systemd-timesyncd, since there isn't any time service in your process list anyway.
On GitLab by @valoq on Jan 4, 2023, 01:51
After investigating the glibc function and vdso/vsyscalls, it appears that your system fails to execute dl_vdso_clock_gettime64 and falls back to the real system call.
Here are some more details:
https://man7.org/linux/man-pages/man7/vdso.7.html
https://stackoverflow.com/questions/19938324/what-are-vdso-and-vsyscall
While we could add a check to the syscall filter and allow the required syscalls when the vDSO function is unavailable, it does not seem to be something that should be required on any modern system, and falling back to the real syscalls carries a severe performance penalty, indicating a more severe issue with the kernel.
Since the vDSO objects are initialized during boot, the kernel parameters could provide an indicator:
cat /proc/cmdline
On GitLab by @lambdcalculus on Jan 5, 2023, 01:33
Sorry for the delay; had a busy day.
Ah, that seems to be the case. I think this is on the right track, but I don't think the answer's in the kernel parameters:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=d309ca4b-2f30-4959-808b-215150d464ad rw loglevel=3 quiet
However! After some research, I have found this post, which states that certain clocks don't have vDSO support, specifically HPET and ACPI. And what do you know, on my system I get this:
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
The post is kind of old, but if HPET and ACPI clocks still don't have support for vDSO, this could be the answer?
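The lookup above can also be done from code. This is a minimal sketch, assuming Linux with sysfs mounted; the helper names are hypothetical. The classifier encodes only what this thread reports (tsc on systems where no syscall is seen, hpet and acpi_pm as clocks without vDSO support) and treats every other name as unknown.

```c
#include <stdio.h>
#include <string.h>

#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Hypothetical helper: copies the clocksource name found at `path` into
 * `buf` without the trailing newline. Returns 0 on success, -1 on failure. */
int read_clocksource(const char *path, char *buf, size_t len)
{
    FILE *fp;

    if (len == 0)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical classifier based only on what this thread reports:
 *  1 = tsc (no syscall seen on the test systems),
 *  0 = hpet or acpi_pm (reported as lacking vDSO support),
 * -1 = anything else, unknown. */
int clocksource_uses_vdso(const char *name)
{
    if (strcmp(name, "tsc") == 0)
        return 1;
    if (strcmp(name, "hpet") == 0 || strcmp(name, "acpi_pm") == 0)
        return 0;
    return -1;
}
```

A caller could pass CLOCKSOURCE_PATH to read_clocksource() and feed the result to clocksource_uses_vdso(); on the affected system that would yield 0 for hpet.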
On GitLab by @valoq on Jan 5, 2023, 06:07
When I check the setting on my test systems, it returns tsc instead.
This seems to be the cause of the fallback to the actual syscall in your case then.
We can probably add a check to see if vDSO support is available and allow the syscall only when it is not. Since 99.9% of users won't need it, that would still avoid allowing it for most users.
On GitLab by @lambdcalculus on Jan 5, 2023, 15:28
I see. How would that be done? A compile-time check for __vdso_clock_gettime seems to work here, but I don't know if that symbol is supposed to be accessible anyway.
On GitLab by @lambdcalculus on Jan 5, 2023, 22:09
Er, looks like the MR was merged before I could add the check. If you want, I can send another one with the check, though I still don't know how to check for vDSO support.
On GitLab by @valoq on Jan 5, 2023, 22:24
Yes, the required syscalls are whitelisted for everyone for now.
I would prefer a check that avoids granting the syscall permission to most users, who don't need it, and I will look into how to do a runtime check on the availability of the vDSO function.
I would appreciate it if you could help with some tests once I have found a viable solution.
On GitLab by @lambdcalculus on Jan 5, 2023, 22:45
Yep, I'll help with tests.
I'll also look into a solution, but I don't expect I'll be able to figure it out. However, does it need to be a run-time check? The X11 vs. Wayland check seems to be done at compile time, and compared to that, not having vDSO support sounds much rarer, so it's probably fine to make it a compile-time check as well (if a run-time check is too much of a hassle).
On GitLab by @valoq on Jan 6, 2023, 01:53
A compile time check would only work on the system the code is compiled on, like that of the arch package manager. If the system supports vdso, it will not allow the syscall and when you run the binary, it will fail again. So it needs to be a runtime check to make sure it works everywhere and even when the support of vdso functions might change on the same system. The X11 vs Wayland checks are also done at runtime.
As for the check, there might even be something in glib to verify if vdso is available.
On GitLab by @lambdcalculus on Jan 7, 2023, 24:15
I couldn't find a glib functionality for this, but dlfcn.h seems to do the trick. What is the output of this on your end?
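#define _GNU_SOURCE /* glibc exposes RTLD_DEFAULT only with this defined */
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Look up the vDSO symbol in the global symbol scope. */
    void *rv = dlsym(RTLD_DEFAULT, "__vdso_clock_gettime");
    if (rv == NULL)
        printf("rip\n"); /* symbol not found */
    else
        printf("rop\n"); /* symbol found */
    return 0;
}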
For me it's "rip," which makes sense as __vdso_clock_gettime should not be defined here. Here's the dlsym manpage.
On GitLab by @valoq on Jan 7, 2023, 21:15
I have not looked into it in detail yet, but your example also prints out "rip" for me.
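Since the dlsym() probe prints the same result on a system where the vDSO path works, it cannot tell the two cases apart. A different starting point, sketched here under the assumption of Linux with glibc 2.16 or later, is to ask the auxiliary vector where the kernel mapped the vDSO. The function name is hypothetical. Note that a mapped vDSO is only a necessary condition: frame #0 of the earlier backtrace sits outside any named library, which suggests the vDSO is present on the affected system too and the fallback to the real syscall happens anyway.

```c
#include <sys/auxv.h>

/* Hypothetical helper: returns 1 if the kernel mapped a vDSO into this
 * process, 0 otherwise. AT_SYSINFO_EHDR holds the vDSO's base address,
 * and getauxval() returns 0 when the entry is absent. */
int vdso_is_mapped(void)
{
    return getauxval(AT_SYSINFO_EHDR) != 0;
}
```

A check that actually decides whether to allow the syscalls would therefore need more than this, for example the current clocksource.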
On GitLab by @valoq on Jan 18, 2023, 11:47
While a simple check for the availability of the vDSO functions would be nice to have in order to avoid allowing the syscalls when they are not needed, this has a very minor use case, and unless there is a very simple solution, it won't be worth looking into any further.
I think you can close this unless you want to look into it any further @lambdcalculus
On GitLab by @lambdcalculus on Jan 20, 2023, 02:36
I see. Closing then. Sorry again for the trouble 😅
On GitLab by @valoq on Jan 20, 2023, 10:23
It was an interesting issue to work on, keep them coming if you find more
On GitLab by @lambdcalculus on Dec 31, 2022, 21:13
Strict sandbox mode disallows the syscalls gettimeofday and clock_gettime, which seem to be necessary during startup (doing :set sandbox strict after starting works fine), at least on my end. This also causes the AUR package zathura-git to fail installation, as ninja test fails at the sandbox unit test.
Here is the relevant ninja test log bit:
And relevant strace logs (obtained by using set sandbox strict in zathurarc; the second one was obtained after unblocking the first syscall):
I'll submit a pull request allowing the two syscalls shortly.