lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org

False positive "seccomp filter pointer corruption" on Linux 6.11.0-1-default x86_64 openSUSE Tumbleweed #354

Open Laitinlok opened 2 weeks ago

Laitinlok commented 2 weeks ago

Random crashes on boot, constant kernel panics at runtime, and the system cannot reboot properly when LKRG is loaded via systemd. When LKRG is loaded at runtime instead, the system shows an LKRG error during reboot.

solardiz commented 2 weeks ago

Thank you for reporting this @Laitinlok and sorry LKRG isn't working well for you. Please provide more detail: what architecture, what distro, what kernel build (e.g., a specific distro package or your own build), and the kernel config. Please try loading LKRG with kINT enforcement disabled, e.g., with insmod lkrg.ko kint_enforce=1, and show us what appears in dmesg.

If the problem somehow only shows up when you use the systemd service and enable the service to start at boot, then you can similarly debug this by adding options lkrg kint_enforce=1 to /etc/modprobe.d/lkrg.conf (create it).

Alternatively, you can try putting lkrg.kint_enforce = 1 in /etc/sysctl.d/01-lkrg.conf, which is likely to also do the trick, although it takes effect a tiny bit later (than the /etc/modprobe.d/lkrg.conf way).
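When it is unclear whether the /etc/modprobe.d or the /etc/sysctl.d setting took effect, the value actually in force can be read back at runtime. Assuming the usual sysctl-to-procfs mapping (dots become slashes), lkrg.kint_enforce should be readable as /proc/sys/lkrg/kint_enforce while LKRG is loaded. A minimal C sketch; the helper names are hypothetical and not part of LKRG:

```c
#include <stdio.h>
#include <string.h>

/* Map a sysctl name such as "lkrg.kint_enforce" to its procfs path
 * ("/proc/sys/lkrg/kint_enforce") by replacing dots with slashes.
 * Returns 0 on success, -1 if the buffer is too small. */
static int sysctl_path(const char *name, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    for (char *p = out + strlen("/proc/sys/"); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Read one integer from a sysctl-style file. Returns 0 on success,
 * -1 if the file is missing (e.g. LKRG not loaded) or unparsable. */
static int read_int_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, sysctl_path("lkrg.pint_enforce", buf, sizeof buf) followed by read_int_file(buf, &v) yields the live value, or -1 when LKRG is not loaded and /proc/sys/lkrg does not exist.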

Laitinlok commented 2 weeks ago

Thank you for the swift reply, I will try it and report back.

Laitinlok commented 2 weeks ago

openSUSE Tumbleweed with kernel-default from zypper.

solardiz commented 2 weeks ago

> openSUSE Tumbleweed with kernel-default from zypper.

We do test on OpenSUSE Tumbleweed here in GitHub Actions, and that test passes. But maybe there's something different in your setup, or maybe it takes longer for the issue to show up.

Are you still planning to provide the additional detail I asked for above? Thank you!

solardiz commented 2 weeks ago

> We do test on OpenSUSE Tumbleweed here in GitHub Actions, and that test passes.

Oh, I see the last time it ran (Sep 24) it used 6.10.11-1. Maybe they've updated to 6.11 since. We'll need to re-run the test.

solardiz commented 2 weeks ago

I'm sorry I totally forgot for a moment that it's a build-only test, so it's not supposed to detect this issue. (We do also test boot-up with some other distros.)

So still need more info on this one from you, @Laitinlok.

Laitinlok commented 1 week ago

Yes, it builds properly on 6.10.11 with the release tarball; for 6.11 you need to use the latest git commit. I have tried lkrg.kint_enforce=1; it does not help.

solardiz commented 1 week ago

> Yes, it builds properly on 6.10.11 with the release tarball; for 6.11 you need to use the latest git commit.

Yes, that's as expected.

> I have tried lkrg.kint_enforce=1; it does not help.

How exactly did you try it and how exactly does it not help? Does the kernel still panic? Are you able to capture the relevant kernel messages (as appear in dmesg output) and share them with us here, please? Thank you!

Also, please share the output of uname -mrs (which may tell us a bit more than mere 6.11 - also which arch and build).
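For reference, uname -mrs prints three fields of the uname(2) system call: the kernel name, the release, and the machine. A minimal C sketch gathering the same three fields (the helper name is hypothetical):

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Format "sysname release machine", the fields `uname -mrs` prints.
 * Returns 0 on success, -1 on failure or truncation. */
static int uname_mrs(char *out, size_t outlen)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(out, outlen, "%s %s %s", u.sysname, u.release, u.machine);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

On the reporter's machine this yields a string of the same shape as the one posted in reply (Linux 6.11.0-1-default x86_64); the release field is what distinguishes distro builds such as -default, -arch1-1 or -hardened.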

Laitinlok commented 1 week ago

Through sysctl. I have also isolated the issue to lkrg.pint_enforce=2.

Laitinlok commented 1 week ago

Output of uname -mrs: Linux 6.11.0-1-default x86_64

solardiz commented 1 week ago

> I have also isolated the issue to lkrg.pint_enforce=2

Where does pint_enforce=2 come from on your system? Our default is pint_enforce=1.

Can you please run your system for a while with pint_enforce=1 and capture and send us relevant pieces from dmesg, where it presumably would detect a violation (just enforce it more mildly, so the system should stay up)?
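The difference between pint_enforce=1 and pint_enforce=2 matters for this report. As we read the LKRG documentation, the process-integrity enforcement levels are 0 (log and accept), 1 (kill the offending task, the default) and 2 (panic); please double-check against the README of the installed LKRG version. A hypothetical lookup helper:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: describe an lkrg.pint_enforce level.
 * Meanings follow our reading of the LKRG docs; verify locally. */
static const char *pint_enforce_action(int level)
{
    switch (level) {
    case 0:  return "log and accept";
    case 1:  return "kill task";      /* default */
    case 2:  return "panic";
    default: return NULL;             /* out of range */
    }
}

/* True if a single detection at this level takes down the whole kernel. */
static bool pint_enforce_is_fatal(int level)
{
    return level == 2;
}
```

Under this reading, a false-positive detection with pint_enforce=2 brings down the whole kernel, which fits the panics reported here, while pint_enforce=1 turns the same detection into the "Killing pid" lines seen in the logs in this thread.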

Laitinlok commented 1 week ago

I had set it to 2 through sysctl.

Laitinlok commented 1 week ago

These are the warnings from LKRG when shutting down:

Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4778, name tracker-extract
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4778, name tracker-extract
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 4259, name pipewire-pulse
Oct 06 04:31:20 localhost.localdomain kernel: LKRG: ALERT: BLOCK: Task: Killing pid 4259, name pipewire-pulse
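When triaging logs like these, it can help to tally how often each pid is flagged, since the same pid (pipewire-pulse, pid 4259) is reported repeatedly. A small hypothetical C helper that pulls the pid out of each DETECT line; it matches only the message text shown in these logs:

```c
#include <stdio.h>
#include <string.h>

/* Extract the pid from an LKRG "seccomp filter pointer corruption" DETECT
 * line. Returns the pid, or -1 if the line is not such an alert. */
static long seccomp_alert_pid(const char *line)
{
    static const char marker[] = "seccomp filter pointer corruption for pid ";
    const char *p = strstr(line, marker);
    long pid;
    if (!p || sscanf(p + (sizeof marker - 1), "%ld", &pid) != 1)
        return -1;
    return pid;
}

/* Count how many of lines[0..n) are DETECT alerts for the given pid. */
static int count_alerts_for_pid(const char *const lines[], size_t n, long pid)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (seccomp_alert_pid(lines[i]) == pid)
            count++;
    return count;
}
```

BLOCK ("Killing pid ...") lines are deliberately ignored, so each count reflects detections only.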

solardiz commented 1 week ago

Thank you, this helps.

Do I understand correctly that you were previously using "6.10.11 with the release tarball" and it didn't exhibit the issue?

Laitinlok commented 1 week ago

It started having issues in 6.10.7 I think.

solardiz commented 1 week ago

> It started having issues in 6.10.7 I think.

That's puzzling. When issues started, did you upgrade only the kernel or also LKRG? Were those the same issues (the seccomp filter pointer corruption message) or something else?

solardiz commented 1 week ago

@Adam-pi3 It sounds like your reasoning in #346 could have been flawed. As seen from code snippets in #338, what changed with 38b3b11cf6f4f24fc82997a768082d05890ffbb8 for 5.9+ is that previously we increased refcount for filter->users and filter->refs, and now we do only for filter->refs. Per your comments in #346, none of this should have been needed, and we only do it as defensive programming to reduce impact of a possible misunderstanding from a use-after-free to a resource leak. Yet the impact we see looks like a use-after-free by our own code, so maybe the filter->users increase was somehow required to keep the filter from disappearing/changing under us (if this is indeed a new problem with this change, which isn't entirely clear)?
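To make the filter->refs vs. filter->users distinction concrete, here is a simplified userspace model. As we understand kernel/seccomp.c since 5.9, refs governs when the structure may be freed, while users tracks whether any task (or dependent filter) still uses it; when users reaches zero the filter is treated as orphaned. Everything below is a hypothetical model with made-up names, and it ignores the dependent-filter and notifier references the real kernel also counts:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a seccomp filter's two counters (not kernel code). */
struct model_filter {
    int refs;       /* object lifetime: free when this reaches 0 */
    int users;      /* tasks using the filter: orphaned when this reaches 0 */
    bool orphaned;
};

/* A new filter attached to one task: one reference, one user. */
static struct model_filter *filter_new(void)
{
    struct model_filter *f = calloc(1, sizeof *f);
    if (f) {
        f->refs = 1;
        f->users = 1;
    }
    return f;
}

/* Drop one reference; frees the object and returns true at zero. */
static bool filter_put_ref(struct model_filter *f)
{
    if (--f->refs == 0) {
        free(f);
        return true;
    }
    return false;
}

/* Monitor-style extra hold on 'refs' only (post-change behavior). */
static void filter_get_ref(struct model_filter *f) { f->refs++; }

/* Monitor-style extra hold on both counters (pre-change behavior). */
static void filter_get_ref_and_user(struct model_filter *f)
{
    f->refs++;
    f->users++;
}

/* The attached task detaches; returns true if the object was freed. */
static bool filter_task_release(struct model_filter *f)
{
    if (--f->users == 0)
        f->orphaned = true;
    return filter_put_ref(f);
}
```

In this model, holding only refs (the post-change behavior described above) keeps the memory alive, so there is no use-after-free, but it does not stop the filter from being orphaned once its last task detaches; holding users as well delays that.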

Anyway, I am really tempted to do what I had suggested earlier - exclude seccomp checks on 5.9+. I think they're also incomplete anyway, checking only the first out of possible multiple filters. Is this OK with you? We haven't seen real-world exploits that would modify only seccomp and not anything else we track, have we? However, we have seen plenty of issues related to LKRG's seccomp tracking support on 5.9+, where we had to use risky hacks to get around Linux's symbol non-export. So I feel this feature has poor balance of benefit vs. risk as currently implemented, and we do not readily have an obviously better idea.
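The "checking only the first out of possible multiple filters" limitation can be pictured with a chain model: the kernel links stacked seccomp filters through a prev pointer, so a check that compares only a task's head pointer cannot notice changes further down the chain. Hypothetical names; a sketch, not LKRG's actual check:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy filter chain, newest first, linked through 'prev'. */
struct chain_filter {
    const struct chain_filter *prev;
    unsigned long tag;          /* stand-in for the filter's contents */
};

/* Head-only check: compare the current head with the saved head. */
static bool head_matches(const struct chain_filter *cur,
                         const struct chain_filter *saved_head)
{
    return cur == saved_head;
}

/* Full check: every link must match the saved snapshot, in order,
 * and the chain length must match too. */
static bool chain_matches(const struct chain_filter *cur,
                          const struct chain_filter *const saved[], size_t n)
{
    size_t i = 0;
    for (; cur != NULL; cur = cur->prev, i++)
        if (i >= n || cur != saved[i])
            return false;
    return i == n;
}
```

A full-chain check catches more but costs more per validation, and would need the same care around legitimate updates as the head-only one.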

Strykar commented 1 week ago

When I reboot, I see "kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid ..." for pipewire and mympd. I am also seeing this on kernel Linux r912 6.11.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 04 Oct 2024 21:51:11 +0000 x86_64 GNU/Linux:

sudo dmesg | grep -i lkrg
[   11.437009] LKRG: ALIVE: Loading LKRG
[   11.598059] LKRG: ISSUE: [kretprobe] register_kretprobe() for <ovl_dentry_is_whiteout> failed! [err=-2]
[   11.598061] LKRG: ISSUE: Can't hook 'ovl_dentry_is_whiteout'. This is expected when OverlayFS is not used
[   11.723268] LKRG: ALIVE: LKRG initialized successfully
[  119.812548] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 5531, name gmain
[  119.812554] LKRG: ALERT: BLOCK: Task: Killing pid 5531, name gmain
[  119.813312] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 5527, name bwrap
[  119.813316] LKRG: ALERT: BLOCK: Task: Killing pid 5527, name bwrap

[   52.657022] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2047, name pool-spawner
[   52.657027] LKRG: ALERT: BLOCK: Task: Killing pid 2047, name pool-spawner
[   52.657031] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2047, name pool-spawner
[   52.657032] LKRG: ALERT: BLOCK: Task: Killing pid 2047, name pool-spawner
[   52.657034] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2047, name pool-spawner
[   52.657035] LKRG: ALERT: BLOCK: Task: Killing pid 2047, name pool-spawner
[   52.658779] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658783] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire
[   52.658788] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658789] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire
[   52.658791] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658792] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire
[   52.658794] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658795] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire
[   52.658796] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658797] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire
[   52.658799] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2025, name pipewire
[   52.658800] LKRG: ALERT: BLOCK: Task: Killing pid 2025, name pipewire

[  255.571156] LKRG: ALERT: BLOCK: Task: Killing pid 19675, name [vkrt] Analysis
[  255.571158] LKRG: ALERT: BLOCK: Task: Killing pid 19652, name pool-org.gnome.
[  255.571158] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 19653, name pool-spawner
[  255.571160] LKRG: ALERT: BLOCK: Task: Killing pid 19653, name pool-org.gnome.
[  255.571162] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 19680, name nautilus
[  255.571163] LKRG: ALERT: DETECT: Task: 'off' flag corruption for pid 19654, name pool-org.gnome.
[  255.571164] LKRG: ALERT: BLOCK: Task: Killing pid 19680, name nautilus

I can't even find a binary named gmain:

plocate gmain
/usr/include/at-spi-2.0/atspi/atspi-gmain.h
/usr/include/glib-2.0/glib/gmain.h
/usr/include/glib-2.0/glib/deprecated/gmain.h
/usr/share/texmf-dist/fonts/source/public/elvish/tengmain.mf
/work/x86_64/airootfs/usr/include/glib-2.0/glib/gmain.h
/work/x86_64/airootfs/usr/include/glib-2.0/glib/deprecated/gmain.h

bwrap appears to be part of bubblewrap, which multiple GNOME packages on Arch Linux require:

$ pacwho /usr/bin/bwrap
/usr/bin/bwrap is owned by bubblewrap 0.10.0-1

This is the default config on Arch:

sudo sysctl -a | grep lkrg
lkrg.block_modules = 0
lkrg.heartbeat = 0
lkrg.hide = 0
lkrg.interval = 15
lkrg.kint_enforce = 2
lkrg.kint_validate = 3
lkrg.log_level = 3
lkrg.msr_validate = 0
lkrg.pcfi_enforce = 1
lkrg.pcfi_validate = 2
lkrg.pint_enforce = 1
lkrg.pint_validate = 1
lkrg.profile_enforce = 2
lkrg.profile_validate = 3
lkrg.smap_enforce = 2
lkrg.smap_validate = 1
lkrg.smep_enforce = 2
lkrg.smep_validate = 1
lkrg.trigger = 0
lkrg.umh_enforce = 1
lkrg.umh_validate = 1

Please let me know if I should open a separate issue instead.

solardiz commented 1 week ago

Thank you for reporting this @Strykar! Looks like the same issue to me, so let's keep the info in here.

@Adam-pi3 I think we need to look for possible seccomp-related changes between 6.10 and 6.11 to see if we possibly miss tracking some new legitimate seccomp filter pointer updates. This issue appears too frequently for it to be likely a race condition.
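The kind of gap being looked for here can be modeled as follows, assuming (as the alert text suggests) that LKRG keeps a per-task shadow copy of the filter pointer, refreshes it on the update paths it hooks, and compares it later. An update made through a path the monitor does not hook is then indistinguishable from corruption. All names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy task: the "kernel" field plus the monitor's saved copy. */
struct model_task {
    const void *filter;
    const void *shadow_filter;
};

/* An update path the monitor hooks: the shadow copy follows along. */
static void hooked_set_filter(struct model_task *t, const void *f)
{
    t->filter = f;
    t->shadow_filter = f;
}

/* An update path the monitor does not know about (e.g. one that is new
 * in a later kernel): only the real field changes. */
static void unhooked_set_filter(struct model_task *t, const void *f)
{
    t->filter = f;
}

/* True if a validation pass would report "pointer corruption". */
static bool reports_corruption(const struct model_task *t)
{
    return t->filter != t->shadow_filter;
}
```
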

solardiz commented 1 week ago

> I think we need to look for possible seccomp-related changes between 6.10 and 6.11 to see if we possibly miss tracking some new legitimate seccomp filter pointer updates.

I searched commit messages for mentions of seccomp. Didn't find any new legitimate updates, but found this:

commit bfafe5efa9754ebc991750da0bcca2a6694f3ed3
Author: Andrei Vagin <avagin@google.com>
Date:   Fri Jun 28 02:10:12 2024 +0000

    seccomp: release task filters when the task exits

    Previously, seccomp filters were released in release_task(), which
    required the process to exit and its zombie to be collected. However,
    exited threads/processes can't trigger any seccomp events, making it
    more logical to release filters upon task exits.

    This adjustment simplifies scenarios where a parent is tracing its child
    process. The parent process can now handle all events from a seccomp
    listening descriptor and then call wait to collect a child zombie.

    seccomp_filter_release takes the siglock to avoid races with
    seccomp_sync_threads. There was an idea to bypass taking the lock by
    checking PF_EXITING, but it can be set without holding siglock if
    threads have SIGNAL_GROUP_EXIT. This means it can happen concurently
    with seccomp_filter_release.

    This change also fixes another minor problem. Suppose that a group
    leader installs the new filter without SECCOMP_FILTER_FLAG_TSYNC, exits,
    and becomes a zombie. Without this change, SECCOMP_FILTER_FLAG_TSYNC
    from any other thread can never succeed, seccomp_can_sync_threads() will
    check a zombie leader and is_ancestor() will fail.

    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Andrei Vagin <avagin@google.com>
    Link: https://lore.kernel.org/r/20240628021014.231976-3-avagin@google.com
    Reviewed-by: Tycho Andersen <tandersen@netflix.com>
    Signed-off-by: Kees Cook <kees@kernel.org>

Maybe this created or exposed (made more likely) a race condition?
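One hypothesis consistent with this commit (not confirmed in this thread): the filter pointer is now cleared when a task exits, earlier than before, so a validation that runs for an exiting task whose shadow copy still holds the old pointer would report corruption. A toy model in which 'exiting' stands in for PF_EXITING; all names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

struct exit_task {
    const void *filter;
    const void *shadow_filter;
    bool exiting;               /* stand-in for PF_EXITING */
};

/* Models the behavior described in commit bfafe5efa975: the filter is
 * detached when the task exits, not when its zombie is reaped. */
static void model_task_exit(struct exit_task *t)
{
    t->exiting = true;
    t->filter = NULL;
}

/* Naive check: any difference from the shadow copy is reported. */
static bool naive_reports(const struct exit_task *t)
{
    return t->filter != t->shadow_filter;
}

/* Tolerant check: a NULL filter on an exiting task is accepted. */
static bool tolerant_reports(const struct exit_task *t)
{
    if (t->exiting && t->filter == NULL)
        return false;
    return t->filter != t->shadow_filter;
}
```

The commit message itself carries a caveat: PF_EXITING can be set without holding siglock, so a real fix along these lines would need more care than this single-threaded model shows.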

Laitinlok commented 1 week ago

Yes, I also see the same problem in the logs every time, with different binaries; it seems to be a false positive.

Laitinlok commented 1 week ago

Edit by @solardiz: dropped the over-quoting

> I searched commit messages for mentions of seccomp. Didn't find any new legitimate updates, but found this:

https://github.com/openSUSE/kernel/tree/v6.11.2 You might have more luck finding the commit in the distro tree.

Kirkezz commented 6 days ago

I got alerts about "seccomp filter pointer corruption" recently too. https://pastebin.com/qwfaU2MZ 6.10.12-hardened Arch Linux lkrg 0.9.8-1

solardiz commented 5 days ago

Thank you @Kirkezz. The mainline commit I found above is also included in 6.10.10+, so your report does not rule out the possibility that the issue is related to that commit.

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.10.10

Kirkezz commented 5 days ago

Yes, you are most likely right. I found earlier logs in journalctl with this problem, and the linux version in those logs is 6.10.10.

Adam-pi3 commented 4 days ago

I installed the openSUSE Tumbleweed Desktop and Server versions as VMware VMs, and neither of them has the issue you are describing:

localhost:~/lkrg # cat /etc/os-release 
NAME="openSUSE Tumbleweed"
# VERSION="20241011"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20241011"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
# CPE 2.3 format, boo#1217921
CPE_NAME="cpe:2.3:o:opensuse:tumbleweed:20241011:*:*:*:*:*:*:*"
#CPE 2.2 format
#CPE_NAME="cpe:/o:opensuse:tumbleweed:20241011"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"
localhost:~/lkrg # uname -a
Linux localhost.localdomain 6.11.2-1-default #1 SMP PREEMPT_DYNAMIC Fri Oct  4 17:37:58 UTC 2024 (38c846e) x86_64 x86_64 x86_64 GNU/Linux
localhost:~/lkrg # 

I assume there is more to this problem than just the kernel version. Did you compile it yourself? Did you need to do anything specific to see the issue? I browse the internet through Firefox on the Desktop VM and I didn't see any problem with LKRG:

localhost:~/lkrg # dmesg -T|tail -10
[Sat Oct 12 14:46:38 2024] [  T64502] Freezing user space processes completed (elapsed 0.005 seconds)
[Sat Oct 12 14:46:38 2024] [  T64502] OOM killer disabled.
[Sat Oct 12 14:46:38 2024] [  T64502] LKRG: ISSUE: [kretprobe] register_kretprobe() for <ovl_dentry_is_whiteout> failed! [err=-2]
[Sat Oct 12 14:46:38 2024] [  T64502] LKRG: ISSUE: Can't hook 'ovl_dentry_is_whiteout'. This is expected when OverlayFS is not used.
[Sat Oct 12 14:46:38 2024] [  T64502] LKRG: ALIVE: LKRG initialized successfully
[Sat Oct 12 14:46:38 2024] [  T64502] OOM killer enabled.
[Sat Oct 12 14:46:38 2024] [  T64502] Restarting tasks ... done.
[Sat Oct 12 14:51:13 2024] [  T65282] LKRG: STATE: Enabling 'heartbeat'
[Sat Oct 12 14:51:15 2024] [  T64590] LKRG: ALIVE: System is clean
[Sat Oct 12 14:51:30 2024] [  T64590] LKRG: ALIVE: System is clean
localhost:~/lkrg #

Laitinlok commented 4 days ago

Edit by @solardiz: dropped the over-quoting

> I installed the openSUSE Tumbleweed Desktop and Server versions as VMware VMs, and neither of them has the issue you are describing

Are you using dkms?

Adam-pi3 commented 3 days ago

No, I didn't use DKMS: I fetched the git repo, compiled it, and loaded LKRG after the system had booted.

solardiz commented 3 days ago

I doubt that DKMS is relevant, but loading LKRG early may be. @Adam-pi3 you may want to try our make install followed by systemctl enable lkrg and a reboot. BTW, I don't know if we ever tested this on SUSE; it would be a good test on its own.

Adam-pi3 commented 3 days ago

@solardiz I did try that and still there is no issue (in neither VMs - server and Desktop).

> BTW, I don't know if we ever tested this on SUSE; it would be a good test on its own.

I do occasionally test on openSUSE, starting from the Leap release (15.2, I believe).

Kirkezz commented 3 days ago

> Are you using dkms?

I personally use lkrg-dkms from the AUR and the bug is reproducible. Not sure how to localize the problem.

solardiz commented 2 days ago

@Adam-pi3 How many vCPUs do you have in those VMs? Maybe you need to assign more to expose concurrency issues.
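When trying to reproduce a suspected race in a VM, it helps to record how many CPUs the guest actually sees. A minimal C sketch using sysconf() with the Linux/glibc constants _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF (helper names are hypothetical):

```c
#include <unistd.h>

/* CPUs currently online in this (virtual) machine, or -1 on error. */
static long cpus_online(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* CPUs configured; can exceed the online count if some are offline. */
static long cpus_configured(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}
```

A single-vCPU guest cannot run two tasks truly in parallel, so a race that needs concurrent execution may stay hidden there.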

solardiz commented 2 days ago

> I got alerts about "seccomp filter pointer corruption" recently too. https://pastebin.com/qwfaU2MZ

Let's record this right in here:

19:21:03 kernel: LKRG: ALIVE: LKRG initialized successfully
20:11:13 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 12466, name Worker Launcher
20:11:13 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 12466, name Worker Launcher
20:12:02 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 12592, name StreamTrans #6
20:12:02 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 12592, name StreamTrans #6
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1878, name Isolated Web Co
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1878, name Isolated Web Co
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1941, name HTML5 Parser
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1835, name StyleThread#1
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1835, name StyleThread#1
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1864, name ImageBridgeChld
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1864, name ImageBridgeChld
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 1928, name JS Watchdog
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 1928, name JS Watchdog
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2170, name MainThread
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 2170, name MainThread
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 2077, name ImageIO
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 2077, name ImageIO
20:12:08 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 6767, name MainThread
20:12:08 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 6767, name MainThread
09:25:59 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 26475, name ImageBridgeChld
09:25:59 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 26475, name ImageBridgeChld
10:05:02 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 29154, name TaskCon~ller #0
10:05:02 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 29154, name TaskCon~ller #0
10:05:02 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 29153, name TaskCon~ller #3
10:05:02 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 29153, name TaskCon~ller #3
10:14:22 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 29503, name RemoteLzyStream
10:14:22 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 29503, name RemoteLzyStream
10:14:22 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 29455, name Web Content
10:14:22 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 29455, name Web Content
10:22:47 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 29832, name JS Watchdog
10:22:48 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 29832, name JS Watchdog
10:53:31 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 31035, name StyleThread#1
10:53:31 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 31035, name StyleThread#1
10:59:53 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 31327, name Socket Thread
10:59:53 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 31327, name Socket Thread
10:59:53 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 31379, name StreamTrans #1
10:59:53 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 31379, name StreamTrans #1
13:06:13 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 40397, name Timer
13:06:13 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 40397, name Timer
13:06:16 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 40569, name RemoteLzyStream
13:06:16 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 40569, name RemoteLzyStream
13:06:16 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 40559, name Socket Thread
13:06:17 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 40559, name Socket Thread
13:13:11 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19120, name StyleThread#2
13:13:11 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19120, name StyleThread#2
13:13:11 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19127, name Timer
13:13:11 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19127, name Timer
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19129, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 41545, name StreamTrans #68
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 41545, name StreamTrans #68
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 18999, name ProcessHangMon
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 18999, name ProcessHangMon
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19044, name Socket Thread
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19044, name Socket Thread
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19062, name ImageBridgeChld
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19037, name HTML5 Parser
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19037, name HTML5 Parser
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19266, name ImageIO
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19266, name ImageIO
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19322, name TaskCon~ller #2
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19322, name TaskCon~ller #2
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19178, name StyleThread#1
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19178, name StyleThread#1
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 19175, name IPC I/O Child
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 19175, name IPC I/O Child
13:13:12 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 35035, name TaskCon~ller #2
13:13:12 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 35035, name TaskCon~ller #2
13:13:13 kernel: LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 41432, name ImageBridgeChld
13:13:13 kernel: LKRG: ALERT: BLOCK: Task: Killing pid 41432, name ImageBridgeChld

> 6.10.12-hardened Arch Linux lkrg 0.9.8-1

In a comment above addressed to Adam, I wrote:

> maybe the filter->users increase was somehow required to keep the filter from disappearing/changing under us (if this is indeed a new problem with this change, which isn't entirely clear)?

The fact that the problem is also seen on a build of 0.9.8 suggests that no, this is not "a new problem with this change".

So until we hopefully figure it out for real, maybe let's exclude seccomp monitoring on 6.10.10+? I think we need to make a release really soon for the 6.11+ support, but this issue delays that. What do you think, @Adam-pi3?
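If such an exclusion were adopted, the gate itself is simple. Inside a kernel module it would typically be a build-time comparison of LINUX_VERSION_CODE against KERNEL_VERSION(6, 10, 10); the userspace sketch below reproduces the KERNEL_VERSION encoding and parses uname -r strings from this thread. Function names are hypothetical, and this is not a proposed LKRG patch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Same encoding as the kernel's KERNEL_VERSION() (patch level capped at 255). */
static unsigned long version_code(unsigned a, unsigned b, unsigned c)
{
    return ((unsigned long)a << 16) + ((unsigned long)b << 8) + (c > 255 ? 255 : c);
}

/* Parse the leading "X.Y.Z" of a release string like "6.11.0-1-default".
 * A missing patch level ("6.11") counts as 0. Returns 0 on success. */
static int parse_release(const char *release, unsigned long *code)
{
    unsigned a, b, c = 0;
    if (sscanf(release, "%u.%u.%u", &a, &b, &c) < 2)
        return -1;
    *code = version_code(a, b, c);
    return 0;
}

/* Hypothetical gate: skip seccomp pointer tracking on 6.10.10 and newer. */
static bool skip_seccomp_tracking(unsigned long code)
{
    return code >= version_code(6, 10, 10);
}
```

One design caveat: a version check only approximates the real condition. The relevant commit reached 6.10.10 through the stable series, and distributions backport stable fixes, so a vendor kernel with an older version number could still carry the change.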

Adam-pi3 commented 2 days ago

I left the boxes running for a few days and didn't see that issue. I wonder: @Kirkezz and @Laitinlok, can you try pulling LKRG from the official GitHub repo, compiling it, and loading it instead of the one you have pre-installed from DKMS, and check whether you see the same problem?

That being said, the kernel version (hardened) is not standard: it is heavily modified and aggressively inlines a lot of functions (that's why we have a script for exporting some of the symbols, both for in-tree kernel compilation and for custom kernels). Maybe the problem is there.

@solardiz I would prefer not to; let's first try to find out what's going on (I don't think the problem is related to LKRG itself).
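Whether a given function survived as a standalone symbol on a particular (e.g. hardened) kernel can be checked against /proc/kallsyms, whose lines have the form "<address> <type> <name> [module]"; a function that was fully inlined does not appear there. A hypothetical C helper; unprivileged readers may see zeroed addresses, but only the names matter here:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Does one kallsyms-format line name the given symbol? */
static bool kallsyms_line_has(const char *line, const char *symbol)
{
    char name[256];
    /* Skip address and type; read the name (stops at space or tab). */
    if (sscanf(line, "%*s %*s %255s", name) != 1)
        return false;
    return strcmp(name, symbol) == 0;
}

/* Scan an open kallsyms-format stream for the symbol. */
static bool kallsyms_has(FILE *f, const char *symbol)
{
    char line[512];
    while (fgets(line, sizeof line, f))
        if (kallsyms_line_has(line, symbol))
            return true;
    return false;
}
```

Usage: open /proc/kallsyms with fopen("/proc/kallsyms", "r") and pass the stream to kallsyms_has() with the symbol name of interest.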

solardiz commented 2 days ago

> That being said, the kernel version (hardened) is not standard: it is heavily modified and aggressively inlines a lot of functions (that's why we have a script for exporting some of the symbols, both for in-tree kernel compilation and for custom kernels). Maybe the problem is there.

hardened is only seen in 1 out of 3 independent reports here.

> @solardiz I would prefer not to; let's first try to find out what's going on (I don't think the problem is related to LKRG itself).

Even if the problem isn't in LKRG itself, it's clearly related to LKRG, and it's clearly (more) exposed by recent changes somewhere (I guess that kernel commit I found). We got 3 reports in 2 weeks here vs. no reports of this exact issue before.

Laitinlok commented 2 days ago

I had already cloned the repo, symlinked it into /usr/src, and added it to DKMS.

Laitinlok commented 2 days ago

That being said, the kernel version (hardened) is not standard: it is heavily modified and aggressively inlines a lot of functions (that's why we have a script for exporting some of the symbols, both for in-tree kernel compilation and for custom kernels). Maybe the problem is there.

The hardened kernel is the Arch Linux one; mine is the default kernel from openSUSE Tumbleweed.

Adam-pi3 commented 2 days ago

Thanks @Laitinlok for the reply. Just to double-check: do you have any other LKRG version in your setup, or only the one from the repo? I want to be sure the latest build is the one being loaded.

Do you happen to know how I could reproduce the issue? Do you perform any specific action that triggers it?

Laitinlok commented 2 days ago

Yes, only the one from the repo.

Adam-pi3 commented 1 day ago

@Laitinlok Do you happen to know how I could reproduce the issue? Do you perform any specific action that triggers it?

Laitinlok commented 1 day ago

Run sudo systemctl enable --now lkrg, then restart twice.

Laitinlok commented 1 day ago

Do you have Secure Boot and trusted boot enabled?

Adam-pi3 commented 1 day ago

It certainly doesn't reproduce on my side. @Laitinlok, can you try LKRG under the newest SUSE kernel, 6.11.2-1-default, and check whether you see the same issue?

Do you have Secure Boot and trusted boot enabled?

I do not (it's a VM emulating BIOS).

Laitinlok commented 1 day ago

It has the same issues with the latest kernel.

solardiz commented 20 hours ago

@Adam-pi3 What would your next steps be if you were able to reproduce the issue? Maybe we can jump to those right away.

Adam-pi3 commented 8 hours ago

@Laitinlok, can you change the log level to 4? (I would like to see the actual values of the pointers.) You can do it via the CLI:

sysctl lkrg.log_level=4

Additionally, can you also apply this small patch to LKRG?

diff --git a/src/modules/exploit_detection/p_exploit_detection.c b/src/modules/exploit_detection/p_exploit_detection.c
index 69db274..3fc8fa5 100644
--- a/src/modules/exploit_detection/p_exploit_detection.c
+++ b/src/modules/exploit_detection/p_exploit_detection.c
@@ -1245,6 +1245,7 @@ static int p_cmp_creds(struct p_cred *p_orig, const struct cred *p_current_cred,

 #define P_CMP_PTR(orig, curr, name) \
    if (orig != curr) { \
+      printk(KERN_CRIT "p_ret[%d] test_task_syscall_work=%d",p_ret,test_task_syscall_work(p_current, SECCOMP)); \
       if (p_opt) { \
          if (P_CTRL(p_log_level) >= P_LOG_WATCH) \
             p_print_log(P_LOG_ALERT, \

Laitinlok commented 6 hours ago

Sure

Kirkezz commented 4 hours ago

@Adam-pi3

compile it and load it instead of the one you have pre-installed from DKMS, and check whether you see the same problem?

I installed lkrg-dkms-git from the AUR (replacing lkrg-dkms with it). The problem still shows up in the logs (one entry this boot: "BLOCK: Task: Killing pid 1424, name HTML5 Parser"). Also, on the previous boot, a problem I'd almost forgotten about reappeared (it happens occasionally at boot): there's a "Temporary failure in name resolution" and

dhcpcd[706]: no valid interfaces found
dhcpcd[706]: no interfaces have a carrier

I don't know whether this is related to LKRG or to the fact that I recently updated all packages on my system to avoid a partial upgrade.

Laitinlok commented 35 minutes ago

[ 119.987323] [ T5887] p_ret[0] test_task_syscall_work=1
[ 119.987328] [ T5887] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 5887, name services.exe
[ 119.987335] [ T5887] LKRG: ALERT: BLOCK: Task: Killing pid 5887, name services.exe
[ 123.937601] [ T6134] p_ret[0] test_task_syscall_work=1
[ 123.937607] [ T6134] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 6134, name services.exe
[ 123.937614] [ T6134] LKRG: ALERT: BLOCK: Task: Killing pid 6134, name services.exe
[ 143.476774] [ T6433] p_ret[0] test_task_syscall_work=1
[ 143.476778] [ T6433] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 6433, name services.exe
[ 143.476784] [ T6433] LKRG: ALERT: BLOCK: Task: Killing pid 6433, name services.exe
[ 148.636004] [ T6683] p_ret[0] test_task_syscall_work=1
[ 148.636009] [ T6683] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 6683, name services.exe
[ 148.636015] [ T6683] LKRG: ALERT: BLOCK: Task: Killing pid 6683, name services.exe
[ 344.579928] [ T5401] p_ret[0] test_task_syscall_work=1
[ 344.579933] [ T5401] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 5401, name TaskCon~ller #6
[ 344.579939] [ T5401] LKRG: ALERT: BLOCK: Task: Killing pid 5401, name TaskCon~ller #6
[ 466.900735] [ T7484] p_ret[0] test_task_syscall_work=1
[ 466.900741] [ T7484] LKRG: ALERT: DETECT: Task: seccomp filter pointer corruption for pid 7484, name services.exe
[ 466.900747] [ T7484] LKRG: ALERT: BLOCK: Task: Killing pid 7484, name services.exe