kolide / launcher

Osquery launcher, autoupdater, and packager
https://kolide.com/launcher

osqueryd repeatedly faulting on Linux (EFAULT) #1773

Closed ankon closed 3 months ago

ankon commented 4 months ago

From dmesg:

[Tue Jul  9 12:40:09 2024] show_signal: 68 callbacks suppressed
[Tue Jul  9 12:40:09 2024] traps: SchedulerRunner[2848] general protection fault ip:55f3cffa1892 sp:7f4734bf9fd0 error:0 in osqueryd[55f3cfa02000+2eb2000]
[Tue Jul  9 12:43:14 2024] SchedulerRunner[4480]: segfault at 18 ip 000055d8f49b0892 sp 00007fb3c0bfb250 error 4 in osqueryd[55d8f4411000+2eb2000] likely on CPU 2 (core 2, socket 0)
[Tue Jul  9 12:43:14 2024] Code: df 48 89 c6 e8 1d 93 90 02 48 8d 35 46 30 5d fe 49 89 e7 4c 89 ff 48 89 da e8 8b bc 90 02 49 8b 06 6a 01 5e 4c 89 f7 4c 89 fa <ff> 50 18 48 89 e7 e8 0d 8b 90 02 64 48 8b 04 25 28 00 00 00 48 3b
[Tue Jul  9 13:09:57 2024] traps: SchedulerRunner[6507] general protection fault ip:5577165b1892 sp:7fa485ffa1b0 error:0 in osqueryd[557716012000+2eb2000]
[Tue Jul  9 13:40:41 2024] SchedulerRunner[7375]: segfault at 0 ip 0000000000000000 sp 00007f80fdffa818 error 14 in osqueryd[55b939ad2000+1f44000] likely on CPU 1 (core 1, socket 0)
[Tue Jul  9 13:40:41 2024] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[Tue Jul  9 14:08:42 2024] SchedulerRunner[7985]: segfault at 1 ip 00007f86db38fd1c sp 00007f86d0bf9dd8 error 4 in libc.so.6[7f86db238000+16d000] likely on CPU 5 (core 5, socket 0)
[Tue Jul  9 14:08:42 2024] Code: ff ff 48 89 f8 0f bc c9 f3 a4 c3 0f 1f 00 f3 0f 1e fa 89 f8 62 a1 fd 00 ef c0 25 ff 0f 00 00 3d e0 0f 00 00 0f 87 24 01 00 00 <62> f1 7d 20 74 07 c5 fb 93 c0 85 c0 74 16 0f bc c0 c3 66 90 0f bc
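
(Editorial note: the `error N` field on the `segfault at` lines can be decoded. The sketch below assumes the usual x86 page-fault error-code bit layout and that the kernel prints the value in hexadecimal. It does not apply to the `general protection fault` lines, whose `error:0` is a different kind of code. The helper name `decode_pf_error` is made up for illustration.)

```python
# Bits of the x86 page-fault error code as printed (in hex) on
# "segfault at ... error N" lines. Assumption: standard x86 layout.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex: str) -> list[str]:
    """Return a human-readable description of each relevant bit."""
    code = int(code_hex, 16)
    description = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            description.append(if_set)
        elif if_clear is not None:
            description.append(if_clear)
    return description


# The two distinct codes that appear in the dmesg output above.
for code in ("4", "14"):
    print(f"error {code}: {', '.join(decode_pf_error(code))}")
```

Under those assumptions, `error 4` is a user-mode read of an unmapped address. `error 14` additionally has the instruction-fetch bit set, which fits the entry where `ip` is 0.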

launcher system journal:

Jul 09 12:36:51 minerva osqueryd[2588]: osqueryd started [version=5.12.2]
Jul 09 12:40:09 minerva osqueryd[4225]: osqueryd started [version=5.12.2]
Jul 09 12:43:14 minerva osqueryd[6247]: osqueryd started [version=5.12.2]
Jul 09 13:09:58 minerva osqueryd[7116]: osqueryd started [version=5.12.2]
Jul 09 13:40:42 minerva osqueryd[7729]: osqueryd started [version=5.12.2]
Jul 09 14:08:42 minerva osqueryd[8373]: osqueryd started [version=5.12.2]

The system is up to date, and restarting did not fix the crashes.

Linux minerva 6.9.7-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 27 18:11:45 UTC 2024 x86_64 GNU/Linux

I don't see any symbols in osqueryd, so unfortunately I don't see a quick way of narrowing this down further.
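
(Editorial note: even without symbols, the dmesg lines allow one comparison. Subtracting the mapping base in `osqueryd[base+size]` from the faulting `ip` gives an offset that does not depend on address-space randomization. The sketch below does this for the three entries that report an `ip` inside osqueryd. The regex and the helper name `fault_offset` are illustrative assumptions. The result is an offset into the mapping the kernel reports, which is not necessarily a file offset.)

```python
import re

# The three dmesg entries from this report whose ip lies inside osqueryd
# (timestamps stripped).
DMESG_LINES = [
    "traps: SchedulerRunner[2848] general protection fault ip:55f3cffa1892 "
    "sp:7f4734bf9fd0 error:0 in osqueryd[55f3cfa02000+2eb2000]",
    "SchedulerRunner[4480]: segfault at 18 ip 000055d8f49b0892 "
    "sp 00007fb3c0bfb250 error 4 in osqueryd[55d8f4411000+2eb2000] "
    "likely on CPU 2 (core 2, socket 0)",
    "traps: SchedulerRunner[6507] general protection fault ip:5577165b1892 "
    "sp:7fa485ffa1b0 error:0 in osqueryd[557716012000+2eb2000]",
]

# Matches both "ip:<hex>" (traps lines) and "ip <hex>" (segfault lines),
# followed later by "in <name>[<base>+<size>]".
PATTERN = re.compile(
    r"\bip[: ]([0-9a-f]+) .*? in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(line: str):
    """Return (mapping name, ip - base) or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ip, name, base, _size = match.groups()
    return name, int(ip, 16) - int(base, 16)


for line in DMESG_LINES:
    name, offset = fault_offset(line)
    print(f"{name} +{offset:#x}")
```

All three entries resolve to the same offset, `+0x59f892`, which suggests the same instruction is faulting each time.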

ankon commented 4 months ago

Forgot the version:

$ rpm -q launcher-kolide-k2
launcher-kolide-k2-1.4.2-1.x86_64

directionless commented 4 months ago

If you're running launcher, could you please run sudo /usr/local/kolide-k2/bin/launcher flare and send us the output? It creates a debugging tarball, uploads it to our cloud, and prints the remote file name.

ankon commented 4 months ago

{"caller":"main.go:36","msg":"Launcher starting up","revision":"db7106fe13683c7b1c656de23672cbe5f50e7b59","severity":"info","ts":"2024-07-10T09:41:57.520941275Z","version":"1.4.2"}
{"caller":"library_lookup.go:195","component":"tuf_library_lookup","msg":"found executable matching current release","path":"/var/kolide-k2/k2device.kolide.com/updates/launcher/1.8.1/launcher","severity":"info","ts":"2024-07-10T09:41:57.541273749Z","version":"1.8.1"}
{"caller":"main.go:242","msg":"got new version of launcher to run","new_binary_path":"/var/kolide-k2/k2device.kolide.com/updates/launcher/1.8.1/launcher","new_binary_version":"1.8.1","old_version":"1.4.2","severity":"info","ts":"2024-07-10T09:41:57.541327692Z"}
{"caller":"main.go:219","msg":"preparing to exec new binary","new_binary":"/var/kolide-k2/k2device.kolide.com/updates/launcher/1.8.1/launcher","old_version":"1.4.2","severity":"info","ts":"2024-07-10T09:41:57.541343773Z"}
{"time":"2024-07-10T09:41:57.551326109Z","level":"INFO","msg":"launcher starting up","launcher_run_id":"01J2E0AS5FWV2W2NB5DPJD0T8V","version":"1.8.1","revision":"22bf14babfa22ff5ddf7e744f42a825dd638ff7c"}
{"time":"2024-07-10T09:41:57.560509096Z","level":"INFO","msg":"found executable matching current release or pinned version","launcher_run_id":"01J2E0AS5FWV2W2NB5DPJD0T8V","component":"tuf_library_lookup","binary":"launcher","update_channel":"stable","pinned_version":"","executable_path":"/var/kolide-k2/k2device.kolide.com/updates/launcher/1.8.1/launcher","executable_version":"1.8.1","span_id":"0000000000000000","trace_id":"00000000000000000000000000000000","trace_sampled":false}
{"time":"2024-07-10T09:41:57.560528272Z","level":"INFO","msg":"nothing newer","launcher_run_id":"01J2E0AS5FWV2W2NB5DPJD0T8V"}
{"time":"2024-07-10T09:42:01.077718273Z","level":"INFO","msg":"flare creation complete","launcher_run_id":"01J2E0AS5FWV2W2NB5DPJD0T8V","status":"flare uploaded successfully","file":"2024/07/10/01J2E0ASGYW4G7M8Q3Z5AHW6T8.zip"}
time=2024-07-10T09:42:01.077Z level=INFO source=/home/runner/work/launcher/launcher/cmd/launcher/flare.go:103 msg="flare creation complete" launcher_run_id=01J2E0AS5FWV2W2NB5DPJD0T8V status="flare uploaded successfully" file=2024/07/10/01J2E0ASGYW4G7M8Q3Z5AHW6T8.zip

directionless commented 4 months ago

Thank you so much for sending that in. We'll dig in.

directionless commented 4 months ago

(Our internal discussion https://kolide.slack.com/archives/CGFJY1SP2/p1720636331882499 and some cores in https://kolide.slack.com/archives/CGFJY1SP2/p1720711299172329)

RebeccaMahany commented 4 months ago

Also followed up in osquery slack: https://osquery.slack.com/archives/C08V7KTJB/p1720792822595459

ankon commented 4 months ago

As far as I can see, I cannot access these, so let me know if I can be of any help.

Current rough counters:

$ sudo dmesg -T | grep osqueryd  | sed -re 's,  , ,' | cut -f 1,2,3 -d ' '  | sort | uniq -c
     27 [Fri Jul 12
     17 [Sat Jul 13
     29 [Thu Jul 11
     17 [Tue Jul 9
     28 [Wed Jul 10
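
(Editorial note: the pipeline above sorts its output lexically, so the days come out in a different order than they occurred. A minimal Python sketch of the same count, ordered chronologically, follows. It assumes `dmesg -T` style timestamps like those earlier in this thread. The function name `crashes_per_day` and the sample input are illustrative only.)

```python
import re
from collections import Counter
from datetime import datetime

# Leading "[Tue Jul  9 12:40:09 2024]" timestamp as printed by `dmesg -T`.
STAMP = re.compile(r"^\[([^\]]+)\]")


def crashes_per_day(lines, needle="osqueryd"):
    """Count lines mentioning `needle` per calendar day, oldest day first."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match is None:
            continue
        # Collapse the double space dmesg uses for single-digit days.
        stamp = " ".join(match.group(1).split())
        day = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y").date()
        counts[day] += 1
    return sorted(counts.items())


# Sample input: entries copied from the first comment in this thread.
SAMPLE = [
    "[Tue Jul  9 12:40:09 2024] traps: SchedulerRunner[2848] general protection fault ip:55f3cffa1892 sp:7f4734bf9fd0 error:0 in osqueryd[55f3cfa02000+2eb2000]",
    "[Tue Jul  9 12:43:14 2024] SchedulerRunner[4480]: segfault at 18 ip 000055d8f49b0892 sp 00007fb3c0bfb250 error 4 in osqueryd[55d8f4411000+2eb2000] likely on CPU 2 (core 2, socket 0)",
    "[Tue Jul  9 13:40:41 2024] Code: Unable to access opcode bytes at 0xffffffffffffffd6.",
    "[Tue Jul  9 13:40:41 2024] SchedulerRunner[7375]: segfault at 0 ip 0000000000000000 sp 00007f80fdffa818 error 14 in osqueryd[55b939ad2000+1f44000] likely on CPU 1 (core 1, socket 0)",
]

for day, count in crashes_per_day(SAMPLE):
    print(f"{count:7d} {day.isoformat()}")
```

Like the grep above, this counts every line that mentions osqueryd, so the `Code:` lines are skipped.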

RebeccaMahany commented 4 months ago

Sorry, closed by accident

RebeccaMahany commented 3 months ago

An issue was opened in osquery here: https://github.com/osquery/osquery/issues/8384

ankon commented 3 months ago

I see things moving in the right direction, but I am having a bit of trouble understanding how many steps there are between "osquery has a (resolved/fixed) issue" and "osquery getting updated for kolide".

Is there a way for me to do the update manually, for instance?

RebeccaMahany commented 3 months ago

@ankon It looks like that fix has not made it into an osquery release yet -- I don't see it in 5.13.0, anyway -- so I don't think there's anything you can do manually at the moment. Will tag in @directionless for a better explanation of the osquery release process than I can give. 🙂

directionless commented 3 months ago

In general, Kolide uses the official osquery releases. This means our process is something like:

  1. Work on fixing osquery
  2. Osquery releases (roughly every 2 months)
  3. Kolide deploys osquery as a beta (roughly 1-2 weeks)
  4. Kolide deploys to stable

In this case, I know that osquery is talking about cutting a 5.13.2 release with this fix, though I'm delaying that slightly because there is another Linux crash. I would expect an osquery release to be deployed within the next 2 weeks.

RebeccaMahany commented 3 months ago

@ankon -- we just released osquery 5.13.1 to stable for Kolide. You should hopefully see this autoupdate in about an hour, and it should resolve the segfault issue. Let us know how it works for you!

ankon commented 2 months ago

I can confirm that there are indeed no more segfaults in my dmesg. Thanks a lot!