osquery / osquery

SQL powered operating system instrumentation, monitoring, and analytics.
https://osquery.io

Osquery crashes on Linux when it receives a kill syscall with audit_allow_kill_process_events disabled #7353

Open mogrein opened 2 years ago

mogrein commented 2 years ago

Bug report

What operating system and version are you using?

version = 16.04.6 LTS (Xenial Xerus) build = platform = ubuntu

What version of osquery are you using?

version = 4.8.0.0-yandex, but the crash is reproducible up to 5.0.1

What steps did you take to reproduce the issue?

Running osquery with --audit_allow_process_events=1 and --audit_allow_kill_process_events=0, then parsing a kill syscall in the audit log.

What did you expect to see?

Osquery works

What did you see instead?

It crashes with SIGSEGV and the following stack trace:

std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, void*>*, long> std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > >::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const () 
osquery::GetStringFieldFromMap(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) () 
osquery::CopyFieldFromMap(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > >&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) () 
osquery::AuditProcessEventSubscriber::ProcessEvents(std::__1::vector<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > >, std::__1::allocator<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > > >&, std::__1::vector<osquery::AuditEvent, std::__1::allocator<osquery::AuditEvent> > const&) () 
osquery::AuditProcessEventSubscriber::Callback(std::__1::shared_ptr<osquery::AuditEventContext> const&, std::__1::shared_ptr<osquery::AuditSubscriptionContext> const&) () 

Further investigation of the crash dumps and of osquery running with --audit_debug=1 showed us that, for some reason, osquery gets a syscall=62 event from audit without the AUDIT_OBJ_PID record that should follow it.

Here is a snippet with --audit_allow_kill_process_events=1 and --audit_allow_kill_process_events=1:

I1019 18:54:14.129583 22484 auditdnetlink.cpp:769] 1300, audit(1634658845.864:100500): arch=c000003e syscall=62 success=yes exit=0 a0=587a a1=9 a2=0 a3=0 items=0 ppid=7974 pid=7975 auid=345447 uid=345447 gid=110415 euid=345447 suid=345447 fsuid=345447 egid=110415 sgid=110415 fsgid=110415 tty=pts0 ses=25 comm="bash" exe="/bin/bash" subj=staff_u:staff_r:staff_t:s0 key=(null)
I1019 18:54:14.129631 22484 auditdnetlink.cpp:769] 1318, audit(1634658845.864:100500): opid=22650 oauid=345447 ouid=345447 oses=25 obj=staff_u:staff_r:staff_t:s0 ocomm="sleep"
I1019 18:54:14.129654 22484 auditdnetlink.cpp:769] 1327, audit(1634658845.864:100500): proctitle="-bash"
I1019 18:54:14.129671 22484 auditdnetlink.cpp:769] 1320, audit(1634658845.864:100500):
...
I1019 18:54:03.951023 22484 auditdnetlink.cpp:769] 1300, audit(1634658831.767:99686): arch=c000003e syscall=62 success=yes exit=0 a0=6ba a1=0 a2=0 a3=22d items=0 ppid=22503 pid=22504 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=496 comm="sh" exe="/bin/bash" subj=system_u:system_r:system_cronjob_t:s0-s0:c0.c1023 key=(null)
I1019 18:54:03.951071 22484 auditdnetlink.cpp:769] 1318, audit(1634658831.767:99686): opid=1722 oauid=-1 ouid=600 oses=-1 obj=system_u:system_r:initrc_t:s0 ocomm="python"
I1019 18:54:03.951097 22484 auditdnetlink.cpp:769] 1327, audit(1634658831.767:99686): proctitle=2F62696E2F7368002D63006B696C6C202D302031373232207C7C20696E766F6B652D72632E64207961736D6167656E742072657374617274207C7C202F6574632F696E69742E642F7961736D6167656E742072657374617274
I1019 18:54:03.951118 22484 auditdnetlink.cpp:769] 1320, audit(1634658831.767:99686):

As you can see, when audit_allow_kill_process_events is enabled, each syscall=62 record is followed by a 1318 record.
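For reference, the a0 and a1 fields of a syscall=62 record can be decoded to see which process was targeted and with which signal. The following is a minimal sketch, not osquery code: parseAuditFields, KillCall, and decodeKillCall are hypothetical names, the whitespace split is naive (it would break on quoted values that contain spaces), and it assumes x86_64, where syscall 62 is kill and audit prints the arguments in hex.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

using FieldMap = std::map<std::string, std::string>;

// Split the "key=value key=value ..." payload of one audit record.
// Naive: tokens are separated on whitespace only.
FieldMap parseAuditFields(const std::string& payload) {
  FieldMap fields;
  std::istringstream stream(payload);
  std::string token;
  while (stream >> token) {
    const auto pos = token.find('=');
    if (pos == std::string::npos) {
      continue; // not a key=value token
    }
    fields[token.substr(0, pos)] = token.substr(pos + 1);
  }
  return fields;
}

// Hypothetical result type for a decoded kill(pid, sig) call.
struct KillCall {
  std::int32_t target_pid;
  int signal;
};

// Returns true and fills `out` only for a well-formed syscall=62 record.
// Assumption: arch is x86_64 (arch=c000003e), where 62 is kill.
bool decodeKillCall(const FieldMap& fields, KillCall& out) {
  const auto syscall = fields.find("syscall");
  const auto a0 = fields.find("a0");
  const auto a1 = fields.find("a1");
  if (syscall == fields.end() || a0 == fields.end() || a1 == fields.end()) {
    return false;
  }
  if (syscall->second != "62") {
    return false;
  }
  try {
    // a0..a3 are printed as hexadecimal without a 0x prefix.
    out.target_pid =
        static_cast<std::int32_t>(std::stoull(a0->second, nullptr, 16));
    out.signal = static_cast<int>(std::stoull(a1->second, nullptr, 16));
  } catch (const std::exception&) {
    return false; // malformed or out-of-range argument
  }
  return true;
}
```

Applied to the first record above, a0=587a and a1=9 decode to PID 22650 and signal 9 (SIGKILL), which matches opid=22650 in the 1318 record. The cron-driven record decodes to PID 1722 and signal 0, which is an existence check rather than an actual kill.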

But when osquery is launched with --audit_allow_process_events=1 and --audit_allow_kill_process_events=0, we get the following log:

I1019 18:14:54.267642  6910 auditdnetlink.cpp:769] 1300, audit(1634656491.243:39434): arch=c000003e syscall=62 success=yes exit=0 a0=6ba a1=0 a2=0 a3=22d items=0 ppid=6992 pid=6993 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=313 comm="sh" exe="/bin/bash" subj=system_u:system_r:system_cronjob_t:s0-s0:c0.c1023 key=(null)
I1019 18:14:54.267693  6910 auditdnetlink.cpp:769] 1327, audit(1634656491.243:39434): proctitle=2F62696E2F7368002D63006B696C6C202D302031373232207C7C20696E766F6B652D72632E64207961736D6167656E742072657374617274207C7C202F6574632F696E69742E642F7961736D6167656E742072657374617274
I1019 18:14:54.267715  6910 auditdnetlink.cpp:769] 1320, audit(1634656491.243:39434):

As you can see, the related 1318 record is absent from the log. The process events subscriber doesn't handle a missing AUDIT_OBJ_PID record, and osquery crashes trying to dereference null. https://github.com/osquery/osquery/blob/3795ab0785c067fd09164fab8ddbd3a0d73c256c/osquery/tables/events/linux/process_events.cpp#L232
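The kind of guard that avoids this crash can be sketched as follows. This is an illustration under stated assumptions, not the actual osquery code or the fix from the PR: Record, Event, findRecord, buildKillRow, and the row column names are hypothetical stand-ins. Only the record type numbers (1300 for the syscall record, 1318 for AUDIT_OBJ_PID) and the field names (syscall, pid, opid) come from the logs above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Record type numbers as they appear in the logs above.
constexpr int kAuditSyscall = 1300;
constexpr int kAuditObjPid = 1318;

using FieldMap = std::map<std::string, std::string>;

// Hypothetical stand-ins; NOT osquery's real types.
struct Record {
  int type;
  FieldMap fields;
};

struct Event {
  std::vector<Record> records;
};

// Return the first record of the given type, or nullptr if there is none.
const Record* findRecord(const Event& event, int type) {
  for (const auto& record : event.records) {
    if (record.type == type) {
      return &record;
    }
  }
  return nullptr;
}

// Build a row for a kill event. Returns false, leaving `row` untouched,
// when the event is not a kill or is missing the records it needs.
bool buildKillRow(const Event& event, FieldMap& row) {
  const Record* syscall = findRecord(event, kAuditSyscall);
  if (syscall == nullptr) {
    return false;
  }
  const auto number = syscall->fields.find("syscall");
  if (number == syscall->fields.end() || number->second != "62") {
    return false;
  }

  // The case from the log: a kill syscall record with no 1318 record.
  // Check the pointer instead of dereferencing it unconditionally.
  const Record* object = findRecord(event, kAuditObjPid);
  if (object == nullptr) {
    return false;
  }
  const auto opid = object->fields.find("opid");
  if (opid == object->fields.end()) {
    return false;
  }

  const auto pid = syscall->fields.find("pid");
  row["pid"] = (pid != syscall->fields.end()) ? pid->second : "";
  row["target_pid"] = opid->second; // hypothetical column name
  return true;
}
```

Whether such an event should be dropped, as here, or emitted with empty target columns is a design choice; the sketch only shows that the absent record has to be checked for before its fields are read.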

mogrein commented 2 years ago

BTW, I would appreciate any ideas about why we get kill syscalls in audit. It turns out that systemd-journald-audit.socket wasn't stopped on the machines with this crash, but after masking it and rebooting we can still reproduce the issue on a test machine. There seem to be no other audit subscribers on the system.
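One way to see which command issued a given kill call is to decode the hex-encoded proctitle field of the 1327 record from the logs above. The following is a minimal sketch with hypothetical helper names (hexValue, decodeProctitle); it assumes the value is the raw argv with NUL separators, printed as hex, and returns the input unchanged if it is not valid hex.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexValue(char c) {
  if (c >= '0' && c <= '9') {
    return c - '0';
  }
  if (c >= 'a' && c <= 'f') {
    return c - 'a' + 10;
  }
  if (c >= 'A' && c <= 'F') {
    return c - 'A' + 10;
  }
  return -1;
}

// Decode a hex-encoded proctitle value. The NUL bytes that separate
// argv entries are replaced with spaces for readability.
std::string decodeProctitle(const std::string& hex) {
  if (hex.size() % 2 != 0) {
    return hex; // not a hex string, e.g. a quoted plain title
  }
  std::string out;
  out.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    const int hi = hexValue(hex[i]);
    const int lo = hexValue(hex[i + 1]);
    if (hi < 0 || lo < 0) {
      return hex; // leave non-hex input unchanged
    }
    const char c = static_cast<char>((hi << 4) | lo);
    out.push_back(c == '\0' ? ' ' : c);
  }
  return out;
}
```

The proctitle in the logs above begins with 2F62696E2F7368002D63006B696C6C202D302031373232, which decodes to "/bin/sh -c kill -0 1722", consistent with a0=6ba (1722) and a1=0 in the syscall record.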

Smjert commented 2 years ago

[...] It turns out that systemd-journald-audit.socket wasn't stopped on the machines with this crash, but after masking it and rebooting we can still reproduce the issue on a test machine. There seem to be no other audit subscribers on the system.

Thanks for this report and the related PR. In the issue you mention testing with audit_allow_kill_process_events=1 and audit_allow_kill_process_events=1 or audit_allow_kill_process_events=1 and audit_allow_kill_process_events=1, but they are the same flag. What was the second flag?

BTW I would appreciate any idea why we get kill syscalls in audit.

If you're asking why it has been added to osquery, I would think that it's to see what might be killing critical processes?

mogrein commented 2 years ago

Copy-paste mistake. I wanted to say audit_allow_process_events and audit_allow_kill_process_events

Smjert commented 2 years ago

I see. Just to clarify: I can see the problem when a kill syscall arrives without the record that should follow it, but I wanted to reproduce it, and I was wondering whether you were running with --audit_allow_config. If so, and you use --audit_allow_kill_process_events=0, then osquery shouldn't install a kill syscall rule and audit shouldn't generate any events for it.

Is that rule manually installed? What are the rules in sudo auditctl -l?

Something else I noticed (EDIT: that you actually mentioned in your PR!) is that the original PR did not actually expose the new columns needed for the kill syscall, namely https://github.com/osquery/osquery/blob/3795ab0785c067fd09164fab8ddbd3a0d73c256c/osquery/tables/events/linux/process_events.cpp#L234-L239

Those columns aren't here https://github.com/osquery/osquery/blob/master/specs/posix/process_events.table.