threathunters-io / laurel

Transform Linux Audit logs for SIEM usage
GNU General Public License v3.0
707 stars, 56 forks

Laurel does not aggregate all EXECVE events #178

Open SolitudePy opened 10 months ago

SolitudePy commented 10 months ago

Hi, while doing our work we noticed what is probably a minor bug in Laurel: for some events it generates JSON without the EXECVE/PROCTITLE key. We checked /var/log/audit, filtered by msg, and saw multiple records for the EXECVE event (SYSCALL, EXECVE, PROCTITLE, CWD, etc.). We then checked the matching Laurel JSON event (in /var/log/laurel, matched by ID) and it had only a SYSCALL key; the EXECVE key was missing. This happens on multiple servers, without any correlation to event size or heavy buffering. Our current auditd configuration is not verbose for the other syscall types, so we have only encountered this for EXECVE.

I can't provide a sample of the events that show this bug. I would appreciate any help you can give, and I will help as much as I can. Thanks!

hillu commented 10 months ago

Without sample data, there is not much I can do.

What version of Laurel are you using? Did laurel log anything unusual to syslog?

hillu commented 10 months ago

@SolitudePy Can you provide data or instructions on how to reproduce the issue?

hillu commented 10 months ago

@SolitudePy Incidentally, I stumbled upon a bug today that affected EXECVE events for very long command lines (> 2^16 arguments). (This has been fixed in d89c80cbcd12d88278ba1f99e4ad87358d55422e.) Does this look like the symptom you observed?

SolitudePy commented 10 months ago

@hillu Hello, we are using Laurel v0.5.3. I did not see anything peculiar logged by Laurel. The command line wasn't that long, for sure. Also, from what I experienced, the EXECVE field was dropped entirely from the Laurel log even though SYSCALL.syscall is indeed EXECVE. I am not sure how you can reproduce that yourself, but you could try ingesting a lot of logs into a solution like Splunk and then searching for events where SYSCALL.syscall equals execve but EXECVE is null, for example.
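The same search can be approximated offline, without Splunk. This is only a sketch under assumptions: it expects one JSON object per line, a top-level "ID" key, and a "SYSCALL" record in which the syscall shows up either as the name "execve" or as the raw number 59 (the x86_64 value). The exact key names depend on the Laurel and auditd configuration, so adjust `is_execve` to match the actual log.

```python
import json


def is_execve(syscall_record):
    """Return True if a SYSCALL record looks like an execve call.

    Assumption: the syscall is stored under "SYSCALL" (translated name)
    or "syscall" (name or raw number; 59 is execve on x86_64).
    """
    values = {
        str(syscall_record.get("SYSCALL", "")),
        str(syscall_record.get("syscall", "")),
    }
    return bool(values & {"execve", "59"})


def find_incomplete_execve(lines):
    """Yield the IDs of execve events that carry no EXECVE record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        syscall = event.get("SYSCALL")
        if isinstance(syscall, dict) and is_execve(syscall) and "EXECVE" not in event:
            yield event.get("ID")
```

Passing an open log file to `find_incomplete_execve` yields the event IDs, which can then be looked up in audit.log to see whether the EXECVE record exists there.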

hillu commented 9 months ago

@SolitudePy Does Laurel or auditd log anything strange or meaningful around the time where you are missing data in the log?

SolitudePy commented 9 months ago

Yes, I forgot to mention it, but we checked on multiple servers and the correlated message came from auditd: dispatch err (pipe full) event lost

hillu commented 9 months ago

dispatch err (pipe full) event lost

This basically means that auditd (or audispd if you are using auditd < 3.0) is trying to write lines faster than Laurel consumes them.

The file descriptor that gets passed to Laurel as STDIN is actually one end of an AF_LOCAL socket so there's an associated buffer whose size can be increased (SO_SNDBUF). IIRC, there's no setting in auditd, though.
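To make the "pipe full" condition concrete, here is a small sketch that fills the send side of a local socket pair while nobody reads from the other end. It assumes a stream-type AF_UNIX socket pair, which may differ from what auditd actually sets up; the function name is made up for illustration.

```python
import socket


def bytes_until_full(chunk=b"x" * 1024):
    """Write to one end of a socket pair until the kernel buffer is full.

    The other end is kept open but never read, which mimics a consumer
    that cannot keep up with the producer.
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    writer.setblocking(False)
    sent = 0
    try:
        while True:
            sent += writer.send(chunk)
    except BlockingIOError:
        # The buffer is full. A sender that does not wait has to drop
        # the data at this point.
        pass
    finally:
        writer.close()
        reader.close()
    return sent
```

The returned byte count is the amount of data that can be in flight before the writer has to either block or discard lines.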

Reducing the number of events generated using a tweaked audit ruleset should help.

SolitudePy commented 9 months ago

@hillu Yes, I thought so. It's quite surprising that a flood of events causes the dispatcher to drop full EXECVE lines, so that Laurel misses them. Also, as I stated before, our ruleset is quite basic and we planned to make it more verbose. It would be sad if Laurel could not handle that, since the original audit.log does log all of the events :\

hillu commented 9 months ago

I'm sorry; as far as I know there isn't anything laurel can do here until we put reading from input into a separate thread.

If we do the equivalent of a

setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &newsize, sizeof(newsize))

on Laurel's stdin, this should change the size of the wrong buffer. According to unix(7):

The SO_SNDBUF socket option does have an effect for UNIX domain sockets, but the SO_RCVBUF option does not.

Do you think you might be able to run a patched version of auditd?
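For illustration, this is roughly what setting the option on the sending end looks like, using a local socket pair as a stand-in for the auditd-to-plugin connection. It is a sketch, not a patch for auditd; the function name is hypothetical, and the kernel may clamp or adjust the requested size (on Linux the limit is net.core.wmem_max and the stored value is doubled).

```python
import socket


def resize_send_buffer(requested):
    """Set SO_SNDBUF on the sending end of a socket pair.

    Returns the (before, after) sizes as reported by getsockopt.
    """
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        before = sender.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        # The option has to be set on the socket that writes the data.
        sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        after = sender.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        return before, after
    finally:
        sender.close()
        receiver.close()
```

Comparing the two returned values shows whether the system actually honored the request.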

SolitudePy commented 9 months ago

No, I'm sorry. Are you saying there can't be a fix in Laurel? Also, if that speculation is correct, I should see more events per second in that gap than on servers that do not have this bug, right?

hillu commented 9 months ago

Are you saying there can't be a fix in Laurel?

Not quite. The communication between auditd and laurel is buffered – and the cause of lines getting lost is most likely intermittent bursts of lines and overflowing that buffer before Laurel can catch up. The natural solution would be increasing the size of that buffer, but that can only be done on the sending side, i.e. not by Laurel.

Another solution would be to switch input handling on Laurel's side to a separate thread. I am open to pursuing this path, but this won't be done by the end of the week and I'd need to rely on you to test stuff for me.
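The idea of a separate input thread can be sketched as follows. This is not Laurel's code (Laurel is written in Rust), only an illustration of the pattern: a reader thread drains the input as fast as possible into an in-process queue, so that bursts pile up in user space instead of in the socket buffer. All names are made up, and the queue is unbounded here, which trades memory for not losing lines.

```python
import queue
import threading


def start_reader(stream, q):
    """Start a thread that moves lines from stream into q."""
    def run():
        for line in stream:
            q.put(line)
        q.put(None)  # sentinel: input is exhausted

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def process_all(stream, handle):
    """Consume lines via the reader thread; return the number handled."""
    q = queue.Queue()  # unbounded, so the reader never waits on the consumer
    reader = start_reader(stream, q)
    count = 0
    while True:
        line = q.get()
        if line is None:
            break
        handle(line)  # may be slow (parsing, enrichment, writing logs)
        count += 1
    reader.join()
    return count
```

With `sys.stdin` as the stream, the reader keeps emptying the socket even while `handle` is busy with an earlier line.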

We don't observe this problem frequently enough to consider it a major problem.

Can you give me ballpark numbers about the number of events (unique message IDs) per second? What kind of hardware are you running on?
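One way to get such numbers from an existing audit.log is to count distinct message IDs per second. The sketch below assumes the usual `msg=audit(SECONDS.MILLIS:SERIAL)` prefix on each record; the function name is made up.

```python
import re
from collections import defaultdict

# Matches e.g. msg=audit(1700000000.100:42)
AUDIT_ID = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def events_per_second(lines):
    """Map each epoch second to the number of unique event IDs seen in it."""
    ids_by_second = defaultdict(set)
    for line in lines:
        match = AUDIT_ID.search(line)
        if match:
            second, millis, serial = match.groups()
            # Records that belong to the same event share timestamp and serial.
            ids_by_second[int(second)].add((millis, serial))
    return {second: len(ids) for second, ids in sorted(ids_by_second.items())}
```

Taking `max()` over the returned values gives the peak rate, which is the interesting number when looking for bursts around the time events went missing.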

hillu commented 9 months ago

Also, if that speculation is correct, I should see more events per second in that gap than on servers that do not have this bug, right?

Yes, pretty much. Another explanation would be that something is slowing down Laurel in processing or writing its log files considerably.

SolitudePy commented 9 months ago

@hillu We are also seeing SELinux messages about Laurel trying to get rpm info for many random files, for example. It doesn't seem to affect Laurel, though... I will be able to give you the exact numbers next week.

hillu commented 9 months ago

We are also seeing SELinux messages about Laurel trying to get rpm info for many random files

Those are AVC messages, right? It would be really helpful if you could post some of those.

SolitudePy commented 9 months ago

Yes, they appear in AVC and also in SELinux troubleshoot. I will post them next week.

SolitudePy commented 9 months ago

Hello @hillu, we looked into changing q_depth of audispd (RHEL 7), and it might fix the pipe full error, but afterwards we still encountered Laurel logs with syscall.syscall = execve where the execve record does not exist. About SELinux: it reports a lot of errors, which we logged in permissive mode. Some of them were:

denied write
denied unlink
denied sys_ptrace for /proc//environ,stat
denied getattr for many files, such as /usr/bin/rpm, /etc/passwd, /usr/bin/dash, and many more

In general, it seems Laurel only works if its SELinux type is permissive. Thanks for your help.

hillu commented 9 months ago

In general, it seems laurel is working only if its selinux type is permissive.

oh… are you not using the SELinux policy from contrib/selinux?

Regarding q_depth and other settings … I think that I found a way to add an I/O thread that may fix the problem, but I'd need somebody to test that before releasing it. Could you do that?

SolitudePy commented 9 months ago

I'll come back to you with an answer. Regarding q_depth, doesn't it fix the buffer size you mentioned before?

SolitudePy commented 9 months ago

I am using the SELinux policy from the git repository. The permissive type is included there, with a comment saying to remove it only if there are no AVCs.

hillu commented 9 months ago

Regarding q_depth, doesn't it fix the buffer size you mentioned before?

Apparently, q_depth means that messages are buffered in user-space. Yes, this should help!