Open camconn opened 1 year ago
Hi @camconn! I would be happy to work on this with you. I'll likely have some follow-up questions this weekend when I have a minute to look at this in more depth.
@camconn Okay so mainly what I was wondering is whether there's any identifying information about the packets received after the initial one with the nlmsghdr. If you can point me to any docs you're aware of on the audit subsystem that would be great. I read through the linked article that I'd actually seen before when initially thinking about implementing the audit subsystem, and that seems to highlight a few areas I'll have to add exceptions for in the audit case.
This is also bringing back up the idea I had a while ago of having a quirks.rs file to segment off all of the "bad" netlink behavior in one spot.
@jbaublitz I did a little research reading Linux's documentation and the sources of kernel/audit*.c and here's what I found:
struct sk_buff
) to userland.
c. Everything after the Netlink header (first 16 bytes, or struct nlmsghdr) of an audit packet with a type in [1300, 2999] is the audit message.
Hope this helps.
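Point (c) can be sketched in plain Rust without neli. This is only an illustration of the layout described above, not neli's API: `NlMsgHdr` and `split_audit_packet` are hypothetical names, and the sketch assumes the buffer holds exactly one audit packet, so it treats everything after the first 16 bytes as the message instead of trusting the header's length field.

```rust
/// Size of `struct nlmsghdr` in bytes.
const NLMSG_HDRLEN: usize = 16;

/// Fields of `struct nlmsghdr`, in wire order (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct NlMsgHdr {
    pub len: u32,
    pub ty: u16,
    pub flags: u16,
    pub seq: u32,
    pub pid: u32,
}

/// Splits one audit packet into its header and the raw audit message.
/// Returns `None` if the buffer is shorter than a header or the type
/// is outside the audit range [1300, 2999].
pub fn split_audit_packet(buf: &[u8]) -> Option<(NlMsgHdr, &[u8])> {
    if buf.len() < NLMSG_HDRLEN {
        return None;
    }
    let (h, payload) = buf.split_at(NLMSG_HDRLEN);
    // Netlink header fields are in host byte order.
    let hdr = NlMsgHdr {
        len: u32::from_ne_bytes([h[0], h[1], h[2], h[3]]),
        ty: u16::from_ne_bytes([h[4], h[5]]),
        flags: u16::from_ne_bytes([h[6], h[7]]),
        seq: u32::from_ne_bytes([h[8], h[9], h[10], h[11]]),
        pid: u32::from_ne_bytes([h[12], h[13], h[14], h[15]]),
    };
    if !(1300..=2999).contains(&hdr.ty) {
        return None;
    }
    // Everything after the header is the audit message; `hdr.len` is
    // deliberately not used to delimit it.
    Some((hdr, payload))
}
```

The payload is returned as bytes rather than `&str` so the caller decides how to handle trailing NULs or non-UTF-8 content.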
@camconn So if I'm understanding you correctly, I think what's probably happening is that the omission of the length of the header is the cause of the panics. That's my best guess knowing how the internals work. Theoretically, all other aspects of what you described should work if you implement a data structure that implements ToBytes/FromBytes for the audit format. Does that sound plausible to you?
@jbaublitz
the omission of the length of the header is the cause of the panics.
That's my impression of what's causing the panic. The tail end of the packet is getting interpreted as an additional header and the bogus length causes an OOB index triggering the panic.
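To make the failure mode concrete, here is a small standalone sketch (not neli code; `misread_len` and `checked_message` are hypothetical helpers). On a little-endian host such as x86_64, four ASCII bytes read as a length field produce a huge number. The byte string `}000` is just one sequence that happens to decode to the 808464509 quoted in the bug report; the report does not say which bytes were actually in the packet.

```rust
/// Reads the first four bytes the way a length field would be read
/// on a little-endian host. Returns `None` if fewer than four bytes remain.
pub fn misread_len(bytes: &[u8]) -> Option<u32> {
    let b = bytes.get(..4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the message the "header" claims to describe, but only if the
/// claimed length is at least a header (16 bytes) and fits in the buffer.
/// Indexing with `&buf[..len]` instead would panic on a bogus length.
pub fn checked_message(buf: &[u8]) -> Option<&[u8]> {
    let len = misread_len(buf)? as usize;
    if len < 16 {
        return None;
    }
    buf.get(..len)
}
```

With these helpers, `misread_len(b"}000")` yields `Some(808464509)`, and `checked_message` returns `None` for any realistic buffer starting with those bytes, where an unchecked slice would panic.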
Is there anything a user can do to work around this or would this require a patch to Neli?
I think this would require a patch to neli. I'm not able to get around this until after the 25th, but I will work on it as soon as I can after that.
Version of neli v0.7.0-rc2
Does the documentation specify that neli's behavior is correct? No, but it's not Neli's fault. I'm using the Audit Netlink family, which is well known to be inconsistent with other Netlink families. This issue is unique to Audit, which sends groups of audit messages with a struct nlmsghdr in front followed by strings with no additional nlmsghdrs.
Does there appear to be a mistake in the netlink documentation based on the kernel code? No. The Linux Audit subsystem is poorly documented and inconsistent with other Netlink families. Other libraries/programs have to implement their own parsing (like go-libaudit and auditd) because Audit isn't consistent with its siblings.
Describe the bug Neli incorrectly handles Audit messages and treats them as additional Netlink headers. This causes neli to pull out outrageous message lengths like 808464509 from an ASCII audit message.
To Reproduce Create a new cargo project:
Then in Cargo.toml:
And in main.rs:
If you have auditd on your system, disable it with sudo auditctl -e 0. You will also need to run this command after you run the reproduction example. The reproduction only triggers a panic with Neli when the audit subsystem goes Off -> On.
To see the panic, run:
or alternatively, if you don't want to use sudo directly on the executable:
Here is a backtrace and the output of the program: panic.txt repro.txt
I have seen this behavior on both kernels 5.15.0-83 on Ubuntu and 6.1.41 on Gentoo on x86_64.
Expected behavior Don't panic when receiving an audit message, and provide an escape hatch to retrieve "leftover" bytes in a neli buffer whenever receiving a message.
Additionally, because the audit message group does not have the same behavior as other netlink families, I would like a way as a user to manually parse the "leftover" messages trailing the initial nlmsghdr struct.
Additional context The audit family has behavior inconsistent with the rest of Netlink. See commentary above.
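For illustration, the "escape hatch" requested under Expected behavior could look roughly like the sketch below. It is plain Rust, not a proposal for neli's actual API: `split_messages` is a hypothetical name, and the sketch assumes a generic buffer where each well-formed header's length covers the whole message. It does not try to model the audit family's own framing; it only shows the idea of handing unparsed bytes back to the caller instead of panicking.

```rust
/// Size of `struct nlmsghdr` in bytes.
const NLMSG_HDRLEN: usize = 16;

/// Walks `buf` as a sequence of netlink messages. Returns the messages
/// whose length field is plausible, plus whatever bytes were left over
/// once a header stopped making sense.
pub fn split_messages(buf: &[u8]) -> (Vec<&[u8]>, &[u8]) {
    let mut msgs = Vec::new();
    let mut rest = buf;
    while rest.len() >= NLMSG_HDRLEN {
        // The length field is the first four bytes, in host byte order.
        let len = u32::from_ne_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        if len < NLMSG_HDRLEN || len > rest.len() {
            // Implausible header: stop and return the remainder untouched.
            break;
        }
        msgs.push(&rest[..len]);
        // Netlink messages are padded to a 4-byte boundary.
        let aligned = ((len + 3) & !3).min(rest.len());
        rest = &rest[aligned..];
    }
    (msgs, rest)
}
```

A caller that knows it is talking to the audit family could then interpret the leftover slice as audit text itself, while callers for other families would normally expect it to be empty.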
Additional incentive When this is fixed I will contribute my code for the Audit family for Neli (see #96) which is currently broken by this bug.