netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0

can't instantiate NetObserv eBPF Agent on kernel 5.4 #369

Open matijavizintin opened 2 months ago

matijavizintin commented 2 months ago

I'm trying to run it on kernel 5.4 (Ubuntu 20, 5.4.0-189-generic) and it fails with FATA[0000] can't instantiate NetObserv eBPF Agent error="loading and assigning BPF objects: field EgressFlowParse: program egress_flow_parse: map direct_flows: map create without BTF: invalid argument"

Is there any way to make it work? Upgrading all the servers is not realistic at the moment.

matijavizintin commented 2 months ago

Well, it turns out it's not that hard. I had to replace the RingBuffer with a PerfEvent Array in the eBPF code, remove the fentry program, which is not supported on this kernel, and make a few changes to use the perf reader. If anyone is interested I can share more details.

jotak commented 2 months ago

Thanks @matijavizintin for reporting the issue. We indeed test & validate the agent on more recent kernel versions, sorry for the quirks you had with an older one.

cc @msherif1234 I'm actually wondering why we claim to support kernels 4.18+ (this is in the readme), as the ring buffer was only added in 5.8, and iirc the very first version of the agent was already using the ring buffer. Maybe some historical knowledge that we lost... (or were there some Red Hat specific backports?)

We do some specific reassignments for kernels older than 5.14 but that's not related to the ring buffer.

@matijavizintin, would your changes be straightforward to integrate upstream?

Regarding fentry, a quick look suggests that we fall back to kprobe when it fails. Is there something missing here?
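For illustration, the 5.8 ring buffer cutoff mentioned above could be checked at startup from the kernel release string. This is only a minimal sketch, not code from the agent: `parseKernelRelease` and `supportsRingbuf` are hypothetical names, and a plain version comparison ignores distro backports, so it would give the wrong answer on a kernel that backports the ring buffer.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelVersion holds the major and minor numbers of a kernel release.
type kernelVersion struct {
	major, minor int
}

// parseKernelRelease (hypothetical helper) extracts major.minor from a
// release string such as "5.4.0-189-generic", as printed by `uname -r`.
func parseKernelRelease(release string) (kernelVersion, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return kernelVersion{}, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return kernelVersion{}, fmt.Errorf("parsing major of %q: %w", release, err)
	}
	// The minor part may carry a suffix (e.g. "8-rc1"); keep the leading digits.
	minorDigits := parts[1]
	for i, r := range minorDigits {
		if r < '0' || r > '9' {
			minorDigits = minorDigits[:i]
			break
		}
	}
	minor, err := strconv.Atoi(minorDigits)
	if err != nil {
		return kernelVersion{}, fmt.Errorf("parsing minor of %q: %w", release, err)
	}
	return kernelVersion{major: major, minor: minor}, nil
}

// atLeast reports whether v is greater than or equal to major.minor.
func (v kernelVersion) atLeast(major, minor int) bool {
	return v.major > major || (v.major == major && v.minor >= minor)
}

// supportsRingbuf (hypothetical helper) assumes an upstream kernel, where
// BPF_MAP_TYPE_RINGBUF is available from 5.8 onwards. Backports are ignored.
func supportsRingbuf(release string) (bool, error) {
	v, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	return v.atLeast(5, 8), nil
}

func main() {
	for _, release := range []string{"5.4.0-189-generic", "5.15.0-91-generic"} {
		ok, err := supportsRingbuf(release)
		fmt.Println(release, ok, err)
	}
}
```

Probing the kernel by actually trying to create a ring buffer map would be more robust than comparing version numbers, precisely because of possible backports.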

matijavizintin commented 2 months ago

It was a fun one and I learned something new :)

I would say yes. I did it in a hackish way because I needed to prove that it works, but in essence I added a perf reader to FlowFetcher to read from BPF_MAP_TYPE_PERF_EVENT_ARRAY instead of BPF_MAP_TYPE_RINGBUF.

Regarding fentry, that's true; however, the code will already fail when loading the TCPRcvFentry eBPF program. So I commented that out and used kprobe.

I'm attaching the patch so you can see what I did. As I said, very hackish. support_for_older_kernels.patch
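To make the idea of reading from either map type more concrete, here is a rough sketch of how a fetcher could hide the two readers behind one interface. It is not taken from the attached patch: `sample`, `sampleReader`, `newReader` and `drain` are hypothetical names, and the two readers are in-memory fakes standing in for real ring buffer and perf event readers. The one behavioral difference it models is that a perf buffer can report lost samples.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// sample is the common shape handed to the flow decoder, whichever map
// type it came from (hypothetical type).
type sample struct {
	raw  []byte // encoded flow record
	lost uint64 // samples dropped by the kernel, as a perf buffer can report
}

// sampleReader hides the difference between the two map types.
type sampleReader interface {
	Read() (sample, error)
}

// fakeRingbuf is an in-memory stand-in for a BPF_MAP_TYPE_RINGBUF reader.
type fakeRingbuf struct{ queue [][]byte }

func (r *fakeRingbuf) Read() (sample, error) {
	if len(r.queue) == 0 {
		return sample{}, io.EOF
	}
	raw := r.queue[0]
	r.queue = r.queue[1:]
	return sample{raw: raw}, nil
}

// fakePerf is an in-memory stand-in for a BPF_MAP_TYPE_PERF_EVENT_ARRAY reader.
type fakePerf struct{ queue []sample }

func (p *fakePerf) Read() (sample, error) {
	if len(p.queue) == 0 {
		return sample{}, io.EOF
	}
	s := p.queue[0]
	p.queue = p.queue[1:]
	return s, nil
}

// newReader picks a reader from a capability flag decided at startup.
func newReader(ringbufSupported bool, rb *fakeRingbuf, pf *fakePerf) sampleReader {
	if ringbufSupported {
		return rb
	}
	return pf
}

// drain reads until io.EOF and returns the records plus the lost-sample total.
func drain(r sampleReader) ([][]byte, uint64, error) {
	var records [][]byte
	var lost uint64
	for {
		s, err := r.Read()
		if errors.Is(err, io.EOF) {
			return records, lost, nil
		}
		if err != nil {
			return records, lost, err
		}
		lost += s.lost
		if len(s.raw) > 0 {
			records = append(records, s.raw)
		}
	}
}

func main() {
	pf := &fakePerf{queue: []sample{{raw: []byte("flow-1")}, {lost: 3}, {raw: []byte("flow-2")}}}
	records, lost, err := drain(newReader(false, &fakeRingbuf{}, pf))
	fmt.Println(len(records), lost, err)
}
```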

msherif1234 commented 2 months ago

We already fall back to kprobe if fentry isn't supported or available. In fact, I remember fentry not being available on the s390 arch even with a recent kernel; see https://github.com/netobserv/netobserv-ebpf-agent/pull/265

The readme probably needs some updates; I think we need to set the minimum kernel version to 5.8.

We can't replace our ringbuf map with perf events in production, as it is less efficient for our application: https://nakryiko.com/posts/bpf-ringbuf/

Thanks @matijavizintin

jotak commented 2 months ago

@msherif1234 if there is demand (upstream) to keep support for 5.4 / 5.8, etc., we could create one or more dedicated branches, which would also allow us to clean up the main branch and get rid of the compatibility code. wdyt?

matijavizintin commented 2 months ago

@msherif1234 That's true; however, the code will already throw an error here https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L738 even before falling back here https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L172. At least that's the case for x86.

Yeah, I saw the performance impact in the docs. I plan to test it in prod soon, since it's not feasible to upgrade all the servers. My plan is to install your latest release on Ubuntu 22+ servers (kernel > 5.8) and the patched code using the perf event array on the older OSes. I can report the difference in performance. I also plan to make the code a bit nicer than the current hackish patch :)

msherif1234 commented 2 months ago

@matijavizintin it's a warning, not an error: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L175. That holds for more recent kernels though, and we do use this logic in production. For the older kernel check, this is a bug IMO, and https://github.com/netobserv/netobserv-ebpf-agent/pull/374 should help. Would you be able to check whether it helps with your kernel? We didn't see it because the older kernel we tested with seems to have fentry support.

msherif1234 commented 2 months ago

> @msherif1234 if there is a demand (upstream) to keep support for 5.4 / 5.8, etc. , we could create one or more dedicated branches. Which also allows us to clean up the main branch and get rid of the compatibility code, wdyt?

@jotak if we want to go to older kernels without ringbuf support, we would have to do a fair amount of work in both the eBPF code and userspace to replace it with the perf event map type. But it's possible if there is a pressing need.

matijavizintin commented 2 months ago

@msherif1234 sorry for the late reply. I applied your changes from https://github.com/netobserv/netobserv-ebpf-agent/pull/374 to my branch and it works well, thanks!

msherif1234 commented 1 month ago

Thank you @matijavizintin !!