solo-io / bumblebee

Get eBPF programs running from the cloud to the kernel in 1 line of bash
Apache License 2.0

bpf_map_delete_elem is not reflecting its action to userspace (and prometheus) #92

Closed andrea-tomassi closed 1 year ago

andrea-tomassi commented 2 years ago

Version

0.0.13

Linux Version

5.15.0-46-generic

Describe the bug

The environment:

Working with "tracepoint/syscalls/sys_enter_openat"

and a very simple map:

/* Key type, defined before the map that refers to it. */
struct my_struct_t {
    u64 cgroup;
    char my_char[MAX_LEN];
} __attribute__((packed));

/* Hash map placed in the .maps.counter section. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, MAX_CONTAINERS_NUM);
    __type(key, struct my_struct_t);
    __type(value, u64);
} k8s_containers SEC(".maps.counter");

If I put a key/value pair into a BPF_MAP_TYPE_HASH declared in the .maps.counter section, I can see the corresponding value from both the bee CLI and the Prometheus endpoint.

For a very simple test, just add a new key using bpf_map_update_elem with the BPF_NOEXIST flag. (With BPF_NOEXIST you get an error if you try to create a key that already exists, so to create an item with the same key you first need to delete that key. This is useful if you want to be sure that the key has really been deleted.)
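As a rough illustration of the BPF_NOEXIST check described above, here is a toy model in plain userspace C. It is not kernel or Bumblebee code: the toy_* functions are hypothetical stand-ins for bpf_map_lookup_elem, bpf_map_update_elem and bpf_map_delete_elem, and the values chosen for MAX_LEN and MAX_CONTAINERS_NUM are assumptions. Only the flag values and the error codes are taken from the kernel's documented behavior for hash maps.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values as defined in the kernel UAPI header linux/bpf.h. */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

#define MAX_LEN            16 /* assumption: the issue does not give a value */
#define MAX_CONTAINERS_NUM 8  /* assumption: the issue does not give a value */

struct my_struct_t {
    uint64_t cgroup;
    char my_char[MAX_LEN];
} __attribute__((packed));

/* Hypothetical toy storage standing in for the kernel hash map. */
struct toy_slot {
    int used;
    struct my_struct_t key;
    uint64_t value;
};
static struct toy_slot toy_map[MAX_CONTAINERS_NUM];

static struct toy_slot *toy_find(const struct my_struct_t *key)
{
    assert(key != NULL);
    for (int i = 0; i < MAX_CONTAINERS_NUM; i++)
        if (toy_map[i].used &&
            memcmp(&toy_map[i].key, key, sizeof(*key)) == 0)
            return &toy_map[i];
    return NULL;
}

/* Like bpf_map_lookup_elem: NULL when the key is absent. */
uint64_t *toy_lookup(const struct my_struct_t *key)
{
    struct toy_slot *s = toy_find(key);
    return s ? &s->value : NULL;
}

/* Like bpf_map_update_elem on a hash map, including the flag checks. */
int toy_update(const struct my_struct_t *key, uint64_t value, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;
    struct toy_slot *s = toy_find(key);
    if (s) {
        if (flags == BPF_NOEXIST)
            return -EEXIST; /* key already present */
        s->value = value;
        return 0;
    }
    if (flags == BPF_EXIST)
        return -ENOENT; /* key must exist but does not */
    for (int i = 0; i < MAX_CONTAINERS_NUM; i++) {
        if (!toy_map[i].used) {
            toy_map[i].used = 1;
            toy_map[i].key = *key;
            toy_map[i].value = value;
            return 0;
        }
    }
    return -E2BIG; /* map is full */
}

/* Like bpf_map_delete_elem: -ENOENT when the key is absent. */
int toy_delete(const struct my_struct_t *key)
{
    struct toy_slot *s = toy_find(key);
    if (!s)
        return -ENOENT;
    s->used = 0;
    return 0;
}

/* Runs the create / duplicate / delete / re-create sequence.
 * Returns 0 if every step behaves as expected, else the failing step. */
int toy_delete_then_reinsert(void)
{
    struct my_struct_t key = { .cgroup = 42 };
    strncpy(key.my_char, "demo", MAX_LEN - 1);

    if (toy_update(&key, 1, BPF_NOEXIST) != 0)       return 1;
    if (toy_update(&key, 1, BPF_NOEXIST) != -EEXIST) return 2;
    if (toy_delete(&key) != 0)                       return 3;
    if (toy_lookup(&key) != NULL)                    return 4;
    if (toy_update(&key, 1, BPF_NOEXIST) != 0)       return 5;
    return 0;
}
```

Calling toy_delete_then_reinsert() walks through the same sequence as the test: the second insert with BPF_NOEXIST only succeeds once the delete has actually removed the key.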

Using __sync_fetch_and_add I can easily increment the counter for that key as well.

The Issue:

Unfortunately it seems I cannot get rid of that key.

In fact, if I use the bpf_map_delete_elem primitive, the key is still visible in the userspace stack (both the bee CLI and the Prometheus endpoint).

I'm sure I correctly deleted the key from the kernel-space program, because if I query for that key using bpf_map_lookup_elem I get a null result. I can also add the same key again after the deletion (still using BPF_NOEXIST). This proves that creation and deletion work just fine on the kernel side. It seems the userspace program is not reflecting those deletions.

Steps to reproduce the bug

  1. Create a key/value pair in a BPF_MAP_TYPE_HASH
  2. Delete the key
  3. Look at the Prometheus endpoint

Expected Behavior

The corresponding key/value pair should disappear.
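To make the expected behavior concrete, here is a minimal sketch in plain C of how a userspace exporter could drop series for deleted keys. This is not how Bumblebee is implemented, which the thread does not show. The names struct sample, struct exporter, exporter_sync and MAX_SERIES are all hypothetical, and keys are simplified to a single u64. It assumes the exporter takes a fresh snapshot of the map on every poll (with libbpf this would typically mean walking the map with bpf_map_get_next_key) and then reconciles its exported series against that snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SERIES 8 /* hypothetical capacity of the exporter */

/* One key/value pair, either read from the map or exported as a series. */
struct sample {
    uint64_t key;
    uint64_t value;
};

/* Hypothetical userspace state: the series currently exposed. */
struct exporter {
    struct sample series[MAX_SERIES];
    size_t count;
};

static const struct sample *find_in(const struct sample *s, size_t n,
                                    uint64_t key)
{
    for (size_t i = 0; i < n; i++)
        if (s[i].key == key)
            return &s[i];
    return NULL;
}

/* Reconcile the exported series with a fresh snapshot of the map.
 * Returns how many series were dropped because their key is gone. */
size_t exporter_sync(struct exporter *e, const struct sample *snap, size_t n)
{
    assert(e != NULL && n <= MAX_SERIES);
    size_t kept = 0, dropped = 0;

    /* Pass 1: keep and refresh only the series still in the snapshot. */
    for (size_t i = 0; i < e->count; i++) {
        const struct sample *cur = find_in(snap, n, e->series[i].key);
        if (cur)
            e->series[kept++] = *cur;
        else
            dropped++; /* key was deleted in the kernel map */
    }
    e->count = kept;

    /* Pass 2: add keys that appeared since the last poll. */
    for (size_t i = 0; i < n; i++)
        if (!find_in(e->series, e->count, snap[i].key) &&
            e->count < MAX_SERIES)
            e->series[e->count++] = snap[i];

    return dropped;
}
```

An exporter that only runs the second pass (adding and updating keys it finds) would show exactly the symptom reported here: a key deleted in the kernel never leaves the exported set.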

Additional Context

No response

lgadban commented 1 year ago

Hi @andrea-tomassi, thank you for raising this! This is definitely a bug and we will fix.

If you are interested in opening a PR we would be happy to review and accept!

andrea-tomassi commented 1 year ago

@lgadban unfortunately I'm not a Go programmer, and I guess Go is what is used here for the userspace side. However, our eBPF program makes very intensive use of key creation/deletion, and getting those deletions reflected in userspace would be great. So, as soon as a fix is available, I can help test it thoroughly. Let me know if I can give you any more information.

andrea-tomassi commented 1 year ago

Hi @lgadban, after extensive testing of the Bumblebee stack I can say it works like a charm. We are currently feeding an AI-based threat detection system and observability for cloud native workloads on top of that technology.

Unfortunately, this open issue is limiting the data we can bring to our algorithms. I would need to know at least whether this issue will be addressed in the near future, because otherwise we will have to look for different solutions.

As I already said, we (still) lack Go developers. However, we can help in any way other than Go programming (producing an eBPF test case for kernel space, testing any fix you produce, etc.).

Thanks in advance for your help.

lgadban commented 1 year ago

Thanks for the update @andrea-tomassi! Let me talk with the team and we will get back to you soon.

EItanya commented 1 year ago

Quick update: meeting on our end today to discuss the fix. Will update more this afternoon!

EItanya commented 1 year ago

Being worked on here: https://github.com/solo-io/bumblebee/pull/99

krisztianfekete commented 1 year ago

Hey @andrea-tomassi, have you had a chance to test the WIP solution on the branch?