microsoft / retina

eBPF distributed networking observability tool for Kubernetes
https://retina.sh
MIT License

Retrieve packet drop reasons with kfree_skb_reason() #515

Open rectified95 opened 1 week ago

rectified95 commented 1 week ago

Description

This draft PR captures most of the work needed for Retina to plug into all of the Linux kernel's packet drop reasons by attaching a kprobe to kfree_skb_reason().

We are shelving this work for now: kernel 5.15 LTS, used by Azure Linux and AKS, exposes only slightly more drop reasons than Retina already reports, so the gain would be small. Azure Linux 3.0 will run on kernel 6.6, which contains a much-expanded enum skb_drop_reason and also lets us sidestep the problems caused by entries having been prepended to the enum after it was first introduced.
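To illustrate why prepending to an enum is a problem for a consumer that reads raw numeric values, here is a minimal sketch. The enum names and members are hypothetical and are not copied from any kernel version; the point is only that an entry added at the front shifts every later value by one.

```c
#include <assert.h>

/* Hypothetical enum as first released (illustrative names only). */
enum drop_reason_v1 {
    V1_NOT_SPECIFIED,   /* 0 */
    V1_NO_SOCKET,       /* 1 */
    V1_NETFILTER_DROP   /* 2 */
};

/* The same enum after a later release prepends one entry. */
enum drop_reason_v2 {
    V2_NOT_DROPPED_YET, /* 0, newly prepended */
    V2_NOT_SPECIFIED,   /* 1 */
    V2_NO_SOCKET,       /* 2 */
    V2_NETFILTER_DROP   /* 3 */
};

/* Translate a raw v1 value into the v2 numbering.
 * Assumption: exactly one entry was prepended, so every value shifts by 1. */
int v1_to_v2(int raw_v1)
{
    return raw_v1 + 1;
}

/* Self-check: the same logical reason has different raw values per version. */
void check_shift(void)
{
    assert(V1_NETFILTER_DROP != V2_NETFILTER_DROP);
    assert(v1_to_v2(V1_NETFILTER_DROP) == V2_NETFILTER_DROP);
}
```

A probe compiled against one numbering and run on a kernel using the other would silently report the wrong reason, which is why targeting a single kernel version with a settled enum is simpler.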

Changes:

Validation done:

Below, one dropped ping packet is recorded in the drop count metric, which goes from 19 to 20. Its kernel drop reason value is _NETFILTER_DROP = 6, which currently maps to Retina's UNKNOWN_DROP. Retina's IPTABLE_RULE_DROP goes up in tandem, since the deployed image also contained the existing nf_hook_slow kprobe.

image
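The mapping step described above can be sketched as a small translation function. This is an illustration, not Retina's actual code: the numeric values of the Retina-side enum, both function names, and the idea of routing the netfilter reason to IPTABLE_RULE_DROP are assumptions. Only the kernel value 6 and the current fallback to UNKNOWN_DROP come from the validation text.

```c
#include <stdint.h>

/* Retina-side reasons named in this PR; numeric values are assumptions. */
enum retina_drop_reason {
    UNKNOWN_DROP = 0,
    IPTABLE_RULE_DROP = 1
};

/* Kernel drop reason value observed in the validation run. */
#define KERNEL_NETFILTER_DROP 6u

/* Behaviour observed in validation: the kernel reason is not yet
 * translated, so it is reported as UNKNOWN_DROP. */
enum retina_drop_reason map_reason_current(uint32_t kernel_reason)
{
    (void)kernel_reason;
    return UNKNOWN_DROP;
}

/* Hypothetical follow-up: translate known kernel reasons and keep
 * UNKNOWN_DROP as the fallback for everything else. */
enum retina_drop_reason map_reason_sketch(uint32_t kernel_reason)
{
    switch (kernel_reason) {
    case KERNEL_NETFILTER_DROP:
        return IPTABLE_RULE_DROP; /* assumed pairing, not confirmed by the PR */
    default:
        return UNKNOWN_DROP;
    }
}
```

Keeping a default fallback means kernel reasons that Retina has no label for are still counted rather than lost.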

Remaining work:

Related Issue

Fixes #367