microsoft / retina

eBPF distributed networking observability tool for Kubernetes
https://retina.sh
MIT License

Support capturing/tracing unix domain sockets with kprobe ebpf #225

Open vakalapa opened 6 months ago

vakalapa commented 6 months ago

Today Retina only watches for events from either the tc program or a handful of drop-reason kprobes; it should watch unix domain socket events as well. This will need additional work to figure out how to distinguish the source and destination pod/container/process.

For starters, attach to the probes below:

- kprobe/unix_stream_sendmsg
- kprobe/unix_dgram_sendmsg
- fentry/unix_stream_sendmsg
- fentry/unix_dgram_sendmsg

Example: https://github.com/Asphaltt/sockdump
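
For reference, a minimal userspace sketch (not Retina's actual plugin code) of attaching kprobes to these symbols with the github.com/cilium/ebpf library; the object file name `unixsock_bpf.o` and the BPF program names are placeholders:

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Load a pre-compiled BPF object (hypothetical file/program names).
	spec, err := ebpf.LoadCollectionSpec("unixsock_bpf.o")
	if err != nil {
		log.Fatalf("loading BPF spec: %v", err)
	}

	coll, err := ebpf.NewCollection(spec)
	if err != nil {
		log.Fatalf("creating BPF collection: %v", err)
	}
	defer coll.Close()

	// Attach one kprobe per kernel symbol we want to trace.
	for sym, progName := range map[string]string{
		"unix_stream_sendmsg": "unix_stream_sendmsg",
		"unix_dgram_sendmsg":  "unix_dgram_sendmsg",
	} {
		prog, ok := coll.Programs[progName]
		if !ok {
			log.Fatalf("program %q not found in object", progName)
		}
		kp, err := link.Kprobe(sym, prog, nil)
		if err != nil {
			log.Fatalf("attaching kprobe to %s: %v", sym, err)
		}
		defer kp.Close() // detach when the process exits
	}

	select {} // keep probes attached
}
```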

FZhg commented 5 months ago

Hi, I would like to work on this issue, but I might need some time to figure out how to attach to the kprobe events. Would that be okay?

rbtr commented 5 months ago

That's okay, keep us updated 🙂 Assigned it to you, thanks for taking a look

FZhg commented 5 months ago

Hi, I have some initial thoughts about this feature. Essentially, this feature amounts to implementing a UNIX domain socket plugin.

Metrics

| Metric Name | Description | Extra Labels |
|---|---|---|
| `unix_msg_count` | Message count for UNIX domain sockets | direction, socket_type |
| `unix_msg_bytes` | Byte count for UNIX domain sockets | direction, socket_type |

Label Values

Possible values for direction: send, receive

Possible values for socket_type: stream, dgram, seqpacket
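
A minimal sketch of how the two proposed counters could be registered with prometheus/client_golang; the package name, variable names, and help strings are placeholders, not Retina's actual metric definitions:

```go
package unixsock

import "github.com/prometheus/client_golang/prometheus"

var (
	unixMsgCount = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "unix_msg_count",
			Help: "Message count for UNIX domain sockets",
		},
		[]string{"direction", "socket_type"},
	)
	unixMsgBytes = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "unix_msg_bytes",
			Help: "Byte count for UNIX domain sockets",
		},
		[]string{"direction", "socket_type"},
	)
)

func init() {
	prometheus.MustRegister(unixMsgCount, unixMsgBytes)
}

// recordMsg would be called for each event read from the BPF map or ring buffer.
func recordMsg(direction, socketType string, bytes float64) {
	unixMsgCount.WithLabelValues(direction, socketType).Inc()
	unixMsgBytes.WithLabelValues(direction, socketType).Add(bytes)
}
```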

Advanced Metrics

I need to find a way to enrich the flows without IP addresses, since UNIX domain sockets have none.

My plan is to implement the send_msg probes for the stream and datagram sockets first, and then add the receive_msg probes and seqpacket socket support later.

Does this look like a good path? Thanks~

vakalapa commented 5 months ago

@FZhg thanks for the update, I like this general direction for the basic metrics. Are there any other labels we can add to the basic unix_msg_count or unix_msg_bytes metrics? For example, is it possible to expose the path/port of the unix domain communication as a label? This would add some granularity on the data transfer until advanced metrics with more detail can be developed.

For advanced metrics, we will have to find a way to discover which cgroup or process owns the socket on send/recv, and we will probably have to enrich based on a mapping of cgroup/process to pod.
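
A rough sketch of that enrichment idea (not Retina code): given the PID captured by the kprobe, read /proc/&lt;pid&gt;/cgroup and pull the pod UID out of the cgroup path, then match it against the pod cache the agent already keeps. The `podUIDForPID` helper and its regular expression are illustrative only and cover just the common cgroupfs/systemd layouts:

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// Matches pod UIDs in paths such as
//   /kubepods/burstable/pod<uid>/<container-id>        (cgroupfs driver)
//   /kubepods.slice/...-pod<uid-with-underscores>.slice (systemd driver)
var podUIDRe = regexp.MustCompile(
	`pod([0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12})`)

// podUIDForPID returns the Kubernetes pod UID owning the given PID, if any.
func podUIDForPID(pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
	if err != nil {
		return "", err
	}
	m := podUIDRe.FindStringSubmatch(string(data))
	if m == nil {
		return "", fmt.Errorf("pid %d does not belong to a pod cgroup", pid)
	}
	// systemd-driver paths use '_' instead of '-' inside the UID.
	return strings.ReplaceAll(m[1], "_", "-"), nil
}

func main() {
	uid, err := podUIDForPID(os.Getpid())
	if err != nil {
		fmt.Println("not in a pod:", err)
		return
	}
	fmt.Println("pod UID:", uid)
}
```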