microsoft / retina

eBPF distributed networking observability tool for Kubernetes
https://retina.sh
MIT License

should advanced local-context metrics have source and destination prefixes for context labels? #344

Open huntergregory opened 5 months ago

huntergregory commented 5 months ago

The documentation says that advanced metrics in local-context mode (see Metric Modes) should have context labels with source and destination prefixes. For instance, it says there is a source_podname label for outgoing traffic and a destination_podname label for incoming traffic (remote-context mode always has both of these labels).

However, local-context mode currently has only a podname label, with no source or destination prefix. It produces metrics such as:

networkobservability_adv_forward_count{direction="egress",ip="10.224.1.181",namespace="kube-system",podname="coredns-767bfbd4fb-6ns9c",workloadKind="ReplicaSet",workloadName="coredns-767bfbd4fb"} 84956
networkobservability_adv_forward_count{direction="ingress",ip="10.224.1.107",namespace="kube-system",podname="metrics-server-76848-qg9wn",workloadKind="ReplicaSet",workloadName="metrics-server-76848"} 27
...
networkobservability_adv_tcpflags_count{flag="RST",ip="10.224.1.53",namespace="kube-system",podname="ama-metrics-node-4dsls",workloadKind="DaemonSet",workloadName="ama-metrics-node"} 8
networkobservability_adv_tcpflags_count{flag="SYN",ip="10.224.1.101",namespace="kube-system",podname="konnectivity-agent-5c6879c84b-9665s",workloadKind="ReplicaSet",workloadName="konnectivity-agent-5c6879c84b"} 96
...
networkobservability_adv_dns_response_count{ip="10.224.1.181",namespace="kube-system",num_response="9",podname="coredns-767bfbd4fb-6ns9c",query="dc.services.visualstudio.com.",query_type="A",response="",return_code="NOERROR",workloadKind="ReplicaSet",workloadName="coredns-767bfbd4fb"} 14664
networkobservability_adv_dns_response_count{ip="10.224.1.203",namespace="kube-system",num_response="0",podname="ama-metrics-node-tssl4",query="dc.services.visualstudio.com.cluster.local.",query_type="A",response="",return_code="NXDOMAIN",workloadKind="DaemonSet",workloadName="ama-metrics-node"} 1212

For adv_forward_count, you can infer whether the podname is the source or the destination from the direction label. But for other metrics such as adv_tcpflags_count, either the client or the server could initiate a TCP reset (RST flag), so knowing whether the pod is the source or the destination could be an important detail.
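
To illustrate the inference described above, here is a minimal Python sketch that parses one exposition-format sample and derives the pod's role from the direction label. It is not part of Retina: parse_sample and pod_role are hypothetical helpers, and the mapping (egress means source, ingress means destination) is an assumption based on the documented source_podname/destination_podname behavior. The parser only handles simple lines like the samples above.

```python
import re

# Matches one label pair inside the braces, e.g. podname="coredns-..."
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample into (metric name, labels, value).

    Simplified: assumes the line has a label set and no '}' inside label values.
    """
    name, rest = line.split("{", 1)
    label_part, value = rest.rsplit("}", 1)
    labels = dict(_LABEL_RE.findall(label_part))
    return name, labels, float(value)


def pod_role(labels):
    """Hypothetical helper: guess whether podname is the source or destination.

    Assumption: egress traffic means the local pod is the source, and
    ingress traffic means it is the destination. Returns None when the
    sample has no direction label, so the role cannot be inferred.
    """
    direction = labels.get("direction")
    if direction == "egress":
        return "source"
    if direction == "ingress":
        return "destination"
    return None


forward = (
    'networkobservability_adv_forward_count{direction="egress",ip="10.224.1.181",'
    'namespace="kube-system",podname="coredns-767bfbd4fb-6ns9c",'
    'workloadKind="ReplicaSet",workloadName="coredns-767bfbd4fb"} 84956'
)
tcpflags = (
    'networkobservability_adv_tcpflags_count{flag="RST",ip="10.224.1.53",'
    'namespace="kube-system",podname="ama-metrics-node-4dsls",'
    'workloadKind="DaemonSet",workloadName="ama-metrics-node"} 8'
)

for line in (forward, tcpflags):
    name, labels, value = parse_sample(line)
    print(name, labels["podname"], pod_role(labels))
```

For the adv_forward_count sample the role can be derived, but for the adv_tcpflags_count sample pod_role returns None because the sample has no direction label, which is the gap this issue is about.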

cmergenthaler commented 3 months ago

Would really appreciate that change in order to get better insights on whether tcpflags are being received or sent by a pod when using local-context. Was a bit confused by the discrepancy between documentation and actual labels with local context.