Open matmerr opened 1 month ago
Describe the bug
Retina experienced crash on start with image: mcr.microsoft.com/containernetworking/retina-agent:v0.0.15
mcr.microsoft.com/containernetworking/retina-agent:v0.0.15
To Reproduce Steps to reproduce the behavior:
Defaulted container "retina" out of: retina, init-retina (init) Starting Retina Agent starting Retina daemon with legacy control plane v0.0.15 init client-go api server: **** init logger ts=2024-08-28T16:33:20.448Z level=info caller=metrics/metrics.go:169 msg="Metrics initialized" ts=2024-08-28T16:33:20.448Z level=info caller=legacy/daemon.go:137 msg="{data aggregation level 15 0 low <nil>}" ts=2024-08-28T16:33:20.449Z level=info caller=legacy/daemon.go:173 msg="Remote context is disabled, only pods deployed on the same node as retina-agent will be monitored" ts=2024-08-28T16:33:20.449Z level=info caller=legacy/daemon.go:194 msg="pod selector when remote context is disabled{pod selector 15 0 spec.nodeName=aks-nodepool1-47685273-vmss000001,status.podIP!=10.0.0.4 <nil>}" ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:226 msg="Kubernetes server version: v1.27.16" ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:248 msg="Initializing Pod controller" ts=2024-08-28T16:33:20.478Z level=info caller=pod/controller.go:89 msg="Setting up Pod controller" ts=2024-08-28T16:33:20.478Z level=info caller=legacy/daemon.go:264 msg="Initializing Node controller" ts=2024-08-28T16:33:20.478Z level=info caller=node/controller.go:89 msg="Setting up Node controller" ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:270 msg="Initializing Service controller" ts=2024-08-28T16:33:20.479Z level=info caller=service/controller.go:95 msg="Setting up Service controller" ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:277 msg="Initializing MetricsConfig namespaceController" ts=2024-08-28T16:33:20.479Z level=info caller=namespace/namespace_controller.go:105 msg="Setting up Namespace controller" ts=2024-08-28T16:33:20.479Z level=info caller=pluginmanager/pluginmanager.go:67 msg="plugin manager has pod level enabled" ts=2024-08-28T16:33:20.479Z level=info caller=controllermanager/controllermanager.go:79 msg="Initializing controller manager ..." ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:33 msg="Initializing HTTP server ..." ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:42 msg="Setting up handlers" ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:57 msg="Completed handler setup" ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:37 msg="HTTP server initialized..." ts=2024-08-28T16:33:20.479Z level=info caller=legacy/daemon.go:309 msg="Started controller manager" ts=2024-08-28T16:33:20.479Z level=info caller=servermanager/servermanager.go:42 msg="Starting HTTP server ..." host=0.0.0.0 port=10093 ts=2024-08-28T16:33:20.479Z level=info caller=server/server.go:208 msg="Starting metrics server" ts=2024-08-28T16:33:20.480Z level=info caller=server/server.go:69 msg="starting HTTP server... on " addr=0.0.0.0:10093 ts=2024-08-28T16:33:20.480Z level=info caller=pluginmanager/pluginmanager.go:138 msg="Starting plugin manager ..." ts=2024-08-28T16:33:20.480Z level=info caller=pluginmanager/pluginmanager.go:150 msg="starting watchers" ts=2024-08-28T16:33:20.480Z level=info caller=watchermanager/watchermanager.go:44 msg="watcher started" watcher_type=*endpoint.EndpointWatcher ts=2024-08-28T16:33:20.480Z level=info caller=watchermanager/watchermanager.go:44 msg="watcher started" watcher_type=*apiserver.ApiServerWatcher ts=2024-08-28T16:33:20.481Z level=info caller=manager/server.go:83 msg="starting server" name="health probe" addr=[::]:18081 ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=pod controllerGroup= controllerKind=Pod source="kind source: *v1.Pod" ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=service controllerGroup= controllerKind=Service source="kind source: *v1.Service" ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=node controllerGroup= controllerKind=Node source="kind source: *v1.Node" ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=pod controllerGroup= controllerKind=Pod ts=2024-08-28T16:33:20.481Z level=info caller=controller/controller.go:173 msg="Starting EventSource" controller=namespace controllerGroup= controllerKind=Namespace source="kind source: *v1.Namespace" ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=node controllerGroup= controllerKind=Node ts=2024-08-28T16:33:20.481Z level=info caller=server/server.go:247 msg="Serving metrics server" bindAddress=:18080 secure=false ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=service controllerGroup= controllerKind=Service ts=2024-08-28T16:33:20.482Z level=info caller=controller/controller.go:181 msg="Starting Controller" controller=namespace controllerGroup= controllerKind=Namespace ts=2024-08-28T16:33:20.582Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=node controllerGroup= controllerKind=Node worker count=1 ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=service controllerGroup= controllerKind=Service worker count=1 ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=pod controllerGroup= controllerKind=Pod worker count=1 ts=2024-08-28T16:33:20.583Z level=info caller=controller/controller.go:215 msg="Starting workers" controller=namespace controllerGroup= controllerKind=Namespace worker count=1 ts=2024-08-28T16:33:20.583Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/konnectivity-agent-644d74dc99-n42g2 ts=2024-08-28T16:33:20.584Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-autoscaler-7bf9ddd894-xfsp9 ts=2024-08-28T16:33:20.584Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-d696dc46-flbv7 ts=2024-08-28T16:33:20.585Z level=info caller=pod/controller.go:39 msg="Reconciling Pod" Pod=kube-system/coredns-d696dc46-n8nqk ts=2024-08-28T16:33:21.480Z level=info caller=metrics/metrics_module.go:157 msg="Reconciling metric module" spec= specError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/forward.go:41 msg="Creating forward count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/forward.go:41 msg="Creating forward count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/drops.go:38 msg="Creating drop count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/drops.go:38 msg="Creating drop count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/tcpflags.go:37 msg="Creating TCP Flags count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.480Z level=info caller=metrics/tcpretrans.go:37 msg="Creating TCP retransmit count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:44 msg="Creating DNS count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:44 msg="Creating DNS count metrics" options= optionsError="unsupported value type" ts=2024-08-28T16:33:21.481Z level=info caller=metrics/drops.go:72 msg="src labels" labels=reason,direction,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:76 msg="src labels" labels=query_type,query,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/dns.go:91 msg="src labels" labels=return_code,query_type,query,response,num_response,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/forward.go:78 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/drops.go:72 msg="src labels" labels=reason,direction,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/tcpretrans.go:57 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.481Z level=info caller=metrics/forward.go:78 msg="src labels" labels=direction,ip,namespace,podname,workload_kind,workload_name ts=2024-08-28T16:33:21.883Z level=info caller=dropreason/dropreason_linux.go:115 msg="DropReason metric compiled" ts=2024-08-28T16:33:22.793Z level=info caller=common/common_linux.go:74 msg="perf reader created" Map=PerfEventArray(dropreason_events)#14 PageSize=4096 BufferSize=65536 ts=2024-08-28T16:33:23.648Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=dropreason ts=2024-08-28T16:33:23.648Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin dropreason" ts=2024-08-28T16:33:23.648Z level=info caller=dropreason/dropreason_linux.go:241 msg="Start listening for drop reason events..." ts=2024-08-28T16:33:23.648Z level=info caller=dropreason/dropreason_linux.go:247 msg="setting up enricher since pod level is enabled" ts=2024-08-28T16:33:25.649Z level=info caller=packetforward/packetforward_linux.go:102 msg="Packet forwarding metric header generated" ts=2024-08-28T16:33:26.547Z level=info caller=packetforward/packetforward_linux.go:130 msg="Packet forwarding metric compiled" ts=2024-08-28T16:33:26.549Z level=info caller=packetforward/packetforward_linux.go:176 msg="Packet forwarding metric initialized" ts=2024-08-28T16:33:26.549Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=packetforward ts=2024-08-28T16:33:26.549Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin packetforward" ts=2024-08-28T16:33:26.549Z level=info caller=packetforward/packetforward_linux.go:181 msg="Start collecting packet forward metrics" ts=2024-08-28T16:33:28.550Z level=info caller=linuxutil/linuxutil_linux.go:42 msg="Initializing linuxutil plugin..." ts=2024-08-28T16:33:28.550Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=linuxutil ts=2024-08-28T16:33:28.550Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin linuxutil" ts=2024-08-28T16:33:28.550Z level=info caller=linuxutil/linuxutil_linux.go:57 msg="Running linuxutil plugin..." ts=2024-08-28T16:33:30.550Z level=info caller=dns/dns_linux.go:90 msg="Stopped dns plugin" ts=2024-08-28T16:33:30.586Z level=info caller=dns/dns_linux.go:64 msg="Initialized dns plugin" ts=2024-08-28T16:33:30.587Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=dns ts=2024-08-28T16:33:30.587Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin dns" ts=2024-08-28T16:33:32.587Z level=info caller=packetparser/packetparser_linux.go:82 msg="PacketParser header generated at" path=/go/src/github.com/microsoft/retina/pkg/plugin/packetparser/_cprog/dynamic.h ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:108 msg="PacketParser metric compiled" ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:241 msg="Stopping packet parser" ts=2024-08-28T16:33:33.575Z level=info caller=packetparser/packetparser_linux.go:283 msg="Stopped packet parser" ts=2024-08-28T16:33:34.050Z level=info caller=common/common_linux.go:74 msg="perf reader created" Map=PerfEventArray(packetparser_events)#67 PageSize=4096 BufferSize=131072 ts=2024-08-28T16:33:34.050Z level=info caller=pluginmanager/pluginmanager.go:122 msg="Reconciled plugin" name=packetparser ts=2024-08-28T16:33:34.050Z level=info caller=pluginmanager/pluginmanager.go:173 msg="starting plugin packetparser" ts=2024-08-28T16:33:34.050Z level=info caller=packetparser/packetparser_linux.go:190 msg="Starting packet parser" ts=2024-08-28T16:33:34.050Z level=info caller=packetparser/packetparser_linux.go:192 msg="setting up enricher since pod level is enabled" ts=2024-08-28T16:33:34.051Z level=info caller=packetparser/packetparser_linux.go:212 msg="Attaching bpf program to default interface of k8s Node in node namespace" ts=2024-08-28T16:33:34.051Z level=info caller=packetparser/packetparser_linux.go:223 msg="Attaching Packetparser" outgoingLink.Index=2 outgoingLink.Name=eth0 outgoingLink.HardwareAddr=00:0d:3a:f7:c4:96 ts=2024-08-28T16:33:34.115Z level=info caller=packetparser/packetparser_linux.go:513 msg="Started packet parser" ts=2024-08-28T16:33:36.051Z level=info caller=pluginmanager/pluginmanager.go:181 msg="successfully started pluginmanager" ts=2024-08-28T16:33:50.484Z level=info caller=apiserver/apiserver.go:118 msg="New Apiserver IPs:" ip=4.154.192.90 panic: inconsistent label cardinality: expected 0 label values but got 1 in []string{"no_response"} goroutine 449 [running]: github.com/prometheus/client_golang/prometheus.(*CounterVec).WithLabelValues(0x31de654?, {0xc0008c7690?, 0x2faff00?, 0xc00374b840?}) /go/pkg/mod/github.com/prometheus/client_golang@v1.20.2/prometheus/counter.go:284 +0x34 github.com/microsoft/retina/pkg/module/metrics.(*LatencyMetrics).Init.func1({0xc0011a27d0?, 0x4499bd?}, 0x23e69c2?, 0x2962500?) /go/src/github.com/microsoft/retina/pkg/module/metrics/latency.go:127 +0x19b github.com/jellydator/ttlcache/v3.(*Cache[...]).OnEviction.func1.1() /go/pkg/mod/github.com/jellydator/ttlcache/v3@v3.3.0/cache.go:699 +0x31 created by github.com/jellydator/ttlcache/v3.(*Cache[...]).OnEviction.func1 in goroutine 271 /go/pkg/mod/github.com/jellydator/ttlcache/v3@v3.3.0/cache.go:698 +0xef
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Platform (please complete the following information):
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME aks-nodepool1-47685273-vmss000000 Ready agent 47d v1.27.16 10.0.0.5 <none> Ubuntu 22.04.4 LTS 5.15.0-1070-azure containerd://1.7.20-1 aks-nodepool1-47685273-vmss000001 Ready agent 47d v1.27.16 10.0.0.4 <none> Ubuntu 22.04.4 LTS 5.15.0-1070-azure containerd://1.7.20-1
Additional context Add any other context about the problem here.
should be fixed in 0.0.16 https://github.com/microsoft/retina/pull/687
Describe the bug
Retina experienced crash on start with image:
mcr.microsoft.com/containernetworking/retina-agent:v0.0.15
To Reproduce Steps to reproduce the behavior:
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Platform (please complete the following information):
mcr.microsoft.com/containernetworking/retina-agent:v0.0.15
Additional context Add any other context about the problem here.