nginxinc / nginx-service-mesh

A service mesh powered by NGINX Plus to manage container traffic in Kubernetes environments.
https://docs.nginx.com/nginx-service-mesh
Apache License 2.0

Metrics: label name "dst_namespace" is not unique: invalid sample #22

Closed: xavipanda closed this issue 3 years ago

xavipanda commented 3 years ago

Hi,

I often see this problem in the Prometheus Targets for nginx-mesh:

label name "dst_namespace" is not unique: invalid sample

I see entries like this: nginxplus_upstream_server_unavail {upstream="172.20.115.57_80", upstream_index="0000", dst_service="svcinfo-primary",server="10.2.22.162:9898", dst_namespace="canary", dst_deployment="svcinfo-primary", dst_pod="svcinfo-primary-59949d4ccf-slfj9", dst_namespace="ingress-nginx", dst_deployment="", dst_pod="ingress-nginx-admission-patch-nlf2c"} 0
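The duplicated labels in a sample like the one above can be spotted mechanically. The following is a minimal sketch, not part of NSM or Prometheus: the function name `duplicate_labels` and the regex-based parsing are assumptions made for illustration, and the parser only handles a single exposition-format line.

```python
import re

# Matches name="value" pairs; values may contain escaped characters.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def duplicate_labels(sample_line):
    """Return {label_name: [values...]} for labels repeated in one sample line.

    Hypothetical helper for illustration only; it is not an NSM or
    Prometheus API.
    """
    start = sample_line.find("{")
    end = sample_line.rfind("}")
    if start == -1 or end == -1 or end < start:
        return {}  # no label set on this line
    seen = {}
    for name, value in LABEL_RE.findall(sample_line[start + 1:end]):
        seen.setdefault(name, []).append(value)
    # Keep only the label names that occur more than once.
    return {name: vals for name, vals in seen.items() if len(vals) > 1}


if __name__ == "__main__":
    line = 'm{dst_namespace="canary", dst_namespace="ingress-nginx"} 0'
    print(duplicate_labels(line))
    # prints {'dst_namespace': ['canary', 'ingress-nginx']}
```

Run against the entry quoted above, this would report `dst_namespace`, `dst_deployment` and `dst_pod` as repeated, which is the condition Prometheus rejects as an invalid sample.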

The strangest thing is that on those pods:

f5yacobucci commented 3 years ago

Is this a deployment using --mtls-mode=permissive (default)?

Can you supply the exact deploy command you used to provision NSM?

I presume 'label name "dst_namespace" is not unique: invalid sample' is a Prometheus log message, correct?

Have you deployed the NGINX Ingress Controller and followed the walkthroughs and tutorials here and here?

xavipanda commented 3 years ago

@f5yacobucci

  1. Yes, permissive (the default).
  2. It is the default command, apart from the registry part. I added the skip-namespace for the ingress part.
  3. It comes from the Prometheus that was deployed by NSM. No changes were made to it.
  4. Yes, the labels/annotations are there, so the sidecar is not injected.

I added the same annotations/labels to the jobs that run for the ingress patching (reload). The problem is that the metrics exporters within the sidecars of pods that are in the mesh are not able to identify such tags. Since they intercept all traffic without distinction, metrics are generated with the same pattern. I believe the fix would be along the lines of "do not compose metrics from elements that are not part of the mesh".

The behaviour is quite weird, to be honest: the namespace for this job is excluded from the mesh, and there is no relation or even communication between those two elements (batch job and pod). I'm not able to provide more insight into it, as there is no source code for the metrics component.

I believe an easy approach to fixing this behaviour is:

if dst || src namespace in $skip_namespace == no_metrics
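The one-line proposal above could be read as a filter applied before a sample is exported. This is only a sketch of that idea under stated assumptions: the function name `should_emit` is hypothetical, the `src_namespace` label is assumed by analogy with the `dst_namespace` label seen in the sample, and nothing here reflects how the NSM metrics component is actually implemented.

```python
def should_emit(labels, skipped_namespaces):
    """Return True if a metric sample should be exported.

    labels: dict of label name -> value for one sample.
    skipped_namespaces: set of namespaces excluded from the mesh.
    Both the function and the 'src_namespace' label are assumptions
    made for illustration; they are not taken from NSM.
    """
    src = labels.get("src_namespace")
    dst = labels.get("dst_namespace")
    # Drop the sample if either end of the connection is in a skipped namespace.
    return src not in skipped_namespaces and dst not in skipped_namespaces


if __name__ == "__main__":
    skipped = {"ingress-nginx"}
    print(should_emit({"dst_namespace": "canary"}, skipped))         # prints True
    print(should_emit({"dst_namespace": "ingress-nginx"}, skipped))  # prints False
```

A filter like this would suppress samples for excluded namespaces entirely; it would not by itself explain why two destinations were merged into one label set.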

I will also try to reproduce it on 1.0 GA.

xavipanda commented 3 years ago

The default behaviour of injection cannot be set, which ultimately leads to this unexpected behaviour of duplicated labels. To avoid this, set auto-inject to false and inject on demand in the same way.