Open klingerf opened 2 years ago
Spot-checking this locally:
:; linkerd diagnostics proxy-metrics -n linkerd po/linkerd-destination-774dbddb7f-q7wnz | grep -e ^response_total
response_total{direction="inbound",authority="linkerd-dst-headless.linkerd.svc.cluster.local:8086",target_addr="10.42.3.12:8086",target_ip="10.42.3.12",target_port="8086",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 7
response_total{direction="inbound",target_addr="0.0.0.0:4191",target_ip="0.0.0.0",target_port="4191",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 1391
response_total{direction="inbound",target_addr="10.42.3.12:9990",target_ip="10.42.3.12",target_port="9990",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 6861
response_total{direction="inbound",target_addr="10.42.3.12:9996",target_ip="10.42.3.12",target_port="9996",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 6860
response_total{direction="inbound",target_addr="10.42.3.12:9997",target_ip="10.42.3.12",target_port="9997",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 1392
response_total{direction="inbound",target_addr="0.0.0.0:4191",target_ip="0.0.0.0",target_port="4191",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 6871
response_total{direction="inbound",target_addr="0.0.0.0:4191",target_ip="0.0.0.0",target_port="4191",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="503",classification="failure"} 1
response_total{direction="inbound",target_addr="10.42.3.12:9997",target_ip="10.42.3.12",target_port="9997",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 6861
response_total{direction="inbound",target_addr="10.42.3.12:9996",target_ip="10.42.3.12",target_port="9996",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated",status_code="200",classification="success"} 1392
We don't actually see any `response_total` metrics for any of the controller ports, presumably because all of these requests are long-lived streams, so the response never completes. (Edit: we see one for prometheus, explained below.)
Why are you seeing `response_total` metrics for 8086 and not 8090? My guess is that destination queries can actually complete when the proxy drops the stack for a given service (i.e., evicts it from its cache), but policy streams are never dropped until the client proxy shuts down.
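The gap between `request_total` and `response_total` is exactly the number of streams that are still open. As a rough illustration (not part of Linkerd's tooling), a small script can diff the two counters per target port; the sample values below are taken from the dumps in this thread, with the label sets abridged for brevity:

```python
import re
from collections import defaultdict

# Abridged sample lines from the proxy-metrics dumps above (labels trimmed).
METRICS = """\
request_total{direction="inbound",target_port="8086",client_id="prometheus..."} 8
response_total{direction="inbound",target_port="8086",client_id="prometheus..."} 7
request_total{direction="inbound",target_port="8090",client_id="linkerd-identity..."} 7
"""

METRIC = re.compile(r'^(request_total|response_total)\{(.*)\} (\d+)$')

def open_streams(text):
    """Requests minus responses per target_port: requests whose response
    has not completed yet, i.e. still-open streams."""
    totals = defaultdict(lambda: {"request_total": 0, "response_total": 0})
    for line in text.splitlines():
        m = METRIC.match(line)
        if not m:
            continue
        name, labels, value = m.groups()
        port = re.search(r'target_port="([^"]+)"', labels)
        if port:
            totals[port.group(1)][name] += int(value)
    return {p: t["request_total"] - t["response_total"] for p, t in totals.items()}

# 8086 (destination) has completed responses; 8090 (policy) has none,
# which matches the "streams never complete" explanation above.
print(open_streams(METRICS))
```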
If we look at the `request_total` metrics, we see the connections we'd expect:
:; linkerd diagnostics proxy-metrics -n linkerd po/linkerd-destination-774dbddb7f-q7wnz | grep -e ^request_total
request_total{direction="inbound",authority="linkerd-dst-headless.linkerd.svc.cluster.local:8086",target_addr="10.42.3.12:8086",target_ip="10.42.3.12",target_port="8086",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 8
request_total{direction="inbound",target_addr="0.0.0.0:4191",target_ip="0.0.0.0",target_port="4191",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 1391
request_total{direction="inbound",target_addr="10.42.3.12:9990",target_ip="10.42.3.12",target_port="9990",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 6861
request_total{direction="inbound",target_addr="10.42.3.12:9996",target_ip="10.42.3.12",target_port="9996",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 6860
request_total{direction="inbound",authority="linkerd-dst-headless.linkerd.svc.cluster.local:8086",target_addr="10.42.3.12:8086",target_ip="10.42.3.12",target_port="8086",tls="true",client_id="linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 3
request_total{direction="inbound",target_addr="10.42.3.12:9997",target_ip="10.42.3.12",target_port="9997",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 1391
request_total{direction="inbound",target_addr="0.0.0.0:4191",target_ip="0.0.0.0",target_port="4191",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 6870
request_total{direction="inbound",authority="linkerd-dst-headless.linkerd.svc.cluster.local:8086",target_addr="10.42.3.12:8086",target_ip="10.42.3.12",target_port="8086",tls="true",client_id="tap.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 1
request_total{direction="inbound",authority="linkerd-policy.linkerd.svc.cluster.local:8090",target_addr="10.42.3.12:8090",target_ip="10.42.3.12",target_port="8090",tls="true",client_id="linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 7
request_total{direction="inbound",authority="linkerd-policy.linkerd.svc.cluster.local:8090",target_addr="10.42.3.12:8090",target_ip="10.42.3.12",target_port="8090",tls="true",client_id="",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 16
request_total{direction="inbound",target_addr="10.42.3.12:9997",target_ip="10.42.3.12",target_port="9997",tls="no_identity",no_tls_reason="no_tls_from_remote",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 6860
request_total{direction="inbound",authority="linkerd-dst-headless.linkerd.svc.cluster.local:8086",target_addr="10.42.3.12:8086",target_ip="10.42.3.12",target_port="8086",tls="true",client_id="tap-injector.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 1
request_total{direction="inbound",target_addr="10.42.3.12:9996",target_ip="10.42.3.12",target_port="9996",tls="true",client_id="prometheus.linkerd-viz.serviceaccount.identity.linkerd.cluster.local",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 1391
That mostly looks like I'd expect, with one exception:
request_total{direction="inbound",authority="linkerd-policy.linkerd.svc.cluster.local:8090",target_addr="10.42.3.12:8090",target_ip="10.42.3.12",target_port="8090",tls="true",client_id="",srv_name="default:all-unauthenticated",saz_name="default:all-unauthenticated"} 16
This claims there's a policy lookup from a pod that doesn't have a client identity. Perhaps this is the identity controller starting up? We'll probably want to look more closely at it.
Thinking about this a bit more, I expect there is some sort of race here during startup: in general, a proxy may start watching policy before it has provisioned its certificate. I'm not sure that's really a problem from Linkerd's perspective, though...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
What is the issue?
I'm not seeing a `client_id` label on any of the `response_total` stats that are exported by the inbound proxy of the `linkerd-destination` pod when the target port is 8090 (policy), but I am seeing that label set when the target port is 8086 (destination). It's a bit easier to illustrate with a comparison of these two promql queries:
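The queries themselves didn't survive here; a hypothetical pair of roughly this shape, grouping the `response_total` counters shown above by `client_id`, would make the difference visible (the 8090 series would come back with an empty `client_id`):

```promql
sum by (client_id) (response_total{direction="inbound", target_port="8086"})
sum by (client_id) (response_total{direction="inbound", target_port="8090"})
```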
Maybe this is intentional? Without a `client_id` label set for requests to 8090, however, we can't dedupe traffic to that port.

How can it be reproduced?
Install linkerd and linkerd-viz, then:
Visit http://localhost:9090 and run the following query:
Logs, error output, etc
See above
output of `linkerd check -o short`
Environment
Possible solution
This might be working as expected, in which case we can close it.
Additional context
No response
Would you like to work on fixing this bug?
no