linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0

Linkerd route level RPS not coming up #11671

Closed mayank-ag-dev closed 3 months ago

mayank-ag-dev commented 10 months ago

What is the issue?

We have deployed Linkerd stable v2.14.0 on GKE v1.24. We configured a service profile for an application, and the routes were getting added, but we could not see the RPS.

How can it be reproduced?

GKE v1.24 Linkerd stable-v2.14.0

Logs, error output, etc

ROUTE       SERVICE   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
[DEFAULT]   svc1      -         -     -             -             -
healthz     svc1      -         -     -             -             -
version     svc1      -         -     -             -             -

output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for metrics-api-f46599848-6j2lz pod
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "metrics-api-f46599848-6j2lz" is not ready
    see https://linkerd.io/2.14/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are healthy
    no "linkerd-proxy" containers found in the "linkerd" namespace
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-healthy for hints

Status check results are √

Environment

GKE v1.24 Linkerd stable-2.14.0

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

yes

kflynn commented 10 months ago

Hey @mayank-ag-dev! Those errors from linkerd check are very concerning – they look an awful lot like linkerd-viz isn't set up correctly. Maybe uninstall and reinstall it?

Assuming that you clear the Viz errors and it's still not working, we'd like to see the Service and ServiceProfile for at least one of these workloads... thanks!

mayank-ag-dev commented 10 months ago

Hey @kflynn, I resolved the linkerd-viz errors. Sharing the Service and ServiceProfile snippets:

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: podinfo-svc.podinfo.svc.cluster.local
  namespace: podinfo
spec:
  routes:
    - name: health-check
      condition:
        method: GET
        pathRegex: /healthz
    - name: version
      condition:
        method: GET
        pathRegex: /version
---
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    name: podinfo-svc
    namespace: podinfo
  spec:
    ports:
    - name: http
      port: 9898
      protocol: TCP
      targetPort: http
    - name: grpc
      port: 9999
      protocol: TCP
      targetPort: grpc
    selector:
      app: podinfo
    type: ClusterIP
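
For anyone debugging a similar setup, here is a rough sketch of how the two routes in the ServiceProfile above might classify incoming requests. It is not Linkerd code. It assumes that routes are tried in order, that the first match wins, that `pathRegex` has to match the whole request path, and that unmatched requests are counted under `[DEFAULT]`. The `ROUTES` list and the `classify` helper are hypothetical names used only for illustration.

```python
import re

# Routes copied from the ServiceProfile above: (name, method, pathRegex).
ROUTES = [
    ("health-check", "GET", r"/healthz"),
    ("version", "GET", r"/version"),
]


def classify(method, path, routes=ROUTES):
    """Return the name of the first matching route, or "[DEFAULT]" if none match."""
    for name, route_method, path_regex in routes:
        # Assumption: the regex is anchored, i.e. it must match the entire path.
        if method == route_method and re.fullmatch(path_regex, path):
            return name
    return "[DEFAULT]"


if __name__ == "__main__":
    samples = [("GET", "/healthz"), ("GET", "/healthz/ready"), ("POST", "/version")]
    for method, path in samples:
        print(method, path, "->", classify(method, path))
```

Under those assumptions, only exact `GET /healthz` and `GET /version` requests would land on the named routes. Other paths or methods would fall through to `[DEFAULT]`.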

kflynn commented 10 months ago

And after the Viz errors are resolved, it's still not working?

mayank-ag-dev commented 10 months ago

Yes, it's still not working. Were there any configuration changes for ServiceProfiles in Linkerd stable-2.14.0? Here is the current linkerd check output:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all node podCIDRs
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
√ can retrieve the control plane version
‼ control plane is up-to-date
    is running version 2.14.0 but the latest stable version is 2.14.5
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
    * linkerd-destination-64fd9c9866-pbzxt (stable-2.14.0)
    * linkerd-identity-6c5fc457db-pwl7f (stable-2.14.0)
    * linkerd-proxy-injector-5d85b4686f-mg77v (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
√ control plane proxies and cli versions match

linkerd-viz
-----------
√ linkerd-viz Namespace exists
√ can initialize the client
√ linkerd-viz ClusterRoles exist
√ linkerd-viz ClusterRoleBindings exist
√ tap API server has valid cert
√ tap API server cert is valid for at least 60 days
√ tap API service is running
√ linkerd-viz pods are injected
√ viz extension pods are running
√ viz extension proxies are healthy
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
    * metrics-api-75f76fbd65-44wv8 (stable-2.14.0)
    * prometheus-7c74c74478-7fxkz (stable-2.14.0)
    * tap-6665794f66-f6ksl (stable-2.14.0)
    * tap-injector-74f66f65d5-zkw9v (stable-2.14.0)
    * web-78c46f4b57-8wx9z (stable-2.14.0)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
√ viz extension proxies and cli versions match
√ prometheus is installed and configured correctly
√ viz extension self-check

Status check results are √

kflynn commented 10 months ago

@mayank-ag-dev I think the biggest question here is whether you're using ServiceProfiles or HTTPRoutes. For per-route metrics at the moment, you need to be using ServiceProfiles.

mayank-ag-dev commented 10 months ago

@kflynn We are using serviceProfiles for HTTPRoutes.

kflynn commented 9 months ago

@mayank-ag-dev 🤦‍♂️ So sorry to ask you to confirm ServiceProfiles when you'd already posted a ServiceProfile! Let me poke a little more into this.

akashsethiya commented 8 months ago

@kflynn Any update on this? It is having a major impact on our observability.

kflynn commented 7 months ago

So far I haven't managed to reproduce this. 🙁 Are you on our Slack? If so, I'd like to connect there and try a few things with you.

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.