linkerd / linkerd2

Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.
https://linkerd.io
Apache License 2.0
10.65k stars 1.28k forks source link

`linkerd routes` only maps traffic to the http method, not the path #2060

Closed jon-walton closed 5 years ago

jon-walton commented 5 years ago

Bug Report

What is the issue?

After creating a service profile and generating a small load against the service (10RPS):

echo "GET https://example.com/api/hello/world" | vegeta attack -timeout 5s -rate 10 | tee | vegeta report --every 500ms

I can clearly see that

linkerd routes deploy/myapp -n myappns -t 1m

is mapping the traffic (in this case 10RPS) against the first matching http method in the service profile.

if we take an example service profile spec (generated via --open-api , but changing the route names to protect the innocent 😉 ):

  spec:
    routes:
    - condition:
        method: GET
        path_regex: /api/user/profile
      name: GET /api/user/profile
      response_classes:
      - condition:
          status:
            max: 200
            min: 200
    - condition:
        method: POST
        path_regex: /api/something/[^/]*
      name: POST /api/something/{id}
      response_classes:
      - condition:
          status:
            max: 200
            min: 200
    - condition:
        method: PUT
        path_regex: /api/something/[^/]*
      name: PUT /api/something/{id}
      response_classes:
      - condition:
          status:
            max: 200
            min: 200
    - condition:
        method: GET
        path_regex: /api/hello/world
      name: GET /api/hello/world
      response_classes:
      - condition:
          status:
            max: 200
            min: 200

linkerd routes deploy/myapp -n myappns -t 1m

will print 10RPS against GET /api/user/profile and not display anything for /api/hello/world

this holds true for other methods in our application all POST requests will be registered against POST /api/something/{id}

doing a linkerd tap deploy/myapp -n myappns shows the traffic hitting the expected routes

How can it be reproduced?

Logs, error output, etc

(If the output is long, please create a gist and paste the link here.)

linkerd check output

$ l check
kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[ok]
linkerd-api: no invalid service profiles...................................[ok]
linkerd-version: can determine the latest version..........................[ok]
linkerd-version: cli is up-to-date.........................................[ok]
linkerd-version: control plane is up-to-date...............................[warning] -- is running version 884ae673 but the latest edge version is 18.12.4

Environment

Client version: edge-18.12.4
Server version: git-884ae673

The server is running git-884ae673 to resolve memory leaks. I'll try to repro this once the next edge is release :+1:

Possible solution

Additional context

jon-walton commented 5 years ago

yep, same behaviour with edge-19.1.1

$ l check
kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-existence: control plane namespace exists..........................[ok]
linkerd-existence: controller pod is running...............................[ok]
linkerd-existence: can initialize the client...............................[ok]
linkerd-existence: can query the control plane API.........................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[ok]
linkerd-api: no invalid service profiles...................................[ok]
linkerd-version: can determine the latest version..........................[ok]
linkerd-version: cli is up-to-date.........................................[ok]
linkerd-version: control plane is up-to-date...............................[ok]

Status check results are [ok]
$ l version
Client version: edge-19.1.1
Server version: edge-19.1.1
adleong commented 5 years ago

hey @jon-walton! I think the problem is that your service profile use the old path_regex which is has been replaced by pathRegex. The old fields was probably being ignored which is why your routes weren't matching properly. If you run linkerd profile --open-api again with a recent version of the CLI, you should get a service profile with correct field names.

The fact that Linkerd silently ignored the path_regex field instead of raising some kind of error or warning is a problem though. We've started to address this problem with https://github.com/linkerd/linkerd2/pull/2024 but clearly there's more to do. If you have any suggestions on how we could improve the validation to catch situations like this, I'd love to hear your thoughts.

jon-walton commented 5 years ago

ah I see. This was my first look at service profiles so if something changed, it may be due to the server running a git build and my client being edge. I'll re-import it on Monday now that everything is back on the same edge build

Thanks

jon-walton commented 5 years ago

hey @adleong that did it, thanks. Now that the server and client are on the same version, I deleted the existing service profile from kubernetes, recreated it and it working fine.

Did you want to keep this issue open to track it silently ignoring path_regex ?

grampelberg commented 5 years ago

@jon-walton would you mind opening up a different issue? It'd be nice to have a check that shows your cli/control plane/data plane are out of sync.

adleong commented 5 years ago

Filed https://github.com/linkerd/linkerd2/issues/2075

adleong commented 5 years ago

Also https://github.com/linkerd/linkerd2/issues/2076