zalando / skipper

An HTTP router and reverse proxy for service composition, including use cases like Kubernetes Ingress
https://opensource.zalando.com/skipper/

Provide additional information for a route from Kubernetes #820

Open wndhydrnt opened 5 years ago

wndhydrnt commented 5 years ago

I'm using Skipper to do a Canary deployment of an application on Kubernetes. After the deployment of the Canary, I want to know how it is doing by checking the results of the HTTP requests that the Canary has served. Prometheus scrapes the metrics from Skipper and I want to use Grafana to create two graphs: one showing the HTTP requests of the Current deployment of the application and one showing those of the Canary.

When looking at the metrics exported by Skipper, I don't know how to accomplish this. In my test setup, the metrics are exported like this:

skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.005"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.01"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.025"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.05"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.1"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.25"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="0.5"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="1"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="2.5"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="5"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="10"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829",le="+Inf"} 12
skipper_serve_route_duration_seconds_sum{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829"} 0.009401554
skipper_serve_route_duration_seconds_count{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_5740a829"} 12
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.005"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.01"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.025"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.05"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.1"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.25"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.5"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="1"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="2.5"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="5"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="10"} 2
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="+Inf"} 2
skipper_serve_route_duration_seconds_sum{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"} 0.002127573
skipper_serve_route_duration_seconds_count{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"} 2

In Kubernetes, two different services, whoami-5740a829 (Current) and whoami-af63a7a0 (Canary) exist. The Ingress, in this case called whoami-0, has the following annotation set:

zalando.org/backend-weights:  {"whoami-5740a829":90,"whoami-af63a7a0":10}

The issue I'm facing is that it is not possible to create a link between a deployment/service in Kubernetes and a route that is served by Skipper for that deployment/service. Skipper replaces the - in the name of the service with a _. Doing this in a Prometheus query is not possible, at least I did not find a solution. For me, the easiest solution would be to query for a label that contains the Kubernetes service, e.g.:

skipper_serve_route_duration_seconds_count{service="whoami-5740a829"}

Is there another way to work around this or is a new feature needed to add more labels to the Prometheus metrics?

szuecs commented 5 years ago

There is no workaround, because Skipper's route table and proxy don't know about the "Kubernetes service". The route ID is built from data in the Ingress object, so matching on it is the only possible workaround, but its format could change in the future (it might not, but I don't consider it anything like a stable interface). I think if we refactor the loadbalancer we can add a feature that helps, but right now I have no good idea. Maybe @aryszka or @mikkeloscar have one?

mikkeloscar commented 5 years ago

We could possibly extend routes to include labels which could be passed to the metrics sink instead of just the route id: https://github.com/zalando/skipper/blob/77d84393cf1060ed6ead4d8125167daf419a9a0f/proxy/proxy.go#L1088-L1094

This way we could create routes with labels in the ingress datasource and provide extra context. It should be fairly easy to implement, but we must of course consider the trade-off: every extra label means more data in the metrics, and with many ingresses the number of time series could really explode.

However putting something like this behind a feature flag doesn't sound unreasonable to me.

WDYT?

szuecs commented 5 years ago

Sounds reasonable to me to add labels as map[string]string to the interface, maybe as metrics.MeasureServeWithLabels()? I think with that we could also enable the stackset-controller to support automated traffic switching, using the label-based error rates and response rates as a feedback loop.

@aryszka what do you think?

aryszka commented 5 years ago

Unfortunately, I also cannot see a workaround at the moment without a code change.

I agree with @szuecs: the generated route id should stay opaque.

I like @mikkeloscar's suggestion, including the feature toggle. It's simple, and it doesn't change the interface except for the additional feature toggle. I haven't verified it in the code yet, but I believe it is simple to implement, too.

wndhydrnt commented 5 years ago

Thanks to all of you for the feedback. Besides adding labels to each of the skipper_serve_route_* metrics, an alternative might be worth considering. I've been working with kube-state-metrics lately. That application exposes a metric called kube_deployment_labels for each deployment in Kubernetes. The value of the metric is always 1. Using the group_left modifier in Prometheus, I am able to add additional labels to other metrics when writing queries.

The exported metrics could look like this (some skipper_serve_route_* left out to improve readability):

skipper_serve_route_labels{backend="whoami-af63a7a0",host="whoami.dev.local",name="whoami-0",namespace="default",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"} 1
skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.005"} 2
skipper_serve_route_duration_seconds_sum{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"} 0.002127573
skipper_serve_route_duration_seconds_count{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"} 2

A query then looks like this:

skipper_serve_route_duration_seconds_bucket * on (route) group_left(backend,name) skipper_serve_route_labels

and a metric in the result set of that query looks like this:

skipper_serve_route_duration_seconds_bucket{code="200",method="GET",route="kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",le="0.005",backend="whoami-af63a7a0",name="whoami-0"} 2
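To make the join easier to follow, here is a small Go sketch that mimics what `* on (route) group_left(backend,name)` does in the simple case where each route appears at most once on the right-hand side. The `sample` type and the `joinGroupLeft` function are hypothetical names used only for this illustration; this is not how Prometheus implements vector matching.

```go
package main

import "fmt"

// sample is a simplified Prometheus sample: a label set plus a value.
// Metric names are not modelled; Prometheus itself drops the metric name
// from the result of an arithmetic operator.
type sample struct {
	labels map[string]string
	value  float64
}

// joinGroupLeft mimics `left * on (route) group_left(extra...) right`,
// assuming every route occurs at most once in right.
func joinGroupLeft(left, right []sample, extra ...string) []sample {
	byRoute := make(map[string]sample, len(right))
	for _, r := range right {
		byRoute[r.labels["route"]] = r
	}

	var out []sample
	for _, l := range left {
		r, ok := byRoute[l.labels["route"]]
		if !ok {
			continue // left samples without a match are dropped
		}
		merged := make(map[string]string, len(l.labels)+len(extra))
		for k, v := range l.labels {
			merged[k] = v
		}
		// Copy only the labels listed in group_left(...) from the right side.
		for _, k := range extra {
			if v, found := r.labels[k]; found {
				merged[k] = v
			}
		}
		// The right-hand value is always 1, so the left value is unchanged.
		out = append(out, sample{labels: merged, value: l.value * r.value})
	}
	return out
}

func main() {
	route := "kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0"
	buckets := []sample{
		{labels: map[string]string{"code": "200", "method": "GET", "route": route, "le": "0.005"}, value: 2},
	}
	info := []sample{
		{labels: map[string]string{"backend": "whoami-af63a7a0", "host": "whoami.dev.local", "name": "whoami-0", "namespace": "default", "route": route}, value: 1},
	}
	for _, s := range joinGroupLeft(buckets, info, "backend", "name") {
		fmt.Println(s.labels, s.value)
	}
}
```

Because the labels metric always has the value 1, the multiplication leaves the original sample values untouched and only enriches the label set.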

This way, no existing function defined by the interface needs to be altered. A new method to create the skipper_serve_route_labels metric needs to be added though.
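A minimal sketch of what producing such a metric could involve, assuming the label names from the example above. The `routeLabelsLine` function is hypothetical and renders one sample in the Prometheus text exposition format by hand; an actual implementation would more likely register a gauge vector with the Prometheus Go client and set it to 1 per route.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes label values as required by the Prometheus text
// exposition format: backslash, double quote and line feed.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// routeLabelsLine renders one sample of the proposed
// skipper_serve_route_labels metric. The value is always 1; all the
// information is carried by the labels.
func routeLabelsLine(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}
	return "skipper_serve_route_labels{" + strings.Join(pairs, ",") + "} 1"
}

func main() {
	fmt.Println(routeLabelsLine(map[string]string{
		"backend":   "whoami-af63a7a0",
		"host":      "whoami.dev.local",
		"name":      "whoami-0",
		"namespace": "default",
		"route":     "kube_default__whoami_0__whoami_dev_local_____whoami_af63a7a0",
	}))
}
```

This adds one extra series per route rather than extra labels on every existing series.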

I'd like to add that I don't think this solution is in some way "better" than what has been proposed in this thread before. It's just nice to have two options to pick from. What do you think?

szuecs commented 5 years ago

@wndhydrnt I think it's fine to also add your suggestion, provided you explicitly guard it with an option flag to make it opt-in: it will create quite a few more metrics, and I am not sure I would always enable it. Adding the labels to the current metrics is fine without an option. Do you want to work on that feature?

wndhydrnt commented 5 years ago

@szuecs sure, I'll take a look. I think I can come up with something in a few weeks.