Open universam1 opened 3 years ago
Hi, thank you for the feature request, I'll add a couple of notes.
Ingress properties (namespace, name, path, etc) used to construct route id are not preserved up to the point where metrics are collected. Metrics collection attempts to fit two implementations (coda hale and prometheus) under the same interface and thus lacks labels concept which would complicate the feature implementation.
Note, other projects try to disassemble
This is a slippery road and most probably will break down the line as route id generation is an implementation detail and is not even public in the library.
Thank you for the feedback and the references!
This is a slippery road and most probably will break down the line
Fully agree - that is eventually the core problem described in this issue that there is currently neither a way via standard patterns such as Prometheus labels nor a published API to solve this demand. My preference would though certainly be to extend the labels following canonical concept.
attempts to fit two implementations (coda hale and prometheus) under the same interface and thus lacks labels concept which would complicate the feature implementation
Looking through the sources one idea could be to replace the routeId string
with an interface like this:
type RouteIDer interface {
// implements current signatures from rgRouteID, eastWestRouteID, routeID
String() string
// native fields exporter
PrometheusLabels() []string
CodaHaleLabels() xyz
}
RGs, EastWest, Ingress can implement via a struct fulfilling above iface. Looks like at this point for instance the labels can be extended then? https://github.com/zalando/skipper/blob/master/metrics/prometheus.go#L421
@universam1 , first I like the idea and route was just for operations to make sure they can figure out in case they need to have something and have not a metrics explosion in cardinality. I would expect a bit of math how much the mentioned dimensions (labels) will cost in memory. We should think about it with different patterns of input. One of ours is having a cluster with 40k ingress (created by 400 fabric gateways, that on average would create 100 ingress. These are some kind of internal requirements) and I would like to keep the scalability like this.).
I would approach this slightly differently regarding the implementation details:
Ideally, we will be able to drop the mandatory and unique route ID from the eskip.Route object (and therefore also from the eskip format) in the future. To pass in values for the monitoring, I would rather create symbolic filters, which can be used during converting the ingress objects, but also in free-form configuration. OTOH, we can add new methods to the routing.Route objects, which in contrary to the eskip.Route, is meant to be more like an internal "runtime" representation of the route. The PrometheusLabels() (or just simply Labels()) idea is nice and fits the routing.Route type.
I would rather create symbolic filters, which can be used during converting the ingress objects, but also in free-form configuration.
I think (not 100% sure though) Prometheus requires a set of labels to be known upfront so we may come up up with a config key like -prometheus-route-labels=namespace,service,whitelabel
and a "marker" filter like metricsLabels("namespace=whatever", "service=nginx", "whitelabel=mycustomlabelforthisroute")
. Ingress/routegroup processor may inject them unconditionally or based on some annotation. Also not all routes come from k8s objects so some labels may not be applicable and I guess would have empty values.
I also saw a case recently where users would like to get metrics for a specific stack version created by https://github.com/zalando-incubator/stackset-controller
I would expect a bit of math how much the mentioned dimensions (labels) will cost in memory.
Looking at the subject (success rate and latency for a backend service) it may happen that absolute minimum set of labels is namespace
and service
(or endpoint
), i.e. ingress name is not needed.
Prometheus requires a set of labels to be known upfront
That's my conclusion after trying a couple of ideas how to make the labels dynamic, since as you stated the common denominator of route sources is rather short. Their early init appears hard to support variadic labels to me.
absolute minimum set of labels is
namespace
andservice
(orendpoint
)
We'd be highly interested in path
or maybe its matcher. Think about microservice architecture where we need to measure the individual services mounted at different paths, bringing light into a monolith blackbox.
we can add new methods to the routing.Route objects, which in contrary to the eskip.Route
Great detail, thanks. I'm trying quite a bit to wrap my head around the abstractions and internal concepts, I'd love to provide contribution here with this issue but guess I'm struggling to grasp where and how the generators are used... 🤔
indeed, I haven't thought of the fixed label keys of Prometheus. We may need to do something 'ugliish', like:
I am a proponent of the first option.
would be great if Skipper could be added to the list of supported Ingresses for SLO operator
I am not sure if I understand what you mean @universam1 . The linked project looks not really useful for skipper. It looks more like a project to expose metrics of your backend application.
I don't undstand what the mentioned SLO operator is. Seems like another project, right?
Pardon for not providing enough context. We are leveraging this SLO Operator https://github.com/slok/sloth where we define the SLI's mostly on Skipper route metrics. As initially stated they are challenging for the devs to generate currently.
I linked the plugin sdk of this Operator, which when this issue would be implemented, would enable one to create a plugin tailored to Skipper, as being done for other Ingress projects.
Basically just a teaser or reason for this to happen 😁
Sounds great to me!
Btw. path metrics: For the path metrics, it might also make sense to have a config options to enable it like @jonathanbeber did for other metrics to reduce the number of keys. We had issues with Prometheus.
Is your feature request related to a problem? Please describe. Prometheus metrics contain one central metric
route
that becomes one concatenated string of various independent fields such asThis implementation is problematic as it is
Describe alternatives you've considered (optional) Above properties should become distinct Prometheus labels, the current routeID should be used internally only.
*RouteID
could become a struct withStringer()
method to render to current string where needed internally, but will provide a map of individual labels for Prometheus.Idea: current:
{route="kube_appsvcs__docs__techdocs_myservice_org_____appdocs"}
new:{namespace="appsvcs", ingress="docs", host="techdocs.myservice.org", path="/", backend="appdocs"}
Additional context (optional) examples where devs struggle to understand their usage:
Note, other projects try to disassemble https://github.com/fluxcd/flagger/blob/e6c740d917bdb3da5163bdef41b66ce8d20f31bf/pkg/metrics/observers/skipper.go#L30
Would you like to work on it? possibly