sinkingpoint / prometheus-gravel-gateway

A Prometheus Aggregation Gateway for FAAS applications
GNU Lesser General Public License v3.0
115 stars 10 forks source link

metric label overriding #9

Closed ltagliamonte closed 2 years ago

ltagliamonte commented 2 years ago

I have 2 lambda functions pointing to the same gateway. the first lambda sends:

the second lambda sends:

when I curl the gateway endpoint i see only:

requests_num_total{LAMBDA_NAME="test_function"} 1
requests_num_total{LAMBDA_NAME="test"} 1

where the label job has been overridden by LAMBDA_NAME.

How to reproduce

curl the gateway endpoint curl 127.0.0.1:4278/metrics:

# TYPE requests_num_total counter
requests_num_total{LAMBDA_NAME="test_function"} 1
requests_num_total{LAMBDA_NAME="test"} 1

as you can see the label job has been renamed to LAMBDA_NAME the first label sent always overrides any other following labels.

sinkingpoint commented 2 years ago

Interesting - I can see where this is happening, but I'm curious about the different behaviors we can have to solve this. We could reject the second push (for having a different labelset), or we could somehow munge them together to have both labels perhaps

ltagliamonte commented 2 years ago

I would hash the metric names with the labels so that each name+labels is unique.

sinkingpoint commented 2 years ago

Doing so would kind of be in violation of the Prometheus spec though (https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels)

Client libraries MUST NOT allow users to have different label names for the same metric for Gauge/Counter/Summary/Histogram or any other Collector offered by the library.

So, yes, it's a but that we're mangling the label name but I think the "correct" fix is actually rejecting the second push as an invalid label set and expecting that clients properly namespace their metrics

ltagliamonte commented 2 years ago

@sinkingpoint in this way i can't use a gateway for multiple lambdas. This becomes expensive to support.

What about defining a namespace label? Each lamba uses it's own namespace in which we can have the metric name? the prom push gateway uses a different path per job like /metrics/job/jobName/ maybe in this way it gets easier to manage in the code

sinkingpoint commented 2 years ago

Yeah I get that. FWIW, the Gravel Gateway also supports the push gateways label syntax. Using a job label seems to be the more standard solution to this, but I'll raise it internally and see what we can do

ltagliamonte commented 2 years ago

@sinkingpoint thank you, do you have any update on this? I'm preparing a lambda toolchain and i'd like to get the monitoring figure it out as well.

As i said i'd like to host the gravel gateway "as a service" for all my lambdas functions. I'd also understand better how scaling in/out works in the cluster mode, if you want i'd will open another issue.

sinkingpoint commented 2 years ago

Apologies - still trying to figure this out a bit. We're leaning to the idea that we will reject metrics pushes with invalid labels (in order to solve the initial problem), and expect consumers to supply a job label (or similar) to distinguish individual pushes - as mentioned, we do actually support the Push Gateway URL syntax to achieve that

sinkingpoint commented 2 years ago

With #10 we've fixed this, such that the second push in the issue will be rejected with invalid labels. As before, you can use the push gateway syntax to set a label to distinguish individual pushes

ltagliamonte commented 2 years ago

@sinkingpoint thank you for working on this. Just to make sure I understand the fix: If the metric name contains a label job the labels will not be overridden anymore. Effectively job is the sharding key, correct?

sinkingpoint commented 2 years ago

The fix was to make it not override labels at all - this was the bug, that should never happen. Every push with the same metric name must have the same label names (this is required by Prometheus afaik), but the upstream pattern is to use something like a job label to seperate individual pushes