prometheus / pushgateway

Push acceptor for ephemeral and batch jobs.
Apache License 2.0

Support UTF-8 in metric and label names #623

Open ywwg opened 7 months ago

ywwg commented 7 months ago

As raised in https://github.com/prometheus/client_ruby/issues/306, we will need some way to encode UTF-8 metric and label names in Pushgateway URLs. This may be tricky because the current syntax already allows job names and label values to be encoded, and any solution has to stay backwards compatible with it. Thoughtful design work is needed.

See: https://github.com/prometheus/pushgateway#url

Feature request

Use case. Why is this important?

UTF-8 support is coming to Prometheus and we need to cover this use case.

fedetorres93 commented 2 months ago

The path for pushing metrics into the Pushgateway looks like

/metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}

Currently, job names and label values can be encoded with base64url, in which case "job" or the label name must be suffixed with @base64. For example, using the grouping key job="example",first_label="foo",second_label="bar":

/metrics/job@base64/ZXhhbXBsZQ==/first_label@base64/Zm9v/second_label@base64/YmFy
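To make the existing scheme concrete, here is a minimal sketch of how a pusher could build such a path. The helper names `encode_segment` and `push_path` are made up for illustration and are not part of any client library; the sketch also encodes every value, although encoding is only needed for values that cannot appear in a URL path as they are.

```python
import base64


def encode_segment(name, value):
    """Return '<name>@base64/<base64url(value)>' for one grouping-key pair."""
    encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return f"{name}@base64/{encoded}"


def push_path(job, labels):
    """Build a push path in which the job name and all label values are encoded.

    Hypothetical helper: `labels` maps label names to label values.
    """
    parts = ["/metrics", encode_segment("job", job)]
    parts += [encode_segment(name, value) for name, value in labels.items()]
    return "/".join(parts)


if __name__ == "__main__":
    print(push_path("example", {"first_label": "foo", "second_label": "bar"}))
```

For the grouping key above this reproduces the example path, including the `==` padding on the job name.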

Some ideas I considered for encoding UTF-8 label names in URLs, using the same base64url approach:

/metrics/job/example/Zmlyc3QubGFiZWw=@base64name/foo
/metrics/job/example/Zmlyc3QubGFiZWw=/foo?base64names=true

Metric and label names in the request body are validated using functions from the common library that already support UTF-8.

ywwg commented 2 months ago

is there content negotiation with calls to pushgateway, or do people just hit the endpoint? (What if someone sends either of these formats to an older endpoint that doesn't understand the new grammar?)

fedetorres93 commented 1 month ago

> is there content negotiation with calls to pushgateway, or do people just hit the endpoint? (What if someone sends either of these formats to an older endpoint that doesn't understand the new grammar?)

There's no content negotiation when pushing metrics to the Pushgateway. The URL is parsed as is: the path is split on /, each label name is checked for the @base64 suffix and, once the suffix is trimmed, the label name itself is validated. So if someone sends either of these formats to an older endpoint, the push would be rejected.
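Roughly, the parsing steps described here can be sketched like this. It is a simplified Python illustration, not the actual Pushgateway code (which is written in Go); the function name `parse_grouping_key` and the regular expression for legacy label names are assumptions made for the example.

```python
import base64
import re

# Assumed legacy (pre-UTF-8) label name rule, used for illustration only.
LEGACY_LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
SUFFIX = "@base64"


def parse_grouping_key(path):
    """Parse '/metrics/job/<JOB>{/<NAME>/<VALUE>}' into a dict.

    Raises ValueError when the path is malformed or a label name is invalid.
    """
    prefix = "/metrics/"
    if not path.startswith(prefix):
        raise ValueError("not a push path")
    parts = path[len(prefix):].split("/")
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError("expected name/value pairs")
    labels = {}
    for index, (name, value) in enumerate(zip(parts[0::2], parts[1::2])):
        if name.endswith(SUFFIX):
            # Trim the suffix and decode the value; padding is restored if missing.
            name = name[: -len(SUFFIX)]
            padded = value + "=" * (-len(value) % 4)
            value = base64.urlsafe_b64decode(padded).decode("utf-8")
        if index == 0 and name != "job":
            raise ValueError("first pair must be the job")
        if not LEGACY_LABEL_NAME.fullmatch(name):
            raise ValueError(f"invalid label name {name!r}")
        labels[name] = value
    return labels
```

With this logic, both proposed formats fail validation: `Zmlyc3QubGFiZWw=@base64name` does not end in `@base64`, and `Zmlyc3QubGFiZWw=` contains `=`, so neither is a valid legacy label name.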

beorn7 commented 1 month ago

So yeah, the base64 trick was introduced prior to 1.0.0. So we never had the situation where a PGW of v1+ would be confronted with a newer version of the push protocol it couldn't understand.

Content negotiation with push is hard, because you need multiple round trips for it.

Ideally, we find a solution that still works transparently for older v1+ PGWs. But I currently have no idea how to do that.

Or we do something where newer pushers have to retry with a legacy version of the names if they get an error back…

beorn7 commented 1 month ago

Idea: Let's just use the already proposed bespoke text escaping instead of using base64 escaping. A PGW with UTF-8 support enabled will simply unescape.

Pros:

Cons:

beorn7 commented 1 month ago

To not introduce a (technically) breaking change, we can make the UTF-8 support opt-in for now (finally feature flags in PGW! ;).

ywwg commented 1 month ago

I think it is fine to look for "U__" and unescape automatically if we see it. We decided in the design that it is unlikely people will name metrics like that, and if they do and the unescaping fails, we just return the name as-is anyway.
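A rough sketch of that "unescape if it starts with U__, otherwise return as-is" behaviour. The escape rules assumed here (a doubled underscore stands for a literal underscore, and `_<hex code point>_` stands for any other escaped character) are my reading of the escaping design and should be checked against the common library; `unescape_name` is a hypothetical name, not the library function.

```python
import string

PREFIX = "U__"


def unescape_name(name):
    """Unescape a 'U__'-prefixed name; return the input unchanged on any failure."""
    if not name.startswith(PREFIX):
        return name
    body = name[len(PREFIX):]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "_":
            out.append(ch)
            i += 1
            continue
        if body.startswith("__", i):
            # Assumed rule: a doubled underscore is a literal underscore.
            out.append("_")
            i += 2
            continue
        # Assumed rule: '_<hex>_' encodes a single Unicode code point.
        end = body.find("_", i + 1)
        if end == -1:
            return name  # unterminated escape: fall back to the raw name
        digits = body[i + 1:end]
        if not all(c in string.hexdigits for c in digits):
            return name  # not a hex escape: fall back to the raw name
        try:
            out.append(chr(int(digits, 16)))
        except (ValueError, OverflowError):
            return name  # code point out of range: fall back to the raw name
        i = end + 1
    return "".join(out)
```

Under these assumptions, `unescape_name("U__my_2e_metric")` gives `my.metric`, while a name that merely happens to start with `U__` but does not decode cleanly, such as `U__broken_zz`, comes back unchanged.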

fedetorres93 commented 1 month ago

Yeah, I think the escaping alternative makes more sense. +1 on the opt-in. I'll start looking into it, thanks for the feedback!

beorn7 commented 3 weeks ago

Implementation merged. I'll try to cut a new release next week or so.

beorn7 commented 3 weeks ago

Maybe we are not quite there yet. The UI needs to know about the new syntax, too. While this is still benign: [image] How about this: [image]