Open ywwg opened 7 months ago
The path for pushing metrics into the Pushgateway looks like
/metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}
Currently, job names and label values can be encoded with base64url, in which case job or the label name must be suffixed with @base64. For example, using the grouping key job="example",first_label="foo",second_label="bar":
/metrics/job@base64/ZXhhbXBsZQ==/first_label@base64/Zm9v/second_label@base64/YmFy
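To make the current syntax concrete, here is a minimal Python sketch that builds such a path on the client side. The helper names `b64url` and `push_path` are hypothetical and not part of any Prometheus client library; the sketch keeps the `=` padding, as in the example above.

```python
import base64


def b64url(text):
    """base64url-encode a UTF-8 string, keeping the '=' padding."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def push_path(job, labels):
    """Build a push path where the job name and every label value are base64url-encoded.

    Hypothetical helper: label names are left as-is and only get the @base64 suffix.
    """
    parts = ["/metrics", "job@base64", b64url(job)]
    for name, value in labels.items():
        parts += [name + "@base64", b64url(value)]
    return "/".join(parts)


print(push_path("example", {"first_label": "foo", "second_label": "bar"}))
```

For the grouping key above, this prints the same path as the example.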
Some ideas I considered for encoding UTF-8 label names in URLs, using the same base64url approach:
A new suffix (@base64name) so that label names are decoded accordingly when the URL is parsed. We can have another suffix for the case where both the label name and the label value are base64url encoded, and keep the original @base64 for when only the value is encoded. For example, using the grouping key job="example",first.label="foo":
/metrics/job/example/Zmlyc3QubGFiZWw=@base64name/foo
A query parameter (base64names=true) to indicate label names are base64url encoded. This can also coexist with the current @base64 suffix used for label values. Disadvantage: it will be all or nothing.
/metrics/job/example/Zmlyc3QubGFiZWw=/foo?base64names=true
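As a rough illustration of how the two ideas differ on the client side, the sketch below builds both URL shapes. Neither the @base64name suffix nor the base64names=true parameter exists in the Pushgateway; both are only the proposals from this comment, and the function names are made up for the example.

```python
import base64


def b64url(text):
    """base64url-encode a UTF-8 string, keeping the '=' padding."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def path_with_name_suffix(job, labels):
    """Idea 1 (proposal only): each encoded label name carries its own @base64name suffix."""
    parts = ["/metrics/job", job]
    for name, value in labels.items():
        parts += [b64url(name) + "@base64name", value]
    return "/".join(parts)


def path_with_query_param(job, labels):
    """Idea 2 (proposal only): every label name is encoded and flagged once in the query string."""
    parts = ["/metrics/job", job]
    for name, value in labels.items():
        parts += [b64url(name), value]
    return "/".join(parts) + "?base64names=true"


print(path_with_name_suffix("example", {"first.label": "foo"}))
print(path_with_query_param("example", {"first.label": "foo"}))
```

The first variant lets encoded and plain label names be mixed in one URL; the second applies to all label names at once, which is the "all or nothing" disadvantage.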
Metric and label names in the request body are validated using functions from the common library that already support UTF-8.
is there content negotiation with calls to pushgateway, or do people just hit the endpoint? (What if someone sends either of these formats to an older endpoint that doesn't understand the new grammar?)
There's no content negotiation while pushing metrics into the Pushgateway. The URL is parsed as is, splitting the labels with / as the delimiter, then checking if the name contains the @base64 suffix and, once the suffix is trimmed, checking if the label name itself is valid. So if someone sends either of these formats to an older endpoint, the push would be rejected.
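The parsing steps described above can be sketched roughly as follows. This is not the Pushgateway's actual code (which is written in Go); `parse_grouping_key` is a hypothetical function, and the legacy label-name pattern is an assumption about what "valid" meant before UTF-8 support.

```python
import base64
import re

# Assumed legacy (pre-UTF-8) label name rule.
LEGACY_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
SUFFIX = "@base64"


def parse_grouping_key(path):
    """Parse '/metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}' into a dict."""
    prefix = "/metrics/"
    if not path.startswith(prefix):
        raise ValueError("not a push path")
    parts = path[len(prefix):].split("/")
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError("expected name/value pairs")
    if parts[0] not in ("job", "job" + SUFFIX):
        raise ValueError("first pair must be the job")
    labels = {}
    for name, value in zip(parts[0::2], parts[1::2]):
        if name.endswith(SUFFIX):
            # Trim the suffix and decode the value (tolerating missing padding).
            name = name[:-len(SUFFIX)]
            padded = value + "=" * (-len(value) % 4)
            value = base64.urlsafe_b64decode(padded).decode("utf-8")
        if not LEGACY_NAME.match(name):
            raise ValueError("invalid label name %r" % name)
        labels[name] = value
    return labels


print(parse_grouping_key(
    "/metrics/job@base64/ZXhhbXBsZQ==/first_label@base64/Zm9v/second_label@base64/YmFy"
))
```

With this sketch, both proposed formats fail the label-name check for the example above: "Zmlyc3QubGFiZWw=@base64name" does not end in @base64, and neither it nor "Zmlyc3QubGFiZWw=" is a valid legacy name, so an older endpoint would reject the push.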
So yeah, the base64 trick was introduced prior to 1.0.0. So we never had the situation where a PGW of v1+ would be confronted with a newer version of the push protocol it couldn't understand.
Content negotiation with push is hard, because you need multiple round trips for it.
Ideally, we find a solution that still works for older v1+ PGWs transparently. But I have currently no idea how to do that.
Or we do something that newer pushers have to retry with a legacy version of the names if they get an error back…
Idea: Let's just use the already proposed bespoke text escaping instead of using base64 escaping. A PGW with UTF-8 support enabled will simply unescape.
Pros:
Cons:
If a name already starts with U__ and happens to have valid escape sequences, we would falsely unescape.
To not introduce a (technically) breaking change, we can make the UTF-8 support opt-in for now (finally feature flags in PGW! ;).
I think it is fine to look for "U__" and unescape automatically if we see it. We decided in the design that it is unlikely people will name metrics like that, and if they do and the unescaping fails, we just return the name as-is anyway
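For readers unfamiliar with the escaping being discussed, here is a simplified Python sketch of the "U__" scheme as I understand it: the prefix marks an escaped name, an underscore becomes a double underscore, and any other character that is not valid in a legacy name becomes its hex code point wrapped in underscores. The real implementation lives in the Go common library and may differ in details; `escape_name` and `unescape_name` are illustrative names, and the legacy character rule used here is an assumption.

```python
import string

PREFIX = "U__"


def _is_legacy_char(ch, index):
    # Assumed legacy rule: ASCII letters and ':' anywhere, digits except in first position.
    return ch.isascii() and (ch.isalpha() or ch == ":" or (ch.isdigit() and index > 0))


def escape_name(name):
    """Escape a UTF-8 name; names that are already legacy-valid are returned unchanged."""
    if name and all(ch == "_" or _is_legacy_char(ch, i) for i, ch in enumerate(name)):
        return name
    out = [PREFIX]
    for i, ch in enumerate(name):
        if ch == "_":
            out.append("__")
        elif _is_legacy_char(ch, i):
            out.append(ch)
        else:
            out.append("_%x_" % ord(ch))
    return "".join(out)


def unescape_name(name):
    """Reverse escape_name; on any malformed sequence, return the name as-is."""
    if not name.startswith(PREFIX):
        return name
    body = name[len(PREFIX):]
    out = []
    i = 0
    while i < len(body):
        if body[i] != "_":
            out.append(body[i])
            i += 1
            continue
        if i + 1 >= len(body):
            return name  # dangling underscore
        if body[i + 1] == "_":
            out.append("_")
            i += 2
            continue
        end = body.find("_", i + 1)
        digits = body[i + 1:end] if end != -1 else ""
        if not digits or len(digits) > 6 or any(c not in string.hexdigits for c in digits):
            return name  # not a valid escape sequence
        try:
            out.append(chr(int(digits, 16)))
        except ValueError:
            return name  # code point out of range
        i = end + 1
    return "".join(out)


print(escape_name("first.label"))
print(unescape_name("U__first_2e_label"))
print(unescape_name("U__broken_zz_name"))
```

In this sketch a failed unescape simply returns the input, which matches the fallback described in the comment above. The false-positive case from the cons is also visible: a pre-existing name such as "U__x_41_" would be unescaped to "xA".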
Yeah, I think the escaping alternative makes more sense. +1 on the opt-in. I'll start looking into it, thanks for the feedback!
Implementation merged. I'll try to cut a new release next week or so.
Maybe we are not quite there yet. UI needs to know about the new syntax, too. While this is still benign: How about this:
As raised in https://github.com/prometheus/client_ruby/issues/306, we will need some way to encode UTF-8 metric and label names in pushgateway URLs. This may be tricky because the current method does allow for encodings of job names and label values, and any solution would have to be backwards compatible with the current syntax. Thoughtful design work is needed.
See: https://github.com/prometheus/pushgateway#url
Feature request
Use case. Why is this important?
UTF-8 support is coming to Prometheus and we need to cover this use case.