prometheus / client_golang

Prometheus instrumentation library for Go applications
https://pkg.go.dev/github.com/prometheus/client_golang
Apache License 2.0
5.4k stars 1.18k forks source link

Optionally return error if two metrics would collide under escaping schemes #1638

Open ywwg opened 2 weeks ago

ywwg commented 2 weeks ago

Hypothetically, if a system defined two metrics that are different in UTF-8 but the same when escaped, the escaped text exposition would show two entries for the "same" metric. (In general users shouldn't do this, but you know what they say about users.) We should help the user avoid doing that.

At registration time, we should check for not only exact duplicates but also escaped duplicates. Since the default UnderscoreEscaping scheme is the most collision-prone, we will always use that method for testing equivalency. If a collision is found, return an error by default. We can add an optional way to skip this check in case it is intended.

Why not always use this method of detection? If users create metrics in, say, Japanese, many metrics would escape to "____" and collide. So this needs to be optional.

bwplotka commented 5 days ago

Hypothetically, if a system defined two metrics that are different in UTF-8 but the same when escaped, the escaped text exposition would show two entries for the "same" metric. (In general users shouldn't do this, but you know what they say about users.) We should help the user avoid doing that.

Can agree, if that do not impact users which do not use advance characters.

At registration time, we should check for not only exact duplicates but also escaped duplicates. Since the default UnderscoreEscaping scheme is the most collision-prone, we will always use that method for testing equivalency. If a collision is found, return an error by default. We can add an optional way to skip this check in case it is intended.

Why not ONLY check for escaped duplicates? The non escaped collision will be caught when escaped, no?

Also can I double check UTF-8 behaviour here: The escaped mechanism is only on scrape time, right? Like client would never escape for you, it's a scrape logic only, right?

Why not always use this method of detection? If users create metrics in, say, Japanese, many metrics would escape to "____" and collide. So this needs to be optional.

Sorry I don't get it. What would collide in Japanese exactly? If any Japanese metrics/labels will cause collision it will cause scrape to literally ignore rest and take first one OR fail scrape anyway, no?

bwplotka commented 5 days ago

I also don't get https://github.com/prometheus/client_golang/pull/1641 goal. I thought this issue is to fail on registration, which is generally best-effort (e.g. there are cases of unchecked collectors etc) and that's fine.

However this PR is to fail /metrics call on this? Isn't that... wrong? I would rather have duplicate than your application fail to give any metric? Do we have a precedense? Do we fail handling /metrics in such collisions.. on scrape?

ArthurSens commented 5 days ago

I also don't get #1641 goal. I thought this issue is to fail on registration, which is generally best-effort (e.g. there are cases of unchecked collectors etc) and that's

This is on me, I've added feedback asking to not to this on registration because, as you said, this is a scrape problem and I wanted to move the solution closer to the scrape.

I would rather have duplicate than your application fail to give any metric?

I had an understanding the we want to provide a scrape as a transaction. So we provide everything that was registered or we provide nothing, there's no partial result here.

Do we fail handling /metrics in such collisions.. on scrape?

I believe we started failing since Prometheus 2.52

ywwg commented 5 days ago

The escaped mechanism is only on scrape time, right? Like client would never escape for you, it's a scrape logic only, right?

yes, this only applies to scraping. I don't believe there is any content negotiation for remote write so it doesn't apply there.

Why not ONLY check for escaped duplicates? The non escaped collision will be caught when escaped, no?

I'm not sure what you mean... non-escaped collisions are caught at registration time in the existing code. But furthermore, the actual looking for collisions happens at registration time to avoid the CPU hit of checking for collisions on every scrape.

What would collide in Japanese exactly?

If the metric names are in Japanese characters, all the characters will be converted to underscores. Any two metrics of the same length of characters would collide because they will be all underscores. This applies to any non-english character set.

I also think maybe we should go back to failing at registration time since that's when the other collisions are caught. The trouble is that creates error messages for people who don't care about these collisions. This is an edge case and I'm not sure how often this will happen. One option is we could only check for these collisions when an option is turned on, and mention that option in the documentation for users setting up a system where there is a mix of UTF-8 and legacy.

At the end of the day maybe we should have skipped escaping and just refused to serve UTF-8 metrics to legacy systems :/.

bwplotka commented 5 days ago

Thanks, makes sense!

I also think maybe we should go back to failing at registration time since that's when the other collisions are caught. The trouble is that creates error messages for people who don't care about these collisions.

Yea, sounds like some option to registry or considering this only for PedenaticRegistry or something.

At the end of the day maybe we should have skipped escaping and just refused to serve UTF-8 metrics to legacy systems :/.

Yea, I would love to check the impact of not doing this collision work on escaping, e.g. who will actually use escaping mechanism.

ywwg commented 5 days ago

Yea, sounds like some option to registry or considering this only for PedenaticRegistry or something.

I'd like to know more about the pedanticregistry and where it gets used.