prometheus / client_java

Prometheus instrumentation library for JVM applications
http://prometheus.github.io/client_java/
Apache License 2.0

Implement UTF8 Support #916

Open ywwg opened 9 months ago

ywwg commented 9 months ago

As part of https://github.com/prometheus/prometheus/issues/13095, all client libraries will need to support the new scraping, query, and content negotiation formats.

ywwg commented 9 months ago

@fedetorres93

fedetorres93 commented 9 months ago

I'll start by implementing UTF-8 support in the Java client library.

fstab commented 9 months ago

@fedetorres93 thanks for volunteering, I really appreciate that!

Is there any general guidance yet on how to implement it, for example how to convert UTF-8 names to Prometheus names for older Prometheus servers, and how to deal with potential name collisions when registering metrics?

It would be good to define the behavior first before implementing it. Ideally the behavior would be consistent across client libraries in all programming languages.
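To make the collision question concrete, here is a minimal sketch. It assumes a purely hypothetical escaping rule (every character outside `[a-zA-Z0-9_:]` becomes `_`) and a hypothetical `LegacyNameRegistry` class; neither is the client_java API nor the behavior defined in the proposals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry sketch; not part of client_java.
final class LegacyNameRegistry {
    // Maps each legacy (escaped) name to the original name that first claimed it.
    private final Map<String, String> legacyToOriginal = new HashMap<>();

    // Assumed escaping rule: every code point outside [a-zA-Z0-9_:] becomes '_'.
    // Handling of a leading digit is omitted to keep the sketch short.
    static String toLegacyName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> {
            boolean legal = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9') || cp == '_' || cp == ':';
            sb.append(legal ? (char) cp : '_');
        });
        return sb.toString();
    }

    // Rejects a name if a different name already maps to the same legacy name.
    // Re-registering the identical name is ignored in this sketch.
    void register(String name) {
        String legacy = toLegacyName(name);
        String existing = legacyToOriginal.putIfAbsent(legacy, name);
        if (existing != null && !existing.equals(name)) {
            throw new IllegalArgumentException(
                    "'" + name + "' collides with '" + existing + "' as '" + legacy + "'");
        }
    }
}
```

With this rule, registering `http.server.duration` and then `http_server_duration` fails, because both map to `http_server_duration`. Whether a collision should fail at registration time or be resolved some other way is exactly the behavior that would need to be agreed on.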

fedetorres93 commented 9 months ago

@fstab You can find the proposals @ywwg worked on here and here.

I'm working on adding UTF-8 metric and label name validation, along with support for parsing and formatting the UTF-8 text format. There is still some discussion going on about the content negotiation implementation on writes, and about how reads will be handled.
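As a rough illustration of what the two validation modes could look like, here is a minimal sketch. The legacy pattern is the long-standing Prometheus metric name rule; the UTF-8 rule (non-empty and encodable as UTF-8) is an assumption for illustration, and the `NameValidation` class and its method names are hypothetical, not the client_java API.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper; not part of client_java.
final class NameValidation {
    // Legacy Prometheus metric name rule.
    private static final Pattern LEGACY_METRIC_NAME =
            Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

    static boolean isLegacyMetricName(String name) {
        return LEGACY_METRIC_NAME.matcher(name).matches();
    }

    // Assumed UTF-8 rule: the name is non-empty and can be encoded as UTF-8,
    // which excludes strings containing unpaired surrogates.
    static boolean isUtf8MetricName(String name) {
        return !name.isEmpty()
                && StandardCharsets.UTF_8.newEncoder().canEncode(name);
    }
}
```

Under these assumptions, a name such as `http.server.duration` passes the UTF-8 check but fails the legacy check, while `http_requests_total` passes both.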

fstab commented 9 months ago

Thanks @fedetorres93!

There is already support for dots in metric and label names in client_java. It will be easy to extend this to other characters. The motivation for allowing dots was to support metric/label names defined in the OpenTelemetry semantic conventions.

Currently, dots are only exposed in OpenTelemetry format. In Prometheus text format, OpenMetrics text format, and OpenMetrics protobuf format, dots are replaced with underscores.
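The format-dependent behavior described above can be sketched as follows. The `ExposedNames` class and `Format` enum are hypothetical names used only for illustration; this is not how client_java is actually structured.

```java
// Hypothetical sketch of format-dependent name exposure; not the client_java API.
final class ExposedNames {
    enum Format { OPENTELEMETRY, PROMETHEUS_TEXT, OPENMETRICS_TEXT, OPENMETRICS_PROTOBUF }

    // Dots are kept for OpenTelemetry and replaced with underscores everywhere else.
    static String exposedName(String name, Format format) {
        if (format == Format.OPENTELEMETRY) {
            return name;
        }
        return name.replace('.', '_');
    }
}
```

For example, `exposedName("http.server.duration", Format.PROMETHEUS_TEXT)` yields `http_server_duration`, while the same call with `Format.OPENTELEMETRY` returns the name unchanged. Extending this to other characters would mean generalizing the replacement step.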

I assume for UTF-8 characters in Prometheus format we will define a new OpenMetrics version, right?

I think the following two considerations make sense:

What do you think? If you feel we should have a small "client library support for UTF-8" proposal with the points above, I'm happy to write one.

fedetorres93 commented 9 months ago

Thanks for the info @fstab!

I don't think another proposal is necessary, but I appreciate the points you mentioned and will take them into account for the implementation.