Closed beorn7 closed 1 year ago
OM 2.x doesn't exist yet in any form, so I think any experiment should be done under a different name and content type for now to avoid any potential future confusion. We already have enough people thinking OM and Prometheus text format are the same thing.
From an OM 1.x standpoint, as long as it can always gracefully negotiate and degrade to OM 1.0, it's still compliant with OM. That is to say, it must produce at least a +Inf bucket, and any other buckets must be essentially static.
In terms of the minutiae of the format itself, I do have some thoughts, though with 2.x we can be less constrained than for 1.x, considering that a 2.x implementation would still have to be able to produce a degraded 1.0. So, for example, it is reasonable here that your proposal requires parsers to notice that "foo" is associated with the TYPE just above, whereas it isn't for 1.0. Without the double quotes would be my main thought, and if you want a JSON parser to be able to handle it, ensure that you have a plan for NaN/Inf.
WRT content type: Yes, sure, there should be a very specific content type just for the experiment.
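To make the negotiation idea concrete, here is a minimal Go sketch of a server-side choice between an experiment-only content type and an OM 1.0 fallback. The experimental media type name and the `negotiate` helper are assumptions made up for illustration (nothing in this thread names a content type), and the sketch deliberately ignores q-values and version parameters.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Hypothetical media type for the experiment; not defined anywhere in this thread.
	experimentalType = "application/x-experimental-native-histogram-text"
	// Media type of the OpenMetrics 1.0 text format (parameters omitted).
	om1Type = "application/openmetrics-text"
)

// negotiate returns the experimental type only if the Accept header lists it
// explicitly. In every other case it degrades to OM 1.0.
// Simplification: q-values and media type parameters are ignored.
func negotiate(accept string) string {
	for _, part := range strings.Split(accept, ",") {
		// Drop parameters such as ";q=0.5" or ";version=1.0.0".
		mediaType := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(mediaType, experimentalType) {
			return experimentalType
		}
	}
	return om1Type
}

func main() {
	fmt.Println(negotiate(experimentalType + ", " + om1Type))
	fmt.Println(negotiate("text/plain"))
}
```

A scraper that never asks for the experimental type is unaffected, which is the property that keeps the experiment from being confused with OM proper.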
> Without the double quotes would be my main thought,
To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.
> ensure that you have a plan for NaN/Inf.
Ah right. My thought here was that, for the experiment, we require instrumentation libraries to never emit NaN/Inf. That's anyway a weird corner case. We need to handle it for the real thing, of course. But for the real thing, we won't use a JSON snippet in the first place.
> To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.
The whole point is to experiment, so that sounds fine to me anyway.
/cc @fstab Would this match your expectation for an experiment in client_java?
FYI: @fstab has now added protobuf support to client_java temporarily. One of the reasons for this experiment (to add native histogram support to client_java in a simple way) is therefore not relevant anymore. This might still be useful to play with a text representation of native histograms and how it behaves during generation and parsing etc.
Also note #256 for a draft of Native Histogram support in the OpenMetrics protobuf format.
Given the reaction to a brainstorming doc, I think we should not pursue this "embedded JSON" idea any longer (but we can, of course, change our minds again). Of all the ideas discussed, "embedded JSON" (idea 1 in the doc) was the least liked, notably also by @csmarchbanks, who maintains client_python, which will probably be the first instrumentation library to implement a text format for native histograms.
Therefore, I'm retracting this proposal (for now).
Prometheus's new Native Histograms AKA Sparse Histograms somehow need to be represented in OM 2.x.
I have described the various trade-offs here. As an outsider, it's hard for me to sketch out a concrete design for how to deal with all of those, but I would like to propose an experiment: Let's create a makeshift way of including a Native Histogram representation in OpenMetrics that is very easy to generate and parse but ignores, for now, efficiency concerns, OM design philosophies, etc. It will, however, allow instrumentation libraries to expose Native Histograms in an experimental way and Prometheus to ingest those. We can then study exposition and ingestion in practice, iterate on it, and get a better idea about the trade-offs for the actual specification of Native Histograms in OM 2.x.
This experiment should be hidden behind a feature flag or even in a separate branch, depending on the release philosophy of the affected repository.
Here's the idea:
histogram.Histogram type in prometheus/prometheus.
histogram.Histogram type as above.

Example for the exposition of a (fairly complex) pure Native Histogram including a timestamp:
Note that there is no name collision with any of the conventional Histogram fields. Therefore, a conventional and a Native Histogram representation can be exposed side by side:
Following thoughts:
0.0008955117609420616
, all buckets on one row without repeating labels for each bucket.

WRT a human-readable representation, you might want to have a look at the String method of the histogram.Histogram type. For the 2nd example above, the string representation is {count:12, sum:123.4, [-0.001,0.001]:2, (0.5,1]:2, (1,2]:3, (2,4]:1, (4,8]:4}. This looks very benign, but it's also a very simplistic example with only a few buckets and very simple bucket boundaries. For reference, I paste a more typical histogram below. In addition to the verbosity, Prometheus has to "guess" a schema from the representation, sort all the buckets into it, generate the span descriptions, and calculate the deltas between buckets. In total, that is quite a decoding effort.

Here is the string representation of a "normal" Native Histogram: