Proposal for an experiment to include native histograms in OpenMetrics

beorn7 commented 2 years ago

Prometheus's new Native Histograms AKA Sparse Histograms somehow need to be represented in OM 2.x.

I have described the various trade-offs here. As an outsider, it's hard for me to sketch out a concrete design for dealing with all of them, but I would like to propose an experiment: let's create a makeshift way of including a Native Histogram representation in OpenMetrics that is very easy to generate and parse but ignores, for now, efficiency concerns, OM design philosophies, etc. It will, however, allow instrumentation libraries to expose Native Histograms in an experimental way and Prometheus to ingest them. We can then study exposition and ingestion in practice, iterate on it, and get a better idea of the trade-offs for the actual specification of Native Histograms in OM 2.x.

This experiment should be hidden behind a feature flag or even in a separate branch, depending on the release philosophy of the affected repository.

Here's the idea:

Add JSON tags to the histogram.Histogram type in prometheus/prometheus.
Instead of the floating point number used for a regular sample, use a one-line JSON snippet for a Native Histogram sample that unmarshals into the histogram.Histogram type as above.

Example for the exposition of a (fairly complex) pure Native Histogram including a timestamp:

# TYPE foo histogram
foo {"schema":0,"zero_threshold":0.001,"zero_count":4,"count":24,"sum":100,"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"negative_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"positive_buckets":[2,1,-2,3],"negative_buckets":[2,1,-2,3]} 1520430042.123
foo_created 1520430000.123
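
To illustrate how little parsing such a line needs, here is a minimal Python sketch (not the proposed implementation). The helper name parse_sample_line is hypothetical, and the sketch assumes a metric without labels, so the name ends at the first space. It relies on json.JSONDecoder.raw_decode, which parses one JSON value and reports where it ended, leaving the optional timestamp as the remainder.

```python
import json

def parse_sample_line(line: str):
    """Split 'name <json> [timestamp]' into its three parts.

    Assumption: the metric has no label set, so the name ends at the
    first space. A real parser would have to handle labels as well.
    """
    name, _, rest = line.partition(" ")
    rest = rest.lstrip()
    # raw_decode parses a single JSON value and returns the index where it
    # ended, so the trailing timestamp does not have to be located by hand.
    hist, end = json.JSONDecoder().raw_decode(rest)
    tail = rest[end:].strip()
    timestamp = float(tail) if tail else None
    return name, hist, timestamp

line = (
    'foo {"schema":0,"zero_threshold":0.001,"zero_count":4,"count":24,"sum":100,'
    '"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],'
    '"negative_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],'
    '"positive_buckets":[2,1,-2,3],"negative_buckets":[2,1,-2,3]} 1520430042.123'
)

name, hist, ts = parse_sample_line(line)
print(name, hist["count"], ts)
```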

Note that there is no name collision with any of the conventional Histogram fields. Therefore, a conventional and a Native Histogram representation can be exposed side by side:

# TYPE foo histogram
foo {"schema":0,"zero_threshold":0.001,"zero_count":2,"count":12,"sum":123.4,"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"positive_buckets":[2,1,-2,3]}
foo_bucket{le="0.0001"} 2
foo_bucket{le="1.0"} 2
foo_bucket{le="2.0"} 5
foo_bucket{le="4.0"} 6
foo_bucket{le="8.0"} 10
foo_bucket{le="+Inf"} 12
foo_count 12
foo_sum 123.4
foo_created 1520430000.123

Some thoughts on this:

The JSON snippet makes it easy to quickly create generators and parsers.
To test ideal text parsing performance, a hand-coded highly-optimized parser can still be written.
The layout corresponds to what the Prometheus server has to create when ingesting a Native Histogram into TSDB. Therefore, this experiment will illustrate the "best case" (from the server's perspective) of doing minimal work for decoding.
The format is also fairly compact: it avoids spelling out bucket boundaries explicitly (which are, in the general case, very long floating point numbers, e.g. 0.0008955117609420616) and puts all buckets on one line without repeating labels for each bucket.
With a bit of squinting (mostly removing the double quotes), the format is close to what a "machine-friendly" OM format could actually look like. It is, of course, not very "human-friendly", but a human-readable format would be both much more verbose and much more expensive to decode. See the trade-offs mentioned earlier.
While this implicitly also covers the case of a Gauge Histogram, it does not work for a Float Histogram (which looks significantly different Prometheus-internally). I think that's fine for this experiment. Exposing Float Histograms is a very rare use case (currently not even covered by OM for conventional Histograms).
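
As an illustration of why the boundaries need not be spelled out, the following Python sketch expands spans and delta-encoded buckets into absolute counts with computed boundaries. It is a sketch under assumptions, not the server's code: the function name expand_buckets is hypothetical, and it assumes that the first span's offset is the starting bucket index, that later offsets are the gap after the previous span, and that a bucket with index i has the upper bound base**i with base = 2**(2**-schema). Only the positive side is handled.

```python
def expand_buckets(spans, deltas, schema=0):
    """Return [(lower, upper, count)] for the positive buckets.

    Assumptions (to be checked against the real implementation):
    - the first span's offset is the starting bucket index,
    - each later offset is the number of empty buckets skipped,
    - bucket i covers (base**(i-1), base**i] with base = 2**(2**-schema).
    """
    base = 2.0 ** (2.0 ** -schema)
    result = []
    index = 0
    count = 0
    delta_iter = iter(deltas)
    for span in spans:
        index += span["offset"]
        for _ in range(span["length"]):
            # Each entry is a delta relative to the previous bucket's count.
            count += next(delta_iter)
            result.append((base ** (index - 1), base ** index, count))
            index += 1
    return result

spans = [{"offset": 0, "length": 2}, {"offset": 1, "length": 2}]
deltas = [2, 1, -2, 3]
for lower, upper, count in expand_buckets(spans, deltas):
    print(f"({lower},{upper}]:{count}")
```

Under these assumptions the second span starts one bucket after the gap, i.e. at (4,8]. Hand-written representations of the same data are easy to get subtly wrong, so treat the offsets as something to verify rather than take from this sketch.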

WRT a human-readable representation, you might want to have a look at the String method of the histogram.Histogram type. For the 2nd example above, the string representation is {count:12, sum:123.4, [-0.001,0.001]:2, (0.5,1]:2, (1,2]:3, (2,4]:1, (4,8]:4}. This looks very benign, but it's also a very simplistic example with only a few buckets and very simple bucket boundaries. For reference, I paste a more typical histogram below. In addition to the verbosity, Prometheus has to "guess" a schema from the representation, sort all the buckets into it, generate the span descriptions, and calculate the deltas between buckets. In total, that is quite a decoding effort.
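
To give a feel for the last two of those steps, here is a hedged Python sketch that turns absolute counts keyed by bucket index into span descriptions and delta-encoded buckets. The function name to_spans_and_deltas is hypothetical; the sketch assumes the schema has already been determined and the buckets already mapped to integer indices, which is the "guessing" and "sorting" part it does not cover. It also starts a new span at every gap, whereas a real encoder might prefer to bridge short gaps with explicit zero buckets.

```python
def to_spans_and_deltas(counts_by_index):
    """counts_by_index: {bucket_index: absolute_count} for one sign.

    Returns (spans, deltas) where the first span's offset is the starting
    index, later offsets are gaps, and deltas are differences between
    consecutive populated buckets (assumed encoding, see lead-in).
    """
    spans, deltas = [], []
    prev_index = None
    prev_count = 0
    for index in sorted(counts_by_index):
        count = counts_by_index[index]
        if prev_index is None:
            # First populated bucket: its index becomes the first offset.
            spans.append({"offset": index, "length": 1})
        elif index == prev_index + 1:
            # Directly adjacent bucket: extend the current span.
            spans[-1]["length"] += 1
        else:
            # Gap: start a new span whose offset is the number of skipped buckets.
            spans.append({"offset": index - prev_index - 1, "length": 1})
        deltas.append(count - prev_count)
        prev_index, prev_count = index, count
    return spans, deltas

spans, deltas = to_spans_and_deltas({0: 2, 1: 3, 3: 1, 4: 4})
print(spans, deltas)
```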

Here is the string representation of a "normal" Native Histogram:

{ count:252719 sum:2.588417777086236 [-0.0011613350732448448,-0.0010649489576809157):1 [-0.0008211879055212055,-0.0007530326295937211):9 [-0.0007530326295937211,-0.0006905339660024878):50 [-0.0006905339660024878,-0.0006332224387944383):121 [-0.0006332224387944383,-0.0005806675366224224):213 [-0.0005806675366224224,-0.0005324744788404579):392 [-0.0005324744788404579,-0.00048828125):727 [-0.00048828125,-0.0004477558804710308):1171 [-0.0004477558804710308,-0.00041059395276060273):1672 [-0.00041059395276060273,-0.00037651631479686053):2193 [-0.00037651631479686053,-0.0003452669830012439):2903 [-0.0003452669830012439,-0.00031661121939721915):3469 [-0.00031661121939721915,-0.0002903337683112112):3785 [-0.0002903337683112112,-0.00026623723942022893):4245 [-0.00026623723942022893,-0.000244140625):4615 [-0.000244140625,-0.0002238779402355154):4920 [-0.0002238779402355154,-0.00020529697638030136):4996 [-0.00020529697638030136,-0.00018825815739843027):5091 [-0.00018825815739843027,-0.00017263349150062194):5122 [-0.00017263349150062194,-0.00015830560969860958):4900 [-0.00015830560969860958,-0.0001451668841556056):4876 [-0.0001451668841556056,-0.00013311861971011446):4707 [-0.00013311861971011446,-0.0001220703125):4435 [-0.0001220703125,-0.0001119389701177577):4087 [-0.0001119389701177577,-0.00010264848819015068):3994 [-0.00010264848819015068,-0.00009412907869921513):3730 [-0.00009412907869921513,-0.00008631674575031097):3425 [-0.00008631674575031097,-0.00007915280484930479):3157 [-0.00007915280484930479,-0.0000725834420778028):2944 [-0.0000725834420778028,-0.00006655930985505723):2782 [-0.00006655930985505723,-0.00006103515625):2536 [-0.00006103515625,-0.00005596948505887885):2440 [-0.00005596948505887885,-0.00005132424409507534):2224 [-0.00005132424409507534,-0.00004706453934960757):2042 [-0.00004706453934960757,-0.000043158372875155485):1948 [-0.000043158372875155485,-0.000039576402424652394):1759 [-0.000039576402424652394,-0.0000362917210389014):1628 
[-0.0000362917210389014,-0.000033279654927528616):1439 [-0.000033279654927528616,-0.000030517578125):1282 [-0.000030517578125,-0.000027984742529439426):1184 [-0.000027984742529439426,-0.00002566212204753767):1185 [-0.00002566212204753767,-0.000023532269674803783):1064 [-0.000023532269674803783,-0.000021579186437577742):966 [-0.000021579186437577742,-0.000019788201212326197):889 [-0.000019788201212326197,-0.0000181458605194507):788 [-0.0000181458605194507,-0.000016639827463764308):732 [-0.000016639827463764308,-0.0000152587890625):638 [-0.0000152587890625,-0.000013992371264719713):602 [-0.000013992371264719713,-0.000012831061023768835):587 [-0.000012831061023768835,-0.000011766134837401892):534 [-0.000011766134837401892,-0.000010789593218788871):494 [-0.000010789593218788871,-0.000009894100606163098):420 [-0.000009894100606163098,-0.00000907293025972535):398 [-0.00000907293025972535,-0.000008319913731882154):359 [-0.000008319913731882154,-0.00000762939453125):332 [-0.00000762939453125,-0.0000069961856323598564):338 [-0.0000069961856323598564,-0.000006415530511884418):280 [-0.000006415530511884418,-0.000005883067418700946):255 [-0.000005883067418700946,-0.000005394796609394436):232 [-0.000005394796609394436,-0.000004947050303081549):228 [-0.000004947050303081549,-0.000004536465129862675):208 [-0.000004536465129862675,-0.000004159956865941077):179 [-0.000004159956865941077,-0.000003814697265625):166 [-0.000003814697265625,-0.0000034980928161799282):177 [-0.0000034980928161799282,-0.000003207765255942209):138 [-0.000003207765255942209,-0.000002941533709350473):149 [-0.000002941533709350473,-0.000002697398304697218):99 [-0.000002697398304697218,-0.0000024735251515407746):101 [-0.0000024735251515407746,-0.0000022682325649313374):96 [-0.0000022682325649313374,-0.0000020799784329705385):101 [-0.0000020799784329705385,-0.0000019073486328125):94 [-0.0000019073486328125,-0.0000017490464080899641):74 [-0.0000017490464080899641,-0.0000016038826279711044):79 
[-0.0000016038826279711044,-0.0000014707668546752365):62 [-0.0000014707668546752365,-0.000001348699152348609):66 [-0.000001348699152348609,-0.0000012367625757703873):73 [-0.0000012367625757703873,-0.0000011341162824656687):53 [-0.0000011341162824656687,-0.0000010399892164852693):50 [-0.0000010399892164852693,-9.5367431640625e-07):37 [-9.5367431640625e-07,-8.745232040449821e-07):47 [-8.745232040449821e-07,-8.019413139855522e-07):32 [-8.019413139855522e-07,-7.353834273376182e-07):30 [-7.353834273376182e-07,-6.743495761743044e-07):28 [-6.743495761743044e-07,-6.183812878851937e-07):25 [-6.183812878851937e-07,-5.670581412328344e-07):25 [-5.670581412328344e-07,-5.199946082426346e-07):29 [-5.199946082426346e-07,-4.76837158203125e-07):18 [-4.76837158203125e-07,-4.3726160202249103e-07):15 [-4.3726160202249103e-07,-4.009706569927761e-07):17 [-4.009706569927761e-07,-3.676917136688091e-07):20 [-3.676917136688091e-07,-3.371747880871522e-07):15 [-3.371747880871522e-07,-3.0919064394259683e-07):17 [-3.0919064394259683e-07,-2.835290706164172e-07):15 [-2.835290706164172e-07,-2.599973041213173e-07):9 [-2.599973041213173e-07,-2.384185791015625e-07):12 [-2.384185791015625e-07,-2.1863080101124551e-07):15 [-2.1863080101124551e-07,-2.0048532849638805e-07):9 [-2.0048532849638805e-07,-1.8384585683440456e-07):4 [-1.8384585683440456e-07,-1.685873940435761e-07):8 [-1.685873940435761e-07,-1.5459532197129841e-07):8 [-1.5459532197129841e-07,-1.417645353082086e-07):2 [-1.417645353082086e-07,-1.2999865206065866e-07):4 [-1.1920928955078125e-07,-1.0931540050562276e-07):2 [-1.0024266424819403e-07,-9.192292841720228e-08):3 [-9.192292841720228e-08,-8.429369702178806e-08):3 [-8.429369702178806e-08,-7.729766098564921e-08):2 [-7.08822676541043e-08,-6.499932603032933e-08):4 [-6.499932603032933e-08,-5.960464477539063e-08):1 [-5.960464477539063e-08,-5.465770025281138e-08):3 [-5.465770025281138e-08,-5.012133212409701e-08):4 [-5.012133212409701e-08,-4.596146420860114e-08):1 
[-4.596146420860114e-08,-4.214684851089403e-08):2 [-4.214684851089403e-08,-3.8648830492824603e-08):1 [-3.8648830492824603e-08,-3.544113382705215e-08):1 [-3.544113382705215e-08,-3.2499663015164664e-08):1 [-3.2499663015164664e-08,-2.9802322387695312e-08):2 [-2.732885012640569e-08,-2.5060666062048506e-08):2 [-1.9324415246412302e-08,-1.7720566913526073e-08):2 [-1.6249831507582332e-08,-1.4901161193847656e-08):1 [-1.4901161193847656e-08,-1.3664425063202845e-08):1 [-8.860283456763037e-09,-8.124915753791166e-09):1 [-7.450580596923828e-09,-6.832212531601422e-09):1 [-6.265166515512127e-09,-5.7451830260751424e-09):1 [-5.7451830260751424e-09,-5.2683560638617535e-09):1 [-5.2683560638617535e-09,-4.8311038116030754e-09):2 [-2.215070864190759e-09,-2.0312289384477915e-09):1 [-1.4362957565187856e-09,-1.3170890159654384e-09):1 (1.7080531329003556e-09,1.862645149230957e-09]:1 (2.6341780319308768e-09,2.8725915130375712e-09]:2 (5.7451830260751424e-09,6.265166515512127e-09]:1 (6.265166515512127e-09,6.832212531601422e-09]:1 (1.1490366052150285e-08,1.2530333031024253e-08]:1 (1.3664425063202845e-08,1.4901161193847656e-08]:1 (1.6249831507582332e-08,1.7720566913526073e-08]:1 (1.7720566913526073e-08,1.9324415246412302e-08]:2 (1.9324415246412302e-08,2.1073424255447014e-08]:3 (2.9802322387695312e-08,3.2499663015164664e-08]:1 (3.2499663015164664e-08,3.544113382705215e-08]:3 (3.544113382705215e-08,3.8648830492824603e-08]:4 (3.8648830492824603e-08,4.214684851089403e-08]:2 (4.596146420860114e-08,5.012133212409701e-08]:4 (5.012133212409701e-08,5.465770025281138e-08]:1 (5.465770025281138e-08,5.960464477539063e-08]:8 (5.960464477539063e-08,6.499932603032933e-08]:5 (6.499932603032933e-08,7.08822676541043e-08]:3 (7.08822676541043e-08,7.729766098564921e-08]:2 (7.729766098564921e-08,8.429369702178806e-08]:2 (8.429369702178806e-08,9.192292841720228e-08]:2 (9.192292841720228e-08,1.0024266424819403e-07]:4 (1.0024266424819403e-07,1.0931540050562276e-07]:6 (1.0931540050562276e-07,1.1920928955078125e-07]:8 
(1.1920928955078125e-07,1.2999865206065866e-07]:9 (1.2999865206065866e-07,1.417645353082086e-07]:6 (1.417645353082086e-07,1.5459532197129841e-07]:10 (1.5459532197129841e-07,1.685873940435761e-07]:10 (1.685873940435761e-07,1.8384585683440456e-07]:7 (1.8384585683440456e-07,2.0048532849638805e-07]:8 (2.0048532849638805e-07,2.1863080101124551e-07]:7 (2.1863080101124551e-07,2.384185791015625e-07]:14 (2.384185791015625e-07,2.599973041213173e-07]:9 (2.599973041213173e-07,2.835290706164172e-07]:14 (2.835290706164172e-07,3.0919064394259683e-07]:9 (3.0919064394259683e-07,3.371747880871522e-07]:10 (3.371747880871522e-07,3.676917136688091e-07]:11 (3.676917136688091e-07,4.009706569927761e-07]:21 (4.009706569927761e-07,4.3726160202249103e-07]:18 (4.3726160202249103e-07,4.76837158203125e-07]:22 (4.76837158203125e-07,5.199946082426346e-07]:24 (5.199946082426346e-07,5.670581412328344e-07]:23 (5.670581412328344e-07,6.183812878851937e-07]:20 (6.183812878851937e-07,6.743495761743044e-07]:22 (6.743495761743044e-07,7.353834273376182e-07]:26 (7.353834273376182e-07,8.019413139855522e-07]:30 (8.019413139855522e-07,8.745232040449821e-07]:34 (8.745232040449821e-07,9.5367431640625e-07]:35 (9.5367431640625e-07,0.0000010399892164852693]:35 (0.0000010399892164852693,0.0000011341162824656687]:66 (0.0000011341162824656687,0.0000012367625757703873]:51 (0.0000012367625757703873,0.000001348699152348609]:60 (0.000001348699152348609,0.0000014707668546752365]:54 (0.0000014707668546752365,0.0000016038826279711044]:74 (0.0000016038826279711044,0.0000017490464080899641]:67 (0.0000017490464080899641,0.0000019073486328125]:74 (0.0000019073486328125,0.0000020799784329705385]:74 (0.0000020799784329705385,0.0000022682325649313374]:87 (0.0000022682325649313374,0.0000024735251515407746]:110 (0.0000024735251515407746,0.000002697398304697218]:107 (0.000002697398304697218,0.000002941533709350473]:113 (0.000002941533709350473,0.000003207765255942209]:133 (0.000003207765255942209,0.0000034980928161799282]:144 
(0.0000034980928161799282,0.000003814697265625]:176 (0.000003814697265625,0.000004159956865941077]:195 (0.000004159956865941077,0.000004536465129862675]:189 (0.000004536465129862675,0.000004947050303081549]:206 (0.000004947050303081549,0.000005394796609394436]:221 (0.000005394796609394436,0.000005883067418700946]:234 (0.000005883067418700946,0.000006415530511884418]:307 (0.000006415530511884418,0.0000069961856323598564]:266 (0.0000069961856323598564,0.00000762939453125]:309 (0.00000762939453125,0.000008319913731882154]:341 (0.000008319913731882154,0.00000907293025972535]:373 (0.00000907293025972535,0.000009894100606163098]:408 (0.000009894100606163098,0.000010789593218788871]:450 (0.000010789593218788871,0.000011766134837401892]:512 (0.000011766134837401892,0.000012831061023768835]:569 (0.000012831061023768835,0.000013992371264719713]:564 (0.000013992371264719713,0.0000152587890625]:649 (0.0000152587890625,0.000016639827463764308]:692 (0.000016639827463764308,0.0000181458605194507]:729 (0.0000181458605194507,0.000019788201212326197]:847 (0.000019788201212326197,0.000021579186437577742]:890 (0.000021579186437577742,0.000023532269674803783]:942 (0.000023532269674803783,0.00002566212204753767]:1069 (0.00002566212204753767,0.000027984742529439426]:1193 (0.000027984742529439426,0.000030517578125]:1294 (0.000030517578125,0.000033279654927528616]:1334 (0.000033279654927528616,0.0000362917210389014]:1494 (0.0000362917210389014,0.000039576402424652394]:1571 (0.000039576402424652394,0.000043158372875155485]:1817 (0.000043158372875155485,0.00004706453934960757]:1969 (0.00004706453934960757,0.00005132424409507534]:2043 (0.00005132424409507534,0.00005596948505887885]:2286 (0.00005596948505887885,0.00006103515625]:2459 (0.00006103515625,0.00006655930985505723]:2670 (0.00006655930985505723,0.0000725834420778028]:2950 (0.0000725834420778028,0.00007915280484930479]:3145 (0.00007915280484930479,0.00008631674575031097]:3372 (0.00008631674575031097,0.00009412907869921513]:3538 
(0.00009412907869921513,0.00010264848819015068]:3968 (0.00010264848819015068,0.0001119389701177577]:4280 (0.0001119389701177577,0.0001220703125]:4478 (0.0001220703125,0.00013311861971011446]:4828 (0.00013311861971011446,0.0001451668841556056]:5014 (0.0001451668841556056,0.00015830560969860958]:5149 (0.00015830560969860958,0.00017263349150062194]:5350 (0.00017263349150062194,0.00018825815739843027]:5540 (0.00018825815739843027,0.00020529697638030136]:5457 (0.00020529697638030136,0.0002238779402355154]:5503 (0.0002238779402355154,0.000244140625]:5523 (0.000244140625,0.00026623723942022893]:5350 (0.00026623723942022893,0.0002903337683112112]:5075 (0.0002903337683112112,0.00031661121939721915]:4607 (0.00031661121939721915,0.0003452669830012439]:4015 (0.0003452669830012439,0.00037651631479686053]:3309 (0.00037651631479686053,0.00041059395276060273]:2770 (0.00041059395276060273,0.0004477558804710308]:1996 (0.0004477558804710308,0.00048828125]:1457 (0.00048828125,0.0005324744788404579]:989 (0.0005324744788404579,0.0005806675366224224]:577 (0.0005806675366224224,0.0006332224387944383]:298 (0.0006332224387944383,0.0006905339660024878]:132 (0.0006905339660024878,0.0007530326295937211]:53 (0.0007530326295937211,0.0008211879055212055]:20 (0.0008211879055212055,0.0008955117609420616]:7 }

brian-brazil commented 2 years ago

OM 2.x doesn't exist yet in any form, so I think any experiment should be done under a different name and content type for now to avoid any potential future confusion. We already have enough people thinking OM and Prometheus text format are the same thing.

From an OM 1.x standpoint, as long as it can always gracefully negotiate and degrade to OM 1.0, it's still compliant with OM. That is to say, it must produce at least a +Inf bucket, and any other buckets must be essentially static.

In terms of the minutiae of the format itself I do have some thoughts, though with 2.x we can be less constrained than for 1.x, considering that a 2.x implementation would still have to be able to produce a degraded 1.0. So, for example, the fact that your proposal requires parsers to notice that "foo" is associated with the TYPE just above is reasonable here, whereas it isn't for 1.0. Without the double quotes would be my main thought, and if you want a JSON parser to be able to handle it ensure that you have a plan for NaN/Inf.

beorn7 commented 2 years ago

WRT content type: Yes, sure, there should be a very specific content type just for the experiment.

> Without the double quotes would be my main thought,

To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.

> ensure that you have a plan for NaN/Inf.

Ah right. My thought here was that, for the experiment, we require instrumentation libraries to never emit NaN/Inf. That's a weird corner case anyway. We need to handle it for the real thing, of course. But for the real thing, we won't use a JSON snippet in the first place.
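
As a hedged illustration of the NaN/Inf concern: strict JSON has no representation for NaN or infinities, but some parsers accept bare NaN/Infinity tokens anyway (Python's json module does so by default). The sketch below shows how a "never emit NaN/Inf" rule could be enforced on both sides in Python. The names strict_loads and strict_dumps are hypothetical; parse_constant and allow_nan are parameters of the standard json module.

```python
import json
import math

def reject_constant(token):
    # json.loads calls this hook for the bare tokens NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard JSON constant: {token}")

def strict_loads(text):
    """Parse JSON, refusing the non-standard NaN/Infinity tokens."""
    return json.loads(text, parse_constant=reject_constant)

def strict_dumps(obj):
    """Serialize to JSON, raising ValueError instead of emitting NaN/Infinity."""
    return json.dumps(obj, allow_nan=False)

print(strict_loads('{"sum":123.4}'))
try:
    strict_dumps({"sum": math.nan})
except ValueError as err:
    print("refused:", err)
```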

brian-brazil commented 2 years ago

> To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.

The whole point is to experiment, so that sounds fine to me anyway.

beorn7 commented 2 years ago

/cc @fstab Would this match your expectation for an experiment in client_java?

beorn7 commented 2 years ago

FYI: @fstab has now added protobuf support to client_java temporarily. One of the reasons for this experiment (to add native histogram support to client_java in a simple way) is therefore not relevant anymore. This might still be useful to play with a text representation of native histograms and how it behaves during generation and parsing etc.

beorn7 commented 2 years ago

Also note #256 for a draft of Native Histogram support in the OpenMetrics protobuf format.

beorn7 commented 1 year ago

Given the reaction to a brainstorming doc, I think we should not pursue this "embedded JSON" idea any longer (but we can, of course, change our minds again). Of all the ideas discussed, "embedded JSON" (idea 1 in the doc) was the least liked, notably also by @csmarchbanks, who maintains client_python, which will probably be the first instrumentation library to implement a text format for native histograms.

Therefore, I'm retracting this proposal (for now).

prometheus / OpenMetrics

Proposal for an experiment to include native histograms in OpenMetrics #247