Open lstrojny opened 1 year ago
I expect this to be due to the use of HashMap
in Family
. It could be an option to use a different, ordered datastructure.
Related discussion https://github.com/prometheus/client_golang/issues/483.
I would expect the metrics to be somehow sorted deterministically.
While I understand the value of sorted output for the sake of debugging, I am not sure it is worth than the (expected) performance hit.
That makes sense. My use case here is unit testing where I need to assert on a stable output, so I could even make a potential sorting flag dependent on cfg(test). How about adding optional sorting?
How about adding optional sorting?
Have to give this more thought, though not opposed to it. Might as well always do it in debug mode.
Again, might be worth exploring replacing the HashMap
and thus getting sorting for free.
The text encoding of metrics is defined by the spec, and the spec doesn't define any sort order, so callers can't assume any sort order.
I was looking into this for the same reasons as @lstrojny, and found this thread on the rust-lang
board:
https://users.rust-lang.org/t/hashmap-vs-btreemap/13804
checked just now and BTreeMap is significantly faster than HashMap for many of the benchmarks in our
json-benchmark
. (In the table lower is better.)
Benchmark BTreeMap HashMap parse canada.json 10.9ms 11.9ms stringify canada.json 11.3ms 11.6ms parse citm_catalog.json 5.5ms 11.2ms stringify citm_catalog.json 1.4ms 3.3ms parse twitter.json 2.5ms 2.9ms stringify twitter.json 0.6ms 0.9ms
It seems like in addition to getting sorting for free, we may be likely to actually see a performance improvement for many cases, as the serde-json
maintainer notes there.
As @lstrojny notes above, stable output is often very helpful for ensuring deterministic results in unit tests. If the existing hash map behavior needed to be preserved, a similar approach that internally abstracts over the two kinds of maps could offer the benefits of both.
The issue linked above https://github.com/prometheus/client_golang/issues/483 also seems to indicate that the Go client also sorts metrics, with the open question there being how to optionally disable sorting for families with an especially large exposition.
I'm sympathetic to @08d2's point above, but I see only upside, without any real downside. If we stand to (a) improve performance, (b) improve testing ergonomics, and (c) mirror the behavior of other languages' client libraries by moving to a BTreeMap
, does #128 seem like a reasonable change to you all?
The output or
encode()
is non-deterministic when creating multiple metrics of the same type.So basically:
This will sometimes return this:
And sometimes this:
I would expect the metrics to be somehow sorted deterministically.