perf: improve performance of rendering metrics to Prometheus string

ngavalas commented 1 year ago

Rendering histograms, particularly in getMetricAsPrometheusString, repeats most of the work of label rendering over and over and over and over again on the same strings. This is because we currently fully copy all labels out onto all buckets and then format them all separately as if they weren't the same labelset to begin with.

This PR takes a different approach: 1) The labels we're passed in are stored separately from the ones automatically added by histogram (specifically, the le label, but this would work for other ones too). 2) If we get asked for get, we merge them back in so the API is unchanged. No harm, no foul. 3) The fast path returns the shared labels separately so that getMetricAsPrometheusString can encode and stringify the shared set of original labels exactly once. 4) When rendering the rest of the labels to strings, we check and make sure they don't exist in the shared set first. This is because in the original code, the provided set of labels takes precedence over the "internal" le label. Deferring this check until serialization time is faster even if the code is slightly more complex. 5) We append the shared label set onto the end of the internal label set and bam, it's like nothing ever happened.[1]

Here are the benchmark results:

### Results:

┌────────────────────────────────────────┬────────────────────┬────────────────────┐
│ registry                               │ current            │ prom-client@latest │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#1 with 64             │ 3747.0556369320916 │ 2000.6075672702098 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 8              │ 3458.670353309119  │ 1674.8824925227625 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 4 and 2 with 2 │ 3660.534565081909  │ 1608.8334436915613 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#2 with 2 and 2 with 4 │ 3330.496324702882  │ 452.3735183797067  │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ getMetricsAsJSON#6 with 2              │ 3081.6146752509944 │ 1982.9467290709053 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ metrics#1 with 64                      │ 3795.732389650059  │ 3175.098432694792  │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ metrics#2 with 8                       │ 4029.138075769385  │ 2882.770072766679  │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ metrics#2 with 4 and 2 with 2          │ 4371.559530861944  │ 2486.5945077703427 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ metrics#2 with 2 and 2 with 4          │ 4090.871152384675  │ 2384.3272352940894 │
├────────────────────────────────────────┼────────────────────┼────────────────────┤
│ metrics#6 with 2                       │ 4247.01786082521   │ 2102.320963244116  │
└────────────────────────────────────────┴────────────────────┴────────────────────┘

### Summary:

✓ registry ➭ getMetricsAsJSON#1 with 64 is 87.30% faster.
✓ registry ➭ getMetricsAsJSON#2 with 8 is 106.5% faster.
✓ registry ➭ getMetricsAsJSON#2 with 4 and 2 with 2 is 127.5% faster.
✓ registry ➭ getMetricsAsJSON#2 with 2 and 2 with 4 is 636.2% faster.
✓ registry ➭ getMetricsAsJSON#6 with 2 is 55.41% faster.
✓ registry ➭ metrics#1 with 64 is 19.55% faster.
✓ registry ➭ metrics#2 with 8 is 39.77% faster.
✓ registry ➭ metrics#2 with 4 and 2 with 2 is 75.81% faster.
✓ registry ➭ metrics#2 with 2 and 2 with 4 is 71.57% faster.
✓ registry ➭ metrics#6 with 2 is 102.0% faster.

This should also help #543.

[1] Technically, the order of some labels on histograms changes now depending on where they came from. You can see this in the histogram test case, which I needed to tweak. Label order stability doesn't seem super important, but I can try to fix this if anyone feels really strongly. It'll be messy and JS objects aren't really ordered consistently across older node versions anyway if I remember correctly.

ngavalas commented 1 year ago

I see that we actually ban people from passing in le in histograms, so technically there should be no conflicts in that particular case, but it does seem safer to just keep the extra collision check in registry.js so we don't double apply labels ever.

SimenB commented 1 year ago

Thanks for sending this PR @ngavalas! Could you rebase now next has landed on master?

Really excited about the results of this PR 👍

ngavalas commented 1 year ago

Yup will rebase this morning, thanks.

ngavalas commented 1 year ago

Rebased and tests + benchmark passed, but I'm going to read the merged code carefully to be triple sure.

Two things I can think of after reading this more: 1) I am pretty sure label order doesn't strictly matter. In any case, this change only affects label order in a "breaking" way across the old version and the new version; within a single scrape it should be consistent. 2) What happens if someone passes in a labels object and then mutates it outside of the library? This won't play nicely with my memoization (__flattened_internal) but I want to avoid making an unnecessary copy if things already get weird if people do that, which I suspect is the case

SimenB commented 1 year ago

As long as the spec doesn't say anything specific, I think that's perfectly fine. And even if people do snapshot testing of the string (we do at work), this change will go out as a major release. So I don't think it's an issue (again, assuming it doesn't violate the spec).
I'm inclined to call that a bug in the user code and not something we need to worry about. If your optimization breaks, we can do Object.freeze. If any mutation is just ignored, I think that's fine.

ngavalas commented 1 year ago

It doesn't break, it just ignores any mutation after you render it to the prom string one time. This is like an edge case of an edge case and I don't feel bad about it not working right in that case. As for (1), I am almost positive it doesn't violate the spec. Will work to confirm on my end further. The only time it swaps order is if the label moves from passed in to a default label on the same metric. I don't even know if this is possible. :)

siimon / prom-client

perf: improve performance of rendering metrics to Prometheus string #549