perf: significantly improve the memory usage of histogram

shappir commented 9 months ago

Significantly reduce the amount of memory used by histograms, especially with high cardinality or many buckets:

- Create the bucketValues object with a prototype of zero values instead of copying them. This way a counter is only allocated for a bucket when its value becomes greater than zero (the first time it's incremented).
- Allocate an empty bucketExemplars object (when using exemplars) instead of pre-filling it with nulls.
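
The prototype approach described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `zeroBuckets`, `createBucketValues` and this `observe` function are hypothetical names, and values above the largest bound (the +Inf bucket) are ignored for brevity.

```javascript
// Hypothetical, simplified sketch; not prom-client's actual internals.
const upperBounds = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];

// Built once per histogram: every bucket maps to 0.
const zeroBuckets = {};
for (const bound of upperBounds) {
  zeroBuckets[bound] = 0;
}

// Per label set: an empty object that inherits the zeros
// instead of holding its own copy of every bucket.
function createBucketValues() {
  return Object.create(zeroBuckets);
}

// Increment the first bucket whose upper bound is >= value.
function observe(bucketValues, value) {
  for (const bound of upperBounds) {
    if (value <= bound) {
      // Reads the inherited 0, then writes an own property on bucketValues.
      bucketValues[bound] += 1;
      return;
    }
  }
}

const buckets = createBucketValues();
observe(buckets, 0.3);
observe(buckets, 0.3);

console.log(Object.keys(buckets)); // [ '0.5' ]
console.log(buckets[0.5], buckets[10]); // 2 0
```

Only the bucket that was actually hit becomes an own property. Every other bucket still reads as 0 through the prototype chain, so a label set that touches few buckets stores few counters.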

Additional optimizations:

- Insert valueFromMap into the hash only when it's allocated (don't reinsert it every time)
- Don't look up bucketExemplars a second time; reuse the previous lookup instead
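
A minimal sketch of these two optimizations, under the assumption that per-label-set state lives in a plain object keyed by a label hash. All names here (`hashMap`, `getOrCreateEntry`, `insertCount`, the shape of the entry and of the exemplar) are hypothetical and only illustrate the pattern:

```javascript
// Hypothetical, simplified stand-in; not prom-client's actual code.
const enableExemplars = true;
const hashMap = {};
let insertCount = 0; // only here to make the behaviour observable

function getOrCreateEntry(hash) {
  let entry = hashMap[hash]; // single lookup
  if (entry === undefined) {
    entry = {
      sum: 0,
      count: 0,
      // Empty object rather than one pre-filled with nulls per bucket.
      bucketExemplars: enableExemplars ? {} : undefined,
    };
    hashMap[hash] = entry; // inserted once, when the entry is allocated
    insertCount += 1;
  }
  return entry;
}

function observe(hash, value, exemplar) {
  const entry = getOrCreateEntry(hash);
  entry.sum += value;
  entry.count += 1;
  if (enableExemplars && exemplar !== undefined) {
    // Reuse `entry` instead of looking hashMap[hash] up a second time.
    entry.bucketExemplars[exemplar.bound] = exemplar;
  }
}

observe('method:GET', 0.2, { bound: 0.25, traceId: 'abc' });
observe('method:GET', 0.4);

console.log(insertCount); // 1
console.log(hashMap['method:GET'].count); // 2
```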

Notes:

- Removed Object.freeze(this.bucketValues) because it causes += 1 to fail, even though the value in the prototype isn't actually changed. (This feels like a JS or V8 bug.)
- Because the initial counter values live in a prototype, hasOwnProperty can't be used to check whether a bucket exists.
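
Both notes can be reproduced in isolation. The snippet below is a standalone illustration with hypothetical names, not code from this PR. It shows that a frozen prototype rejects `+= 1` on an object that inherits from it (a TypeError in strict mode, a silent no-op in sloppy mode), and that `hasOwnProperty` stops being a reliable existence check. Using the `in` operator instead is one possible alternative, shown here as an assumption rather than as what the PR does.

```javascript
// 1. Incrementing through a frozen prototype.
function incrementOnFrozenPrototype() {
  'use strict';
  const frozenZeros = Object.freeze({ 1: 0, 5: 0 });
  const values = Object.create(frozenZeros);
  try {
    // The inherited property is read-only, so the write is rejected
    // even though it would only create an own property on `values`.
    values[1] += 1;
    return 'ok';
  } catch (err) {
    return err.constructor.name;
  }
}

console.log(incrementOnFrozenPrototype()); // TypeError

// 2. Existence checks when the zeros live in an unfrozen prototype.
const zeros = { 1: 0, 5: 0 };
const counters = Object.create(zeros);
counters[1] += 1;

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

console.log(hasOwn(counters, 1)); // true: incremented, so now an own property
console.log(hasOwn(counters, 5)); // false, although bucket 5 is defined
console.log(5 in counters); // true: `in` also walks the prototype chain
```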

shappir commented 9 months ago

This is the reason Object.freeze needs to be removed: https://github.com/tc39/how-we-work/blob/main/terminology.md#override-mistake

shappir commented 9 months ago

> Do you have any numbers or graphs to confirm this helps things?

No systematic results. I will try to get some.

shappir commented 9 months ago

My tests show a memory saving of only 5% 😢 Guess I shouldn't have called it significant ... It's borderline worth it - your call. (I will add the test example into the repo if you want.)

zbjornson commented 9 months ago

How many buckets total, and how many with values, did you test and get 5% savings?

shappir commented 9 months ago

@zbjornson @SimenB I tested one histogram with the default buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

Tests:

- 10 labels with random distribution, inserting 1000 random values
- 10 labels with random distribution, inserting 10000 random values
- 5 labels with random distribution, inserting 100000 random values

I got roughly the same results in all cases, peaking at a saving of 5%.

Generally speaking (in percentages):

- The lower the cardinality, the less benefit you'll get
- There's a fixed overhead for the histogram itself, so the fewer values you have, the less benefit you'll get

siimon / prom-client

perf: significantly improve the memory usage of histogram #610