mikejihbe / metrics

A metrics library for Node.js

Optimize percentile calculation #67

Closed · shuhei closed this 5 years ago

shuhei commented 5 years ago

Happy Holidays!

The deep-clone of the binary heap seems to be the bottleneck of Histogram#percentiles().

I think we can just use the heap items without cloning or sorting by priority because they will be sorted by value in Histogram anyway.
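For reference, the change boils down to something like this (a sketch of the direction; it mirrors the monkey patch applied in the benchmark below rather than the exact diff):

// Sketch: skip the heap clone and the priority-ordered extraction, and read
// the heap's backing array directly (shown here as a monkey patch).
const metrics = require("metrics");
const BinaryHeap = require("metrics/lib/binary_heap");

BinaryHeap.prototype.getValues = function () {
  return this.content; // backing array of { val, priority } entries
};

metrics.ExponentiallyDecayingSample.prototype.getValues = function () {
  // Order doesn't matter here: Histogram#percentiles sorts by value anyway.
  return this.values.getValues().map(function (entry) {
    return entry.val;
  });
};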

Benchmark

The updated version is roughly 5-6x faster on my MacBook Air (Early 2015, 2.2 GHz Intel Core i7).

$ node -v
v8.14.1
$ node percentiles.js
original x 509 ops/sec ±0.76% (89 runs sampled)
updated x 2,883 ops/sec ±0.89% (92 runs sampled)
done
$ node -v
v10.14.2
$ node percentiles.js
original x 565 ops/sec ±1.94% (87 runs sampled)
updated x 3,548 ops/sec ±5.29% (82 runs sampled)
done
$ node -v
v11.5.0
$ node percentiles.js
original x 522 ops/sec ±1.78% (87 runs sampled)
updated x 2,470 ops/sec ±1.38% (90 runs sampled)
done

with the following benchmark code:

// percentiles.js
const assert = require("assert");
const metrics = require("metrics");
const BinaryHeap = require("metrics/lib/binary_heap");
const Benchmark = require("benchmark");

const suite = new Benchmark.Suite();

// Populate a histogram with 10,000 random samples.
const histogram = new metrics.Histogram();
for (let i = 0; i < 10000; i++) {
  histogram.update(20 + Math.random() * Math.random() * 10000);
}

let a;
let b;
suite
  .add("original", () => {
    a = histogram.percentiles();
  })
  .add("updated", () => {
    b = histogram.percentiles();
  })
  .on("cycle", function(event) {
    console.log(String(event.target));
    // After the "original" benchmark finishes its cycle, monkey-patch
    // getValues so that the "updated" benchmark measures the optimized
    // path: return the heap's backing array directly instead of cloning
    // the heap and popping items in priority order.
    BinaryHeap.prototype.getValues = function() {
      return this.content;
    };
    metrics.ExponentiallyDecayingSample.prototype.getValues = function () {
      return this.values.getValues().map(function(v) { return v.val; });
    };
  })
  .on("complete", function() {
    // Sanity check: both versions should produce the same percentiles.
    assert.deepEqual(a, b);
    console.log("done");
  })
  .run();
shuhei commented 5 years ago

Oh, we can also remove another step (extracting values from the heap in the order of priority). The values are re-sorted by value anyway.
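For context, the percentile computation sorts the values numerically before indexing into them, roughly like this (a simplified sketch of the idea, not a verbatim copy of the library code):

// Simplified sketch: sample values are converted to floats and sorted, so
// whatever order getValues() returns them in is irrelevant.
function percentiles(values, ps) {
  var sorted = values
    .map(function (v) { return parseFloat(v); })
    .sort(function (a, b) { return a - b; });
  var scores = {};
  ps.forEach(function (p) {
    var pos = p * (sorted.length + 1); // interpolated rank
    if (pos < 1) {
      scores[p] = sorted[0];
    } else if (pos >= sorted.length) {
      scores[p] = sorted[sorted.length - 1];
    } else {
      var lower = sorted[Math.floor(pos) - 1];
      var upper = sorted[Math.floor(pos)];
      scores[p] = lower + (pos - Math.floor(pos)) * (upper - lower);
    }
  });
  return scores;
}

console.log(percentiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0.5, 0.99]));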

shuhei commented 5 years ago

I pushed a commit and updated the PR description accordingly.

tolbertam commented 5 years ago

Hi @shuhei, thanks for contributing! At first glance this looks great, I'll give it a closer look tonight.

tolbertam commented 5 years ago

This looks great, and the benchmark was super helpful for showing the nice improvement from this.

I think there is still a case where there is some value in doing a clone/copy, and that's when it's used via Histogram.prototype.values. Unfortunately, the documentation doesn't indicate whether getValues() returns a snapshot or the live, evolving data, but the fact that it was doing a copy suggests to me that a snapshot was the intent.

If someone were to use that to get a 'snapshot' of the sample for future reference (i.e. "show me the values at this point in time") and we don't do a clone, future updates to the ExponentiallyDecayingSample will update those values in place, which could be problematic. I noticed that UniformSample suffers from the same issue, since its getValues just returns the underlying values rather than a copy.
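For example (hypothetical usage; assumes getValues() hands back the live backing array, as UniformSample does today):

// Hypothetical caller expecting a point-in-time snapshot:
var snapshot = histogram.values(); // live array if no copy is made
histogram.update(12345);
// `snapshot` can now reflect the newer state of the sample, defeating the
// purpose of taking a snapshot.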

I think one workaround for this would be to add an optional boolean parameter to both Sample.getValues() and Histogram.values.

We can name the parameter snapshot, and if it isn't provided, assume it is true (do a deep copy) for backwards compatibility. In Histogram.percentiles' invocation of getValues() we can simply call getValues(false), which will not return a copy, since we end up creating a new list with the values converted to floats anyway. Roughly as sketched below.
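A rough sketch of the proposal, using UniformSample for simplicity since its backing store is a plain array (the ExponentiallyDecayingSample version would copy its heap instead; export names here are assumptions, adjust as needed):

// Sketch of the proposed API: `snapshot` defaults to true so existing
// callers keep getting a copy; percentiles() can opt out.
const metrics = require("metrics"); // assuming UniformSample is exported

metrics.UniformSample.prototype.getValues = function (snapshot) {
  if (snapshot === undefined) snapshot = true; // backwards-compatible default
  return snapshot ? this.values.slice() : this.values;
};

metrics.Histogram.prototype.values = function (snapshot) {
  return this.sample.getValues(snapshot);
};

// Inside Histogram#percentiles the copy can be skipped, since a new sorted
// array of floats is built immediately from whatever getValues(false) returns.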

What do you think of that solution? I think this way we could maintain the previous behavior if someone was using Histogram.values and expecting a snapshot, and we would still observe this nice performance benefit when using Histogram.percentiles.

tolbertam commented 5 years ago

Thanks again for contributing this improvement! Released as v0.1.21

shuhei commented 5 years ago

@tolbertam Thanks a lot for your thorough review and publishing a new version!