grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
GNU Affero General Public License v3.0
623 stars 104 forks source link

ROB: Fix underflow bug and reduce allocations #2006

Closed shanson7 closed 2 years ago

shanson7 commented 3 years ago

Underflow Bug

This was not my main goal, but was breaking the benchmarks. This bug is unlikely in production because it requires very old timestamps.

The line rob.buf[rob.newest].Ts-(uint32(cap(rob.buf))*rob.interval can cause an underflow when rob.buf[rob.newest].Ts is smaller than cap(rob.buf)*rob.interval and will wrap around to a very large number. This incorrectly returns and error (metric too old). The benchmarks all start at Ts=1 so they were never actually getting beyond this check. I added a test case for this as well.

Reduce Allocations

One of the top allocators in our Metrictank deploy is the ROB. It makes an allocation if any point is flushed during Add. This is due to how it returns the flushed values from Add in a slice. However, in the most common case data comes in order and will flush exactly 1 point. This change is to return a schema.Point along with the slice. If a single point needs to be flushed, no slice needs to be allocated. If more than one is flushed, then we still need to allocate the slice. However, I also pre-allocate the slice to the size needed, which speeds up even the out-of-order cases.

This change does make the code a little uglier, but it is quite a bit faster and reduces allocations.

name                                   old time/op    new time/op    delta
ROB10AddDisallowUpdate-12                49.8ns ± 2%    20.7ns ± 7%   -58.43%  (p=0.000 n=18+20)
ROB120AddDisallowUpdate-12               50.5ns ± 3%    21.0ns ± 7%   -58.39%  (p=0.000 n=20+20)
ROB600AddDisallowUpdate-12               50.0ns ± 2%    19.7ns ± 2%   -60.55%  (p=0.000 n=20+17)
ROB10AddShuffled5DisallowUpdate-12       46.5ns ± 1%    26.4ns ± 6%   -43.18%  (p=0.000 n=17+18)
ROB120AddShuffled5DisallowUpdate-12      46.7ns ± 1%    26.1ns ± 1%   -44.08%  (p=0.000 n=19+20)
ROB600AddShuffled5DisallowUpdate-12      46.6ns ± 1%    26.1ns ± 2%   -44.10%  (p=0.000 n=19+20)
ROB10AddShuffled50DisallowUpdate-12      12.6ns ± 2%    10.5ns ± 6%   -16.87%  (p=0.000 n=17+20)
ROB120AddShuffled50DisallowUpdate-12     31.5ns ± 3%    26.3ns ± 2%   -16.41%  (p=0.000 n=19+18)
ROB600AddShuffled50DisallowUpdate-12     31.9ns ± 3%    26.1ns ± 3%   -18.18%  (p=0.000 n=19+20)
ROB10AddShuffled500DisallowUpdate-12     6.43ns ± 9%    6.34ns ± 6%      ~     (p=0.262 n=20+20)
ROB120AddShuffled500DisallowUpdate-12    11.5ns ± 6%    10.4ns ± 3%    -9.87%  (p=0.000 n=20+19)
ROB600AddShuffled500DisallowUpdate-12    28.3ns ± 6%    25.5ns ± 1%   -10.15%  (p=0.000 n=20+17)
ROB10AddAllowUpdate-12                   50.2ns ± 2%    19.6ns ± 1%   -60.91%  (p=0.000 n=19+19)
ROB120AddAllowUpdate-12                  50.4ns ± 1%    19.8ns ± 3%   -60.73%  (p=0.000 n=20+19)
ROB600AddAllowUpdate-12                  50.4ns ± 2%    19.8ns ± 2%   -60.67%  (p=0.000 n=20+18)
ROB10AddShuffled5AllowUpdate-12          47.1ns ± 3%    26.4ns ± 3%   -44.00%  (p=0.000 n=19+18)
ROB120AddShuffled5AllowUpdate-12         46.7ns ± 3%    26.2ns ± 1%   -44.00%  (p=0.000 n=20+19)
ROB600AddShuffled5AllowUpdate-12         47.1ns ± 3%    26.9ns ± 6%   -43.02%  (p=0.000 n=18+20)
ROB10AddShuffled50AllowUpdate-12         13.1ns ± 5%    10.9ns ±10%   -17.14%  (p=0.000 n=20+20)
ROB120AddShuffled50AllowUpdate-12        33.6ns ±10%    26.5ns ± 3%   -21.16%  (p=0.000 n=17+19)
ROB600AddShuffled50AllowUpdate-12        32.4ns ± 6%    26.4ns ± 3%   -18.67%  (p=0.000 n=20+20)
ROB10AddShuffled500AllowUpdate-12        6.25ns ± 3%    6.18ns ± 5%      ~     (p=0.072 n=17+20)
ROB120AddShuffled500AllowUpdate-12       11.2ns ± 4%    10.5ns ± 2%    -6.97%  (p=0.000 n=20+18)
ROB600AddShuffled500AllowUpdate-12       28.0ns ± 6%    25.7ns ± 3%    -8.41%  (p=0.000 n=19+20)

name                                   old alloc/op   new alloc/op   delta
ROB10AddDisallowUpdate-12                 15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB120AddDisallowUpdate-12                15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB600AddDisallowUpdate-12                15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB10AddShuffled5DisallowUpdate-12        47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB120AddShuffled5DisallowUpdate-12       47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB600AddShuffled5DisallowUpdate-12       47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB10AddShuffled50DisallowUpdate-12       9.00B ± 0%     2.00B ± 0%   -77.78%  (p=0.000 n=20+20)
ROB120AddShuffled50DisallowUpdate-12      40.0B ± 0%     17.0B ± 0%   -57.50%  (p=0.000 n=20+20)
ROB600AddShuffled50DisallowUpdate-12      40.0B ± 0%     17.0B ± 0%   -57.50%  (p=0.000 n=20+20)
ROB10AddShuffled500DisallowUpdate-12      0.00B          0.00B           ~     (all equal)
ROB120AddShuffled500DisallowUpdate-12     8.00B ± 0%     4.00B ± 0%   -50.00%  (p=0.000 n=20+20)
ROB600AddShuffled500DisallowUpdate-12     32.0B ± 0%     16.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ROB10AddAllowUpdate-12                    15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB120AddAllowUpdate-12                   15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB600AddAllowUpdate-12                   15.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
ROB10AddShuffled5AllowUpdate-12           47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB120AddShuffled5AllowUpdate-12          47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB600AddShuffled5AllowUpdate-12          47.0B ± 0%     12.0B ± 0%   -74.47%  (p=0.000 n=20+20)
ROB10AddShuffled50AllowUpdate-12          9.00B ± 0%     2.00B ± 0%   -77.78%  (p=0.000 n=20+20)
ROB120AddShuffled50AllowUpdate-12         40.0B ± 0%     17.0B ± 0%   -57.50%  (p=0.000 n=20+20)
ROB600AddShuffled50AllowUpdate-12         40.0B ± 0%     17.0B ± 0%   -57.50%  (p=0.000 n=20+20)
ROB10AddShuffled500AllowUpdate-12         0.00B          0.00B           ~     (all equal)
ROB120AddShuffled500AllowUpdate-12        8.00B ± 0%     4.00B ± 0%   -50.00%  (p=0.000 n=20+20)
ROB600AddShuffled500AllowUpdate-12        32.0B ± 0%     16.0B ± 0%   -50.00%  (p=0.000 n=20+20)
shanson7 commented 2 years ago

We could also (in addition to your solution, or as an alternative to your solution) allocate the slice of points at ROB creation time. and reuse that same slice every time we need it to return the slice out of Add().

I suppose the issue with that is we're holding on to a bunch of memory we only use occasionally. That could in turn be solved by each ROB sharing a pool of slices, but at that point, I prefer your approach.

Yeah, that was my first step but the extra memory was too much (basically doubling the ROB size). It's also a slightly awkward interface from a multi-threaded context, though that isn't a big deal for this code.