grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
GNU Affero General Public License v3.0

re-think aggregation strategy #141

Closed. woodsaj closed this issue 8 years ago.

woodsaj commented 8 years ago

Though aggregation is not enabled in production, the current aggregation policies in dev are

600:21600:2,7200:21600:2,21600:21600:2

In the aggSettings above, we have 10-minute, 2-hour, and 6-hour aggregations all being written to Cassandra every 6 hours. Instead of storing the 2-hour and 6-hour data separately, we could very easily compute it every time we write the 10-minute aggregations to Cassandra.

This idea hasn't been completely thought through, but I think it is worth further consideration: it could potentially reduce the memory and CPU needed for aggregations considerably.

Dieterbe commented 8 years ago

So if you have a raw series and 3 aggregations, could each aggregation always look at the data from the previous aggregation series, i.e. compute its data from the band with the next-higher resolution instead of from the raw series? Is that what you're saying? I've also been thinking about that. I don't think it would save memory, because memory usage is constant per aggregation and does not depend on how much input there was. CPU usage would decrease, though.

Dieterbe commented 8 years ago

@woodsaj as I explained above, it looks to me like this would provide no memory benefit, but probably some CPU benefit. To get more insight I took a CPU profile from mt4-prod. To give you an idea of the workload: 20.7 kmetrics/s ingest, 120 req/s mem-cass (), 750 req/s mem; see https://snapshot.raintank.io/dashboard/snapshot/ayONHP7x36BWyZz6WUqT3TS5awX8hCia for details. It's running with agg-settings 10min:6h:2:38d:false,2h:6h:2:120d:false, aka 2 rollup archives.

(btw collecting profiles is pretty easy, i've included the steps below as well)

ssh -L 18764:localhost:18763 metric-tank-4.prod.raintank.io
~ ❯❯❯ curl http://localhost:18764/debug/pprof/profile > mt4-prof
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 49352    0 49352    0     0   1467      0 --:--:--  0:00:33 --:--:-- 11185
~ ❯❯❯ scp metric-tank-4.prod.raintank.io:/usr/sbin/metric_tank mt4-bin
metric_tank                                                  100%   14MB 422.1KB/s   00:33    
~ ❯❯❯ 
~ ❯❯❯ go tool pprof mt4-bin mt4-prof

(pprof) top50 -cum
3.43s of 8.77s total (39.11%)
Dropped 231 nodes (cum <= 0.04s)
Showing top 50 nodes out of 204 (cum >= 0.30s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      7.04s 80.27%  runtime.goexit
         0     0%     0%      3.90s 44.47%  github.com/nsqio/go-nsq.(*Consumer).handlerLoop
     0.04s  0.46%  0.46%      3.29s 37.51%  main.(*Handler).HandleMessage
     0.44s  5.02%  5.47%      1.81s 20.64%  main.(*AggMetric).Add
     0.06s  0.68%  6.16%      1.01s 11.52%  runtime.systemstack
     0.03s  0.34%  6.50%      0.98s 11.17%  main.initMetrics.func1
         0     0%  6.50%      0.89s 10.15%  github.com/raintank/raintank-metric/msg.(*MetricData).DecodeMetricData
         0     0%  6.50%      0.89s 10.15%  github.com/raintank/raintank-metric/schema.(*MetricDataArray).UnmarshalMsg
         0     0%  6.50%      0.87s  9.92%  main.getTargets.func1
         0     0%  6.50%      0.84s  9.58%  main.getTarget
     0.37s  4.22% 10.72%      0.81s  9.24%  runtime.mallocgc
     0.01s  0.11% 10.83%      0.79s  9.01%  main.getSeries
     0.01s  0.11% 10.95%      0.79s  9.01%  runtime.mcall
     0.19s  2.17% 13.11%      0.77s  8.78%  github.com/raintank/raintank-metric/schema.(*MetricData).UnmarshalMsg
         0     0% 13.11%      0.77s  8.78%  runtime.park_m
     0.02s  0.23% 13.34%      0.77s  8.78%  runtime.schedule
     0.04s  0.46% 13.80%      0.63s  7.18%  runtime.findrunnable
     0.05s  0.57% 14.37%      0.62s  7.07%  main.(*AggMetric).addAggregators
         0     0% 14.37%      0.61s  6.96%  main.(*MetricPersistHandler).HandleMessage
     0.11s  1.25% 15.62%      0.57s  6.50%  main.(*Aggregator).Add
     0.01s  0.11% 15.74%      0.56s  6.39%  main.(*Chunk).Push
     0.46s  5.25% 20.98%      0.54s  6.16%  syscall.Syscall
         0     0% 20.98%      0.49s  5.59%  runtime.selectgo
     0.14s  1.60% 22.58%      0.49s  5.59%  runtime.selectgoImpl
     0.09s  1.03% 23.60%      0.47s  5.36%  github.com/dgryski/go-tsz.(*Series).Push
         0     0% 23.60%      0.44s  5.02%  runtime.ReadMemStats
         0     0% 23.60%      0.44s  5.02%  runtime.ReadMemStats.func1
         0     0% 23.60%      0.44s  5.02%  runtime.readmemstats_m
     0.44s  5.02% 28.62%      0.44s  5.02%  runtime.updatememstats
         0     0% 28.62%      0.42s  4.79%  github.com/nsqio/go-nsq.(*Conn).readLoop
         0     0% 28.62%      0.39s  4.45%  main.(*Aggregator).flush
     0.30s  3.42% 32.04%      0.39s  4.45%  runtime.mapaccess2_faststr
         0     0% 32.04%      0.36s  4.10%  encoding/json.Unmarshal
     0.36s  4.10% 36.15%      0.36s  4.10%  runtime.futex
         0     0% 36.15%      0.35s  3.99%  runtime._System
         0     0% 36.15%      0.34s  3.88%  runtime.morestack
     0.01s  0.11% 36.26%      0.34s  3.88%  runtime.newstack
     0.06s  0.68% 36.94%      0.33s  3.76%  github.com/tinylib/msgp/msgp.ReadStringBytes
         0     0% 36.94%      0.33s  3.76%  net.(*conn).Write
         0     0% 36.94%      0.33s  3.76%  net.(*netFD).Write
     0.02s  0.23% 37.17%      0.33s  3.76%  runtime.makeslice
     0.01s  0.11% 37.29%      0.33s  3.76%  runtime.newarray
     0.01s  0.11% 37.40%      0.32s  3.65%  net/http.(*conn).serve
         0     0% 37.40%      0.32s  3.65%  runtime.copystack
     0.02s  0.23% 37.63%      0.32s  3.65%  runtime.newobject
         0     0% 37.63%      0.31s  3.53%  sync.(*Mutex).Lock
         0     0% 37.63%      0.31s  3.53%  syscall.Write
         0     0% 37.63%      0.31s  3.53%  syscall.write
     0.09s  1.03% 38.65%      0.30s  3.42%  main.(*AggMetric).Get
     0.04s  0.46% 39.11%      0.30s  3.42%  runtime.gentraceback
(pprof) 

What's important in the profile above is the main.(*AggMetric).addAggregators line, which shows that about 7% of CPU time is currently spent adding data into the aggregates.

So yes, there is some CPU to be gained, but it would require reworking how aggregation works. Currently we just repurpose the AggMetric concept, which is not a perfect match, for the reason you mentioned and also some others: we have to pregenerate the key, which comes with some Sprintf overhead; querying for aggregated data needs to go through the parent; there is some overlap in data between the structs; etc. I don't think there's a pressing need to redesign/reimplement how aggregations work right now; the 5-10% savings doesn't seem worth it at this point.

But when the time comes we can take changes like the one you're suggesting into account. Note also that right now memory is the bottleneck by far, not CPU (though this may change, especially once we start keeping less data in RAM, which I think we should do).

What do you think of closing this and revisiting it later?

woodsaj commented 8 years ago

Yep, happy to close this. It can be revisited for metric-tank 2.0.