grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.

new version blows up when opening for requests #180

Closed Dieterbe closed 8 years ago

Dieterbe commented 8 years ago

it's the first time seeing this in prod. it happened when deploying f040a9e5741ca16c84ebc763314f7b4391c98900 with agg-settings = 10min:6h:2:38d:true,2h:6h:2:120d:true. not sure whether it's because of these changes or because the load pattern has changed (increased) due to recent extra signups:

i've seen this twice:

MT dash: https://grafana-monitor.raintank.io/dashboard/snapshot/IWAPH7sho06f0n0TgBmdWsuQykwZfVmM

sys dash: https://grafana-monitor.raintank.io/dashboard/snapshot/Fv7Rb3G0yR5enyZF7oS2kF2iyMMaBPIW

will try to repro in QA. hypothesis: maybe an extensive GC workload is maxing out cpu and blocking progress, but mem usage shouldn't be that high i think. gctrace will help.

Dieterbe commented 8 years ago

no luck yet. in raintank-docker i'm trying to fill up a bunch of data (up to 3 days back) and then suddenly launch a similar request workload as seen in prod:

+./fake_metrics_to_nsq -keys-per-org 100 -orgs 10 -statsd-addr statsdaemon:8125 -nsqd-tcp-address nsqd:4150 -offset 3d -speedup 400 -stop-at-now &> /var/log/raintank/fake.log
+./fake_metrics_to_nsq -keys-per-org 100 -orgs 10 -statsd-addr statsdaemon:8125 -nsqd-tcp-address nsqd:4150 &> /var/log/raintank/fake2.log &

followed by:

$ cat repro-prod-issue.sh
#!/bin/sh
./inspect-es -es-index metric -format vegeta-mt -from 3d -es-addr elasticsearch:9200 > mc.txt
./inspect-es -es-index metric -format vegeta-mt -from 9min -es-addr elasticsearch:9200 > m.txt
ls -alh mc.txt m.txt

cat mc.txt | sed 's#18763#6063#' | ./vegeta attack -duration 100s -rate 500 > mc.out &
cat m.txt | sed 's#18763#6063#'  | ./vegeta attack -duration 100s -rate 5000 > m.out &
echo "launched!"
sleep 110

but MT handles this just fine and memory only goes up by a gig or so. maybe that's because i'm working with a smaller amount of metrics here (1k vs 500k in prod). it would take too long to on-demand backload a similar volume on my laptop, and since there's no caching going on (at least not in MT), requesting the same data multiple times should work just as well. notably in prod:

maybe i can do a qa run with similar-to-prod metric volume.

now i'm trying something else: i suspect that maybe the index for the new graphite endpoint introduces too many new pointers and keeps the GC busier, so i'm using basically a steady instream like so:

+./fake_metrics_to_nsq -keys-per-org 100 -orgs 100 -statsd-addr statsdaemon:8125 -nsqd-tcp-address nsqd:4150 -offset 3h -speedup 10 -stop-at-now &> /var/log/raintank/fake.log
+./fake_metrics_to_nsq -keys-per-org 100 -orgs 100 -statsd-addr statsdaemon:8125 -nsqd-tcp-address nsqd:4150 &> /var/log/raintank/fake2.log &

and just running metric-tank with +GODEBUG=gctrace=1 ./metric_tank and a patched GC like so:

diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go
index 94301c6..b0d3528 100644
--- a/src/runtime/mgc.go
+++ b/src/runtime/mgc.go
@@ -1297,6 +1297,7 @@ func gcMarkTermination() {
                if work.mode != gcBackgroundMode {
                        print(" (forced)")
                }
+               print(" sw ", gcController.scanWork, " bgscancred ", gcController.bgScanCredit)
                print("\n")
                printunlock()
        }

latest master: sys dash: https://snapshot.raintank.io/dashboard/snapshot/D1A8qzn673G2v01syp32bxlZ516cH8qb MT dash: https://snapshot.raintank.io/dashboard/snapshot/j5hPeum00o8Itpa8234TYC8cV2bfKgkz

dd69883 (pre graphite endpoint): MT dash: https://snapshot.raintank.io/dashboard/snapshot/o8J1vnIUnjSNV11wmo2hw2uc10XHh7zA sys dash: https://snapshot.raintank.io/dashboard/snapshot/ta3trxmlqTPL1I0l1vaY4Z7LE82dOQP3

unfortunately all cpu/mem looks identical. i also compared the GC data using gcvis (patched for the 2 new params) and everything looks identical, also the timings :(

Dieterbe commented 8 years ago

a85f3af (previous stable in prod, aka pkg-0.1.0-1457484036ubuntu1) MT: https://snapshot.raintank.io/dashboard/snapshot/KU1EmofIUNT6EusRvPrbNkvwvIIcwF5v sys: https://snapshot.raintank.io/dashboard/snapshot/1jDols6042Vx2CLPotc5QQqt2NRvpa1T looks the same again. for the record, attached the gc view, which also looks the same.

Dieterbe commented 8 years ago

been doing more testing using a new mt4 instance that's not in the pool and that has the "broken" MT installed, then used gor to replicate prod traffic to mt4:

dieter@metric-tank-2-prod:~$ sudo ./gor --input-raw :18763 --output-http metric-tank-4-prod:18763

also ran loops like

while true ;do let i=i+1; echo $i; curl 'http://localhost:18763/debug/pprof/profile?seconds=1' > mt4-cpu-$i; done
while true ;do let i=i+1; echo $i; curl http://localhost:18763/debug/pprof/heap > mt4-heap-$i; done

to get profiles right before the crash. unfortunately those don't seem to show anything interesting.

~ ❯❯❯ go tool pprof --inuse_space mt4-bin mt4-heap-2119
Entering interactive mode (type "help" for commands)
(pprof) top50 -cum
2800.22MB of 2823.27MB total (99.18%)
Dropped 195 nodes (cum <= 14.12MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%  2823.27MB   100%  runtime.goexit
         0     0%     0%  1844.66MB 65.34%  github.com/nsqio/go-nsq.(*Consumer).handlerLoop
         0     0%     0%  1844.66MB 65.34%  main.(*Handler).HandleMessage
      27MB  0.96%  0.96%  1093.60MB 38.74%  main.(*AggMetrics).GetOrCreate
  760.09MB 26.92% 27.88%  1066.60MB 37.78%  main.NewAggMetric
         0     0% 27.88%   968.06MB 34.29%  main.main
         0     0% 27.88%   968.06MB 34.29%  runtime.main
  104.01MB  3.68% 31.56%   900.58MB 31.90%  main.NewAggregator
         0     0% 31.56%   709.56MB 25.13%  main.(*AggMetric).Add
         0     0% 31.56%   586.57MB 20.78%  main.(*DefCache).Backfill
         0     0% 31.56%   586.57MB 20.78%  main.NewDefCache
         0     0% 31.56%   530.07MB 18.78%  github.com/raintank/raintank-metric/metricdef.(*DefsEs).GetMetrics
   85.51MB  3.03% 34.59%   530.07MB 18.78%  github.com/raintank/raintank-metric/schema.MetricDefinitionFromJSON
  177.51MB  6.29% 40.88%   516.03MB 18.28%  main.NewChunk
         0     0% 40.88%   476.52MB 16.88%  main.(*AggMetric).addAggregators
         0     0% 40.88%   476.52MB 16.88%  main.(*Aggregator).Add
         0     0% 40.88%   476.52MB 16.88%  main.(*Aggregator).flush
         0     0% 40.88%   444.56MB 15.75%  encoding/json.Unmarshal
         0     0% 40.88%   439.56MB 15.57%  encoding/json.(*decodeState).object
         0     0% 40.88%   439.56MB 15.57%  encoding/json.(*decodeState).unmarshal
         0     0% 40.88%   439.56MB 15.57%  encoding/json.(*decodeState).value
  381.48MB 13.51% 54.39%   381.48MB 13.51%  main.NewCassandraStore
  338.52MB 11.99% 66.38%   338.52MB 11.99%  github.com/dgryski/go-tsz.New
  203.51MB  7.21% 73.59%   203.51MB  7.21%  fmt.Sprintf
      26MB  0.92% 74.51%   191.53MB  6.78%  github.com/dgryski/go-tsz.(*Series).Push
         0     0% 74.51%   191.53MB  6.78%  main.(*Chunk).Push
         0     0% 74.51%   181.55MB  6.43%  reflect.Value.SetMapIndex
  181.55MB  6.43% 80.94%   181.55MB  6.43%  reflect.mapassign
  167.53MB  5.93% 86.87%   167.53MB  5.93%  github.com/dgryski/go-tsz.(*bstream).writeBits
         0     0% 86.87%   158.51MB  5.61%  encoding/json.(*decodeState).literal
  158.51MB  5.61% 92.49%   158.51MB  5.61%  encoding/json.(*decodeState).literalStore
         0     0% 92.49%       86MB  3.05%  encoding/json.(*decodeState).array
   56.50MB  2.00% 94.49%    56.50MB  2.00%  main.(*DefCache).Backfill.func1
         0     0% 94.49%    48.50MB  1.72%  reflect.MakeSlice
   48.50MB  1.72% 96.21%    48.50MB  1.72%  reflect.unsafe_NewArray
         0     0% 96.21%    35.50MB  1.26%  reflect.MakeMap
   35.50MB  1.26% 97.47%    35.50MB  1.26%  reflect.makemap
         0     0% 97.47%       33MB  1.17%  github.com/raintank/raintank-metric/msg.(*MetricData).DecodeMetricData
    1.50MB 0.053% 97.52%       33MB  1.17%  github.com/raintank/raintank-metric/schema.(*MetricDataArray).UnmarshalMsg
    1.50MB 0.053% 97.57%    31.50MB  1.12%  github.com/raintank/raintank-metric/schema.(*MetricData).UnmarshalMsg
      30MB  1.06% 98.63%       30MB  1.06%  github.com/tinylib/msgp/msgp.ReadStringBytes
         0     0% 98.63%    15.50MB  0.55%  reflect.Value.Convert
   15.50MB  0.55% 99.18%    15.50MB  0.55%  reflect.cvtBytesString
(pprof) 
~ ❯❯❯ go tool pprof mt4-bin mt4-cpu-60
Entering interactive mode (type "help" for commands)
(pprof) top50 -cum
3.52s of 3.54s total (99.44%)
Dropped 6 nodes (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      3.54s   100%  runtime.goexit
         0     0%     0%      2.85s 80.51%  main.Get
         0     0%     0%      2.85s 80.51%  main.get.func1
     0.26s  7.34%  7.34%      2.85s 80.51%  main.graphiteRaintankJSON
         0     0%  7.34%      2.85s 80.51%  net/http.(*ServeMux).ServeHTTP
         0     0%  7.34%      2.85s 80.51%  net/http.(*conn).serve
         0     0%  7.34%      2.85s 80.51%  net/http.HandlerFunc.ServeHTTP
         0     0%  7.34%      2.85s 80.51%  net/http.serverHandler.ServeHTTP
     0.06s  1.69%  9.04%      1.47s 41.53%  strconv.AppendFloat
     0.11s  3.11% 12.15%      1.41s 39.83%  strconv.genericFtoa
     0.19s  5.37% 17.51%      1.30s 36.72%  strconv.bigFtoa
         0     0% 17.51%      1.29s 36.44%  runtime.growslice
     1.06s 29.94% 47.46%      1.06s 29.94%  runtime.memmove
         0     0% 47.46%      0.76s 21.47%  runtime.growslice_n
         0     0% 47.46%      0.69s 19.49%  runtime.gcBgMarkWorker
         0     0% 47.46%      0.69s 19.49%  runtime.gcDrain
     0.41s 11.58% 59.04%      0.68s 19.21%  runtime.scanobject
     0.08s  2.26% 61.30%      0.68s 19.21%  strconv.formatDigits
     0.19s  5.37% 66.67%      0.60s 16.95%  strconv.fmtF
     0.32s  9.04% 75.71%      0.32s  9.04%  runtime.duffzero
     0.29s  8.19% 83.90%      0.29s  8.19%  runtime.memclr
     0.07s  1.98% 85.88%      0.21s  5.93%  strconv.AppendUint
     0.13s  3.67% 89.55%      0.18s  5.08%  runtime.greyobject
     0.09s  2.54% 92.09%      0.14s  3.95%  strconv.formatBits
     0.09s  2.54% 94.63%      0.09s  2.54%  runtime.heapBitsForObject
     0.05s  1.41% 96.05%      0.07s  1.98%  strconv.(*decimal).Assign
         0     0% 96.05%      0.05s  1.41%  runtime.heapBits.setMarked
     0.05s  1.41% 97.46%      0.05s  1.41%  runtime/internal/atomic.Or8
     0.05s  1.41% 98.87%      0.05s  1.41%  strconv.(*decimal).Round
         0     0% 98.87%      0.03s  0.85%  runtime.mallocgc
         0     0% 98.87%      0.03s  0.85%  runtime.rawmem
         0     0% 98.87%      0.03s  0.85%  runtime.systemstack
         0     0% 98.87%      0.02s  0.56%  runtime.heapBits.initSpan
         0     0% 98.87%      0.02s  0.56%  runtime.largeAlloc
         0     0% 98.87%      0.02s  0.56%  runtime.mallocgc.func3
     0.02s  0.56% 99.44%      0.02s  0.56%  strconv.trim

another experiment, bisecting locally compiled MT instances:

sudo service metric_tank stop; sudo cp metric_tank /usr/sbin/metric_tank ; sudo service metric_tank start; ps aux | grep metric_tank

Dieterbe commented 8 years ago

ok so we now know 6e6cb96 is the culprit, or somehow triggers a bug elsewhere.

8d18f34313097fbda66bfd99a321267790a34df5 ("disable colors in metric_tank's logs") -> instant crash: https://snapshot.raintank.io/dashboard/snapshot/i8CzC9yqctFBkHHWpkLN7NvnsbTTeCGf

same build but with 6e6cb96 reverted: works fine again: https://snapshot.raintank.io/dashboard/snapshot/qeM4pfKB8ret4vwVuA7F7b032u2vve4u

I also tried some patches, all visible here https://gist.github.com/Dieterbe/e9414e91564f185ea7c7 specifically:

the focus on fix() was due to #171 and also because that's what got changed in 6e6cb96.

Dieterbe commented 8 years ago

got another mem profile, this time with

+       runtime.MemProfileRate = 1
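
for context, a minimal sketch (not metric_tank's actual main()) of what that setting does: runtime.MemProfileRate = 1 makes the heap profiler record every single allocation instead of sampling, so it has to be set before the allocations you care about happen. the listener below is hypothetical, it just reuses the port the curl loops above scraped profiles from.

package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers the curl loops hit
	"runtime"
)

func main() {
	// record every allocation in the heap profile instead of the default
	// sampling rate; must be set before the allocations of interest happen
	runtime.MemProfileRate = 1

	// hypothetical pprof listener, reusing the port from the profiling loops
	go http.ListenAndServe(":18763", nil)

	select {} // stand-in for the rest of the program
}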

the resulting profile shows:

(pprof) top50 -cum
25753346.64kB of 26036249.83kB total (98.91%)
Dropped 633 nodes (cum <= 130181.25kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 26035269.52kB   100%  runtime.goexit
         0     0%     0% 20136622.62kB 77.34%  main.getTargets.func1
         0     0%     0% 20136621.12kB 77.34%  main.getTarget
20134415.44kB 77.33% 77.33% 20134415.44kB 77.33%  main.fix
    3.72kB 1.4e-05% 77.33% 3419776.31kB 13.13%  net/http.(*conn).serve
         0     0% 77.33% 3416172.34kB 13.12%  net/http.(*ServeMux).ServeHTTP
         0     0% 77.33% 3416172.34kB 13.12%  net/http.HandlerFunc.ServeHTTP
         0     0% 77.33% 3416172.34kB 13.12%  net/http.serverHandler.ServeHTTP
   48.81kB 0.00019% 77.33% 3416147.31kB 13.12%  main.Get
         0     0% 77.33% 3416147.31kB 13.12%  main.get.func1
   88.34kB 0.00034% 77.33% 3411948.09kB 13.10%  main.graphiteRaintankJSON
         0     0% 77.33% 3411709.25kB 13.10%  strconv.AppendFloat
         0     0% 77.33% 3411709.25kB 13.10%  strconv.bigFtoa
3411709.25kB 13.10% 90.44% 3411709.25kB 13.10%  strconv.fmtF
         0     0% 90.44% 3411709.25kB 13.10%  strconv.formatDigits
         0     0% 90.44% 3411709.25kB 13.10%  strconv.genericFtoa
         0     0% 90.44% 1467987.61kB  5.64%  github.com/nsqio/go-nsq.(*Consumer).handlerLoop
         0     0% 90.44% 1467982.89kB  5.64%  main.(*Handler).HandleMessage
27464.53kB  0.11% 90.54% 1144988.02kB  4.40%  main.(*AggMetrics).GetOrCreate
800744.55kB  3.08% 93.62% 1117522.61kB  4.29%  main.NewAggMetric
    0.33kB 1.3e-06% 93.62% 1004349.55kB  3.86%  main.main
         0     0% 93.62% 1004349.55kB  3.86%  runtime.main
105592.69kB  0.41% 94.02% 950334.19kB  3.65%  main.NewAggregator
    0.16kB 6e-07% 94.02% 613466.55kB  2.36%  main.NewDefCache
         0     0% 94.02% 613466.39kB  2.36%  main.(*DefCache).Backfill
         0     0% 94.02% 556115.67kB  2.14%  github.com/raintank/raintank-metric/metricdef.(*DefsEs).GetMetrics
97334.16kB  0.37% 94.40% 556110.27kB  2.14%  github.com/raintank/raintank-metric/schema.MetricDefinitionFromJSON
         0     0% 94.40% 458785.92kB  1.76%  encoding/json.Unmarshal
         0     0% 94.40% 454360.55kB  1.75%  encoding/json.(*decodeState).object
         0     0% 94.40% 454360.55kB  1.75%  encoding/json.(*decodeState).unmarshal
         0     0% 94.40% 454360.55kB  1.75%  encoding/json.(*decodeState).value
390641.62kB  1.50% 95.90% 390749.15kB  1.50%  main.NewCassandraStore
    0.22kB 8.4e-07% 95.90% 293175.67kB  1.13%  main.(*AggMetric).Add
211988.97kB  0.81% 96.71% 212170.38kB  0.81%  fmt.Sprintf
70345.72kB  0.27% 96.98% 211037.25kB  0.81%  main.NewChunk
         0     0% 96.98% 194668.31kB  0.75%  reflect.Value.SetMapIndex
194668.31kB  0.75% 97.73% 194668.31kB  0.75%  reflect.mapassign
         0     0% 97.73% 167608.80kB  0.64%  encoding/json.(*decodeState).literal
167608.39kB  0.64% 98.37% 167608.80kB  0.64%  encoding/json.(*decodeState).literalStore
         0     0% 98.37% 147612.88kB  0.57%  main.(*AggMetric).addAggregators
         0     0% 98.37% 147612.88kB  0.57%  main.(*Aggregator).Add
         0     0% 98.37% 147612.88kB  0.57%  main.(*Aggregator).flush
140691.44kB  0.54% 98.91% 140691.53kB  0.54%  github.com/dgryski/go-tsz.New
(pprof) list main.fix
Total: 24.83GB
ROUTINE ======================== main.fix in /home/dieter/go/src/github.com/raintank/raintank-metric/metric_tank/dataprocessor.go
   19.20GB    19.20GB (flat, cum) 77.33% of Total
         .          .     51:       start = from + interval - remain
         .          .     52:   }
         .          .     53:
         .          .     54:   // last point should be the last value that divides by interval lower than to (because to is always exclusive)
         .          .     55:   lastPoint := (to - 1) - ((to - 1) % interval)
   19.20GB    19.20GB     56:   out := make([]schema.Point, (lastPoint-start)/interval+1)
         .          .     57:
         .          .     58:   // t is the ts we're looking to fill.
         .          .     59:   // i iterates in
         .          .     60:   // o iterates out
         .          .     61:   for t, i, o := start, 0, -1; t <= lastPoint; t += interval {
(pprof)

Dieterbe commented 8 years ago

really weird that that line is the culprit since it's very similar to the original line.

i see two possibilities: 1) stack escape analysis treats this slice allocation differently than before

2) somewhere further down the read path the slice leaks and is not reclaimed. but the leak would have to be caused somehow by the perf improvement, which seems a bit unlikely. also i don't see where it could leak: divide() drops references to it, consolidate() passes it on, and ultimately after graphiteJSON/graphiteRaintankJSON we should have no more references to this slice.

Dieterbe commented 8 years ago

$ b -gcflags='-m' 2>&1 | grep dataprocessor.go:56
./dataprocessor.go:56: make([]schema.Point, (lastPoint - start) / interval + 1) escapes to heap
./dataprocessor.go:56: make([]schema.Point, (lastPoint - start) / interval + 1) escapes to heap

with patch reverted:

$ b -gcflags='-m' 2>&1 | grep dataprocessor.go:39
./dataprocessor.go:39: make([]schema.Point, 0, len(in)) escapes to heap
./dataprocessor.go:39: make([]schema.Point, 0, len(in)) escapes to heap

Dieterbe commented 8 years ago

aha! I think i found the bug. fix() is getting incoming requests such as from 1458966244 to 1458966274 (i.e. a 30s span) while interval is 60s, so len becomes 71582788. see http://play.golang.org/p/VSE-srJYOv
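
for posterity, a small standalone version of that arithmetic (same idea as the playground link above): the start/lastPoint computation mirrors the fix() excerpt from the profile, and it assumes uint32 timestamps, which is what makes the subtraction wrap around instead of going negative.

package main

import "fmt"

func main() {
	// values from the request above: a 30s span with a 60s interval
	var from, to, interval uint32 = 1458966244, 1458966274, 60

	// first ts >= from that divides by interval
	start := from
	if remain := from % interval; remain != 0 {
		start = from + interval - remain
	}
	// last ts < to that divides by interval (to is exclusive)
	lastPoint := (to - 1) - ((to - 1) % interval)

	// the window contains no aligned point, so lastPoint < start and the
	// uint32 subtraction wraps around to ~2^32
	fmt.Println("start", start, "lastPoint", lastPoint, "len", (lastPoint-start)/interval+1)
	// prints: start 1458966300 lastPoint 1458966240 len 71582788
}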

Dieterbe commented 8 years ago

the input slices seem to be empty in these cases btw:

+fmt.Println(">fix", in, "f", from, "t", to, "int", interval, " len", (lastPoint-start)/interval+1)

>fix [{0 1458967095} {0 1458967105} {0 1458967115}] f 1458967095 t 1458967125 int 10  len 3
>fix [] f 1458967095 t 1458967125 int 10  len 3
>fix [] f 1458967095 t 1458967125 int 60  len 71582788
>fix [{0 1458967095} {0 1458967105} {0 1458967115}] f 1458967095 t 1458967125 int 10  len 3
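
a sketch of the kind of guard that would avoid the runaway allocation (not necessarily the actual fix; Point stands in for schema.Point and uint32 timestamps are assumed as above): when the aligned window is empty, return 0 points instead of letting the unsigned subtraction wrap.

package main

type Point struct {
	Val float64
	Ts  uint32
}

// alignedCount returns how many interval-aligned timestamps fall in [from, to).
func alignedCount(from, to, interval uint32) uint32 {
	start := from
	if remain := from % interval; remain != 0 {
		start = from + interval - remain
	}
	lastPoint := (to - 1) - ((to - 1) % interval)
	if lastPoint < start {
		// e.g. a 30s span with a 60s interval: without this check the uint32
		// subtraction below wraps around and we'd allocate ~70M points
		return 0
	}
	return (lastPoint-start)/interval + 1
}

func main() {
	out := make([]Point, alignedCount(1458966244, 1458966274, 60))
	println(len(out)) // 0 instead of 71582788
}
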
Dieterbe commented 8 years ago

a similar issue was #164.