Open rohit3682 opened 6 years ago
+1 for pprof. Can you please point to instructions on how you got that to work? Is this from /v1/debug/pprof? I debugged a similar issue (caused by a bad join) using much more painful methods.
Btw,
|window()
    .period(duration)
    .every(duration)
means you hold the data for 1 hour, so you can expect some data to build up. When I needed such a look-back query in the past, I used a batch query with shift/offset to avoid any deliberate memory build-up. Also notice the side effect: after a restart the system stays silent for an hour while it rebuilds the state for that hour.
For pprof, set pprof-enabled to true in kapacitor.conf. Use https://docs.influxdata.com/kapacitor/v1.5/working/api/#miscellaneous to get the pprof file, then use go tool pprof to look into it.
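A minimal sketch of fetching a heap profile over HTTP, for anyone who wants to script the step above. The port 9092 is Kapacitor's default; the path /kapacitor/v1/debug/pprof/heap is an assumption based on the API page linked above and should be checked against your version. The helper names pprof_url and fetch_profile are made up for illustration.

```python
import urllib.request


def pprof_url(base="http://localhost:9092", profile="heap"):
    """Build the profile URL. The /kapacitor/v1/debug/pprof/ prefix is an
    assumption; verify it against the API docs for your Kapacitor version."""
    return f"{base.rstrip('/')}/kapacitor/v1/debug/pprof/{profile}"


def fetch_profile(url, dest):
    """Download the profile at `url` and write the raw bytes to `dest`."""
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest


if __name__ == "__main__":
    # Requires pprof-enabled = true in kapacitor.conf and a running Kapacitor.
    fetch_profile(pprof_url(), "heap.pprof")
```

The saved file can then be opened with go tool pprof heap.pprof.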
We cannot use batch as it increases the load on an already overloaded InfluxDB server.
I wonder if you got a solution. Also, a stream query should be significantly more expensive than a batch one. Did you do profiling that shows batch is more expensive for you? Do you mind sharing the result? Notice that streaming requires InfluxDB to send all the points to Kapacitor. The first tuning I did was to remove the Kapacitor subscription for all the points. That caused a decent drop in InfluxDB CPU usage.
As @faskiri pointed out, the 'window' node will store points for an hour and emit them all at once. You seem to be doing a sum, which doesn't actually need all the points stored; you can calculate it on the fly. You can try writing a custom UDF ('windowSum'?) that keeps summing a field for each 'group' and emits once a duration elapses (the UDF example for moving average can be a good starting point). I am in the same boat: simple aggregations like last, count, avg on a window consume a lot of memory. I tried implementing a UDF this evening but noticed that Kapacitor still kept leaking memory (a bit slowly), as also reported here: https://github.com/influxdata/kapacitor/issues/2051
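To make the on-the-fly idea concrete, here is a rough sketch of the aggregation logic only. It is plain Python and does not use Kapacitor's UDF agent API; the class name WindowSum and its add method are hypothetical. It keeps one running total per group instead of buffering points, and emits the totals when a point arrives past the end of the current window.

```python
from collections import defaultdict


class WindowSum:
    """Per-group running sum; memory is one float per group, not one entry per point."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.window_start = None          # start time of the current window
        self.sums = defaultdict(float)    # group -> running total

    def add(self, timestamp, group, value):
        """Feed one point (timestamps assumed non-decreasing, in seconds).

        Returns a list of (window_start, group, total) tuples if this point
        closed the current window, otherwise an empty list.
        """
        emitted = []
        if self.window_start is None:
            self.window_start = timestamp
        if timestamp >= self.window_start + self.window:
            # Window elapsed: emit totals, then reset state.
            emitted = [(self.window_start, g, s) for g, s in sorted(self.sums.items())]
            self.sums.clear()
            # Advance by whole windows so empty windows are skipped.
            elapsed = timestamp - self.window_start
            self.window_start += (elapsed // self.window) * self.window
        self.sums[group] += value
        return emitted


if __name__ == "__main__":
    w = WindowSum(3600)
    for ts, group, value in [(0, "a", 1.0), (1800, "a", 4.0), (3600, "a", 7.0)]:
        for row in w.add(ts, group, value):
            print(row)
```

One design note: totals are only emitted when a later point arrives, so a real UDF would also need a timer or flush hook to emit for a window that goes quiet, which matters for an alert on a count of zero.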
We have written a task that raises an alert if the count is not greater than zero over a period of one hour, evaluated every hour. The Kapacitor service runs out of memory at regular intervals. Details follow.
Machine EC2 (m4.xlarge)
Task (has been anonymized)
Heap dump
Kapacitor Version
Show task output