influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Kapacitor sizing documentation? #1303

Open toni-moreno opened 7 years ago

toni-moreno commented 7 years ago

Hi. I'm looking for basic sizing information, and I can't find anything on the official documentation site (https://docs.influxdata.com/kapacitor/v1.2/introduction/). I would like to know sizing for some typical use cases.

My use case will be similar to this:

1. Subscribing to an InfluxDB instance with 100K time series, and applying 2 or 3 stream rules to each series (1 point/minute).
2. Ingesting 100K Graphite metrics, and also applying 2 or 3 stream rules to each series (1 point/minute).

In both cases, the largest monitoring time window would be no more than 1 hour.

What CPU, memory, network I/O, and disk resources would be necessary for good Kapacitor performance? Should I install two separate Kapacitor processes, one for each data source, or can I use a single process?
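For reference, each of the stream rules described above might look something like the following TICKscript sketch (the measurement name, field, tag, and threshold here are hypothetical, not from the original question):

```
// Hypothetical stream task: one rule applied per series,
// with points arriving at roughly 1 point per minute.
stream
    |from()
        .measurement('cpu')
        .groupBy('host')
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .log('/tmp/alerts.log')
```

With 100K series and 2-3 rules per series, the key sizing question is how many such tasks (or grouped series within tasks) a single Kapacitor process can evaluate concurrently.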

Thank you very much

nathanielc commented 7 years ago

@toni-moreno Thanks for the detailed question. Unfortunately, I won't be able to give a very specific answer, as it depends mostly on how complex the tasks you want to run are.

In general Kapacitor's resource needs are as follows:

mspiegle commented 7 years ago

@nathanielc I have a well-used instance of Kapacitor (EC2 c4.8xlarge, 30k points/s). I notice a constant 2-3% iowait across each of the 36 cores. I believe it is related to the BoltDB file. Upon closer inspection of the BoltDB file, it looks like the alert() node stores a number of keys. If I had to guess, for each .crit() of my alert() node, Kapacitor may initiate a write to BoltDB. My questions are:

1. Does my assessment thus far seem reasonable? (The alert() node writes to kapacitor.db, which causes 2-3% iowait.)
2. Can you speak to the write path of kapacitor.db? Is it fully asynchronous, or will Kapacitor need to wait for the data to be written?
3. I tried changing my instance type to an r4.16xlarge (64 cores), and I noticed the iowait% went up significantly, I believe to double digits. Do you have thoughts on why this is?

I ask in this GitHub issue because, for environments that make heavy use of the alert() node, it may make sense to put kapacitor.db on faster disk.
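If per-point alert writes are indeed the cause, one mitigation worth testing alongside faster disk is reducing how often the alert node emits events, e.g. with the alert node's .stateChangesOnly() property (illustrative task; the measurement, field, and threshold are hypothetical, and whether this actually reduces BoltDB write traffic would need to be measured):

```
// Hypothetical task: only emit alert events when the alert
// level changes, rather than on every evaluated point.
stream
    |from()
        .measurement('requests')
    |alert()
        .crit(lambda: "latency" > 500)
        .stateChangesOnly()
        .log('/tmp/alerts.log')
```

Comparing iowait with and without .stateChangesOnly() at the same ingest rate would help confirm whether the alert() node's persistence path is the bottleneck.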