graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0

Optimizing carbon to run without a cache queue #856

Closed: ecsumed closed this issue 5 years ago

ecsumed commented 5 years ago

Hi,

We're currently running a single carbon instance on an AWS C5.XLarge (4cores, 8GB) with the following config:

[cache]
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0

LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_PORT = 7002

MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 5000
MAX_UPDATES_PER_SECOND_ON_SHUTDOWN = 5000
MAX_CREATES_PER_MINUTE = 500

USE_FLOW_CONTROL = True

LOG_UPDATES = False
LOG_CACHE_HITS = False
LOG_CACHE_QUEUE_SORTS = False
LOG_CREATES = True

ENABLE_LOGROTATION = True

This runs fine. However, our use case is such that we do not want a queue, since it results in a lag of about 40-45 minutes before new graphs appear. And with increasing clients, and in turn increasing metrics, this lag will only grow.

In the last few days, I've tried a combination of relays and multiple-carbon-cache-per-core setups, all on a single disk. None have worked.

I also thought the problem could be with the disk IO since it's always busy. To test this, I converted the AWS 3000 IOPS volume to a 10000 IOPS volume and increased MAX_UPDATES_PER_SECOND to 50000, but this had no effect; it did not even dent the queue.

Am I hitting some kind of soft limit with this config? If not, what steps can I take to decrease this queue?

We have a combination of metrics of varying intervals and retention schemes.

Here's our past 25 days of relevant data: https://imgur.com/Csc9dvx (The dip on the 23rd of May is when the service was stopped and the volume was converted to the 10k IOPS one.)
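
For reference, the backlog described above can also be watched from carbon's own self-reported metrics (carbon.agents.*.cache.size, cache.queues, pointsPerUpdate and friends) through the render API. A minimal sketch in Python, assuming a graphite-web instance on localhost:8080 and the default carbon.agents.* metric naming; both are assumptions, adjust to your setup:

# Sketch: poll carbon's self-reported cache size via the Graphite render API.
# GRAPHITE_URL and the port are assumptions; carbon.agents.* is carbon's
# default namespace for its internal metrics.
import json
import urllib.request

GRAPHITE_URL = "http://localhost:8080"
TARGET = "carbon.agents.*.cache.size"   # queued datapoints per cache instance

def cache_sizes(minutes=60):
    url = (f"{GRAPHITE_URL}/render?target={TARGET}"
           f"&from=-{minutes}min&format=json")
    with urllib.request.urlopen(url) as resp:
        series = json.load(resp)
    latest = {}
    for s in series:
        # keep the most recent non-null value per cache instance
        values = [v for v, _ts in s["datapoints"] if v is not None]
        latest[s["target"]] = values[-1] if values else None
    return latest

if __name__ == "__main__":
    for target, size in cache_sizes().items():
        print(f"{target}: {size} datapoints queued")

Watching cache.size next to pointsPerUpdate over a day makes it easier to tell whether the backlog is actually growing or sitting at a steady state.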

piotr1212 commented 5 years ago

Strange, your points per update is just 3, which would imply that your queues are very small, unless you have some really long intervals.

I also thought the problem could be with the disk IO since it's always busy.

What does iostat say?
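
As a rough sanity check on the points-per-update reasoning: each datapoint sitting in the cache for a metric corresponds to roughly one collection interval of lag on disk, so the backlog per metric is approximately pointsPerUpdate times the interval. A small worked sketch (the intervals are illustrative, not measured from this setup):

# Back-of-the-envelope only: backlog per metric ~= pointsPerUpdate * interval.
points_per_update = 3                # the reported pointsPerUpdate figure
for interval_s in (60, 300, 900):    # illustrative collection intervals
    backlog_min = points_per_update * interval_s / 60
    print("%4ds interval -> roughly %2.0f min of on-disk lag per metric"
          % (interval_s, backlog_min))
# 60s -> ~3 min, 300s -> ~15 min, 900s -> ~45 min

So 3 points per update only looks small for short intervals; at a 900s interval it would already account for the 40-45 minute lag mentioned above.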

ecsumed commented 5 years ago

@piotr1212 Hi. Thanks for the quick reply. Our shortest interval is 5 minutes, and our longest retention goes up to a year. Here's a small sample of iostat -x 1; a longer dump is here -> https://paste.debian.net/1082875/

The data disk is nvme2n1

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme2n1           0.00     0.00 2155.00    0.00 12684.00     0.00    11.77     1.77    0.81    0.81    0.00   0.45  96.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.21    0.00   15.38   20.00    0.00   26.41

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme2n1           0.00     0.00 2052.00    0.00 11744.00     0.00    11.45     1.98    0.98    0.98    0.00   0.47  95.60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          47.92    0.00   27.86    8.33    0.00   15.89

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    3.00    0.00    80.00     0.00    53.33     0.00    0.00    0.00    0.00   0.00   0.00
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme2n1           0.00     0.00 1935.00 1435.00 11256.00  5740.00    10.09    87.83   26.06    0.91   59.98   0.27  91.60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46.44    0.00   24.80   10.29    0.00   18.47

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme1n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
nvme2n1           0.00     0.00 2212.00    0.00 12792.00     0.00    11.57     1.46    0.66    0.66    0.00   0.42  93.20

piotr1212 commented 5 years ago

Your disk seems saturated (%util), but mostly due to reads rather than writes. Reads are needed for aggregation. Can you post your storage-schemas.conf? You might be able to reduce the number of archives so that fewer reads are needed. Which version of carbon and whisper are you using?
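
One way to see how many of the existing .wsp files actually carry a second (aggregated) archive, and would therefore trigger those reads, is to inspect them with the whisper library's info() call. A minimal sketch; the storage path is an assumption (the default install location) and should be adjusted:

# Sketch: count whisper files per archive layout, to see how many files carry
# a second (aggregated) archive. WHISPER_DIR is an assumption.
import os
from collections import Counter
import whisper

WHISPER_DIR = "/opt/graphite/storage/whisper"

layouts = Counter()
for root, _dirs, files in os.walk(WHISPER_DIR):
    for name in files:
        if not name.endswith(".wsp"):
            continue
        info = whisper.info(os.path.join(root, name))
        layout = ",".join(
            "%ds:%dpts" % (a["secondsPerPoint"], a["points"])
            for a in info["archives"]
        )
        layouts[layout] += 1

for layout, count in layouts.most_common():
    print("%8d files  %s" % (count, layout))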

ecsumed commented 5 years ago

@piotr1212 My carbon and whisper are both version 0.9.13. Also, here's my storage-schemas.conf; I have at most 2 archives per metric.

[carbon]
pattern = ^carbon\.
retentions = 60s:90d

[data.def.retention]
pattern = ^data\.(.*)\.def\.
retentions = 900s:1d,3600s:365d

[data.ghi.retention]
pattern = ^data\.(.*)\.ghi\.
retentions = 3600s:1d,86400s:365d

[data.jkl.retention]
pattern = ^data\.(.*)\.jkl\.
retentions = 1800s:1d,86400s:365d

[data.mno.retention]
pattern = ^data\.(.*)\.disk\.mno\.
retentions = 1800s:1d,86400s:365d

[data.disk.usage.retention]
pattern = ^data\.(.*)\.disk\.root\.
retentions = 1800s:1d,86400s:365d

[data.disk.pqr.usage.retention]
pattern = ^data\.(.*)\.disk\.pqr\.
retentions = 86400s:365d

[data.disk.stu.usage.retention]
pattern = ^data\.(.*)\.disk\.stu\.
retentions = 86400s:365d

[data.vwx.retention]
pattern = ^data\.(.*)\.vwx\.
retentions = 900s:1d,3600s:365d

[data.net.retention]
pattern = ^data\.(.*)\.net\.eth0\.
retentions = 600s:1d,3600s:365d

[data.yz.count.retention]
pattern = ^data\.(.*)\.yz\.count\.
retentions = 1800s:1d,86400s:365d

[data.var.retention]
pattern = ^data\.(.*)\.var\.
retentions = 900s:1d,3600s:365d

[data.cba.retention]
pattern = ^data\.(.*)\.cba\.
retentions = 120s:1d

[data.fed.retention]
pattern = ^data\.(.*)\.fed\.
retentions = 300s:1d,900s:365d

[data.memory.retention]
pattern = ^data\.(.*)\.memory\.
retentions = 300s:1d,900s:365d

[data.ihg.retention]
pattern = ^data\.(.*)\.ihg\.
retentions = 600s:1d,3600s:365d

[data.lkj.retention]
pattern = ^data\.(.*)\.lkj\.
retentions = 1d:365d

[abc2]
pattern = ^abc2\.
retentions = 60s:1d,900s:365d

[abc]
pattern = ^abc\.
retentions = 3600s:1825d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d,900s:365d
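
Worth keeping in mind when trimming this file: carbon applies the first section whose pattern matches a metric name, so ordering matters and anything not caught earlier falls through to default_1min_for_1day. A small sketch to check which section (and therefore which retention) a given name would hit; the conf path and the metric names are just examples:

# Sketch: report which storage-schemas.conf section a metric name falls into.
# First matching pattern wins, mirroring carbon's behaviour.
import configparser
import re

SCHEMAS = "/opt/graphite/conf/storage-schemas.conf"   # assumption: default path

def match_schema(metric, schemas_path=SCHEMAS):
    cp = configparser.ConfigParser(interpolation=None)
    cp.read(schemas_path)
    for section in cp.sections():                     # sections keep file order
        if re.search(cp.get(section, "pattern"), metric):
            return section, cp.get(section, "retentions")
    return None, None

if __name__ == "__main__":
    for metric in ("data.host1.memory.used", "data.host1.disk.pqr.used"):
        section, retentions = match_schema(metric)
        print("%-28s -> [%s] %s" % (metric, section, retentions))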

piotr1212 commented 5 years ago

0.9.13 is old; there are some performance improvements in later versions with respect to page cache thrashing. I don't know your budget for disk space, but I'd just get rid of the second archive and make the first one larger, especially for the metrics that have less than 1 datapoint per 15 minutes. More RAM could also help, since reads would then come from the page cache instead of disk, but I have no clue how much more RAM you'd need.
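
For budgeting that suggestion: a whisper file is roughly a 16-byte header, plus 12 bytes per archive header, plus 12 bytes per stored datapoint, so the space cost of dropping the second archive and stretching the first one is easy to estimate. A small sketch using one of the schemas above; the sizes are an approximation of the whisper on-disk format:

# Rough whisper file-size estimate: ~16-byte metadata header,
# 12 bytes per archive header, 12 bytes per stored datapoint.
def wsp_size(retentions):
    """retentions: list of (seconds_per_point, retention_seconds) tuples."""
    points = sum(ret // spp for spp, ret in retentions)
    return 16 + 12 * len(retentions) + 12 * points

DAY, YEAR = 86400, 365 * 86400

current = wsp_size([(900, DAY), (3600, YEAR)])   # 900s:1d,3600s:365d
single = wsp_size([(900, YEAR)])                 # 900s:365d, no second archive

print("two-archive file   : ~%.0f KiB" % (current / 1024))
print("single 900s archive: ~%.0f KiB" % (single / 1024))
# Roughly 104 KiB vs 411 KiB per file, about 4x the space for that schema,
# but reads no longer have to touch or maintain the aggregated archive.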