graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0

carbon-aggregate 100% CPU #876

Closed rudybroersma closed 4 years ago

rudybroersma commented 4 years ago

Hi,

We have two boxes with optical network taps, each running 'fastnetmon'. fastnetmon sends data to Graphite (which runs on one box), and we use carbon-aggregate to create totals. Our aggregation-rules.conf looks like this:

all.hosts.<ip>.incoming.average.pps (60) = sum fastnetmon*.hosts.<ip>.incoming.average.pps
all.hosts.<ip>.outgoing.average.pps (60) = sum fastnetmon*.hosts.<ip>.outgoing.average.pps
all.hosts.<ip>.incoming.average.bps (60) = sum fastnetmon*.hosts.<ip>.incoming.average.bps
all.hosts.<ip>.outgoing.average.bps (60) = sum fastnetmon*.hosts.<ip>.outgoing.average.bps

all.total.incoming.bps (60) = sum fastnetmon*.total.incoming.bps
all.total.outgoing.bps (60) = sum fastnetmon*.total.outgoing.bps
all.total.incoming.pps (60) = sum fastnetmon*.total.incoming.pps
all.total.outgoing.pps (60) = sum fastnetmon*.total.outgoing.pps
all.total.incoming.flows (60) = sum fastnetmon*.total.incoming.flows
all.total.outgoing.flows (60) = sum fastnetmon*.total.outgoing.flows
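For readers unfamiliar with the rule syntax, here is a minimal Python sketch of how one of these rules could map an incoming metric name to its aggregate name. It is a simplification for illustration, not carbon's actual implementation: the helper names `rule_to_regex` and `aggregate_name` are hypothetical, only the `*` wildcard and `<field>` placeholders used above are handled, and the sample metric name (with the IP written as `10_0_0_1`) is an assumption about what fastnetmon sends.

```python
import re

# Splits a path component into literals, "*" wildcards and "<field>" placeholders.
TOKEN = re.compile(r"(<\w+>|\*)")


def rule_to_regex(input_pattern):
    """Translate a rule's input pattern into a compiled regex (simplified)."""
    parts = []
    for part in input_pattern.split("."):
        pieces = []
        for tok in TOKEN.split(part):
            if tok == "*":
                pieces.append("[^.]*")  # wildcard stays within one component
            elif re.fullmatch(r"<\w+>", tok):
                pieces.append("(?P<%s>[^.]+)" % tok[1:-1])  # named capture
            else:
                pieces.append(re.escape(tok))  # literal text
        parts.append("".join(pieces))
    return re.compile("^" + r"\.".join(parts) + "$")


def aggregate_name(input_pattern, output_template, metric):
    """Return the aggregate metric name, or None if the rule does not match."""
    match = rule_to_regex(input_pattern).match(metric)
    if match is None:
        return None
    name = output_template
    for field, value in match.groupdict().items():
        name = name.replace("<%s>" % field, value)
    return name


print(aggregate_name(
    "fastnetmon*.hosts.<ip>.incoming.average.pps",
    "all.hosts.<ip>.incoming.average.pps",
    "fastnetmon1.hosts.10_0_0_1.incoming.average.pps",  # assumed metric name
))
# prints: all.hosts.10_0_0_1.incoming.average.pps
```

With roughly 50k IPs and four per-host rules, every incoming datapoint has to be matched like this, which gives a rough idea of where the CPU time may be going.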

To give an idea of the traffic, we do about 4 to 5 Gbit/s in+out across 50k IPs.

Our carbon-aggregate service consistently uses 100% CPU. We also see lines like:

29/01/2020 11:45:54 :: CarbonClientProtocol(127.0.0.1:2004:None) send queue has space available
29/01/2020 11:45:56 :: CarbonClientFactory(127.0.0.1:2004:None) send queue is full (20000 datapoints)

What can I do to lower the load on carbon-aggregate? Can I load-balance this process across multiple hosts?

deniszh commented 4 years ago

Hi @rudybroersma ,

You need to switch to RELAY_METHOD = aggregated-consistent-hashing. Carbon will then distribute metrics across carbon caches using the aggregation rules. See https://github.com/graphite-project/carbon/issues/865 or https://github.com/graphite-project/carbon/pull/32 for details. Please note that it probably has some issues, such as https://github.com/graphite-project/carbon/issues/325. Another option is to try the aggregators in https://github.com/grobian/carbon-c-relay or https://github.com/grafana/carbon-relay-ng. They are also single-threaded, but may be faster because they are written in C and Go, respectively.
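The idea behind routing by aggregation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not carbon's code: the destination list and ports are hypothetical, a hand-written regex stands in for a single aggregation rule, and a plain CRC32 modulo stands in for carbon's consistent hash ring. The point it shows is that the hash is taken over the aggregate name rather than the raw metric name, so every input to one aggregate reaches the same destination.

```python
import re
import zlib

# Hypothetical destinations, e.g. two instances on one host.
DESTINATIONS = ["127.0.0.1:2023", "127.0.0.1:2123"]

# Hand-written regex standing in for one rule from aggregation-rules.conf.
RULE = re.compile(
    r"^fastnetmon[^.]*\.hosts\.(?P<ip>[^.]+)\.incoming\.average\.pps$"
)
OUTPUT = "all.hosts.{ip}.incoming.average.pps"


def routing_key(metric):
    """Use the aggregate name as the key if the rule matches, else the metric."""
    match = RULE.match(metric)
    return OUTPUT.format(**match.groupdict()) if match else metric


def pick_destination(metric):
    """Pick a destination by hashing the routing key (simplified, no ring)."""
    checksum = zlib.crc32(routing_key(metric).encode("utf-8"))
    return DESTINATIONS[checksum % len(DESTINATIONS)]


# Both probes report the same IP, so both datapoints share a routing key
# and therefore land on the same destination.
a = pick_destination("fastnetmon1.hosts.10_0_0_1.incoming.average.pps")
b = pick_destination("fastnetmon2.hosts.10_0_0_1.incoming.average.pps")
print(a == b)  # prints: True
```

Because different aggregates hash to different destinations, the aggregation work can be spread over several processes without splitting the inputs of any single sum.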

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.