grafana / carbon-relay-ng

Fast carbon relay+aggregator with admin interfaces for making changes online - production ready

CPU Issue with v0.13.0 version #438

Open krishnaindani opened 4 years ago

krishnaindani commented 4 years ago

After upgrading from v0.12.0 to v0.13.0, we are seeing a constant increase in CPU utilization for carbon-relay-ng.

With v0.12.0, we used to see CPU spikes when network traffic and workload increased, but with v0.13.0 CPU utilization does not come back down and keeps increasing.

image

In the snapshot above, from 08/18 until 08/19 20:00 it was running v0.12.0; after that, CPU utilization is spiking even with the same network traffic.

We are running carbon-relay-ng as a Deployment in Kubernetes (v1.17). The carbon relay config has some aggregation rules, and metrics are sent to Grafana Labs Graphite with 10-second metric aggregation rules.

Here is another instance as an example, where we have defined aggregation rules and, in the route section, are sending metrics to Confluent Cloud Kafka.

image

In this snapshot, the point where CPU utilization came down was a restart, after which it started spiking again.

Dieterbe commented 4 years ago

can you confirm that if you revert to 0.12 the issue disappears?

krishnaindani commented 4 years ago

Yes, the issue disappears with rollback to the v0.12.0 version.

krishnaindani commented 4 years ago

This is the snapshot after rolling back:

image

We can see that on 08/20 12:00 it was rolled back to v0.12.0, and CPU looks good after that, even though we saw some increase in network activity.

cjonesshipt commented 4 years ago

I updated to v0.13.0 on Friday in our staging environment, just to double-check that I'm still seeing this too. Looks like it's still a problem for me.

Green line is the upgrade to v0.13.0, fwiw:

image

Dieterbe commented 4 years ago

Is it possible to let it run without the aggregator? Reviewing the changelog, it seems most changes in the 0.13.0 release were related to aggregators. I suspect either that, or something to do with matchers (anything that has a prefix/substring/regex condition on it, such as a route), but more likely the aggregator.