grobian / carbon-c-relay

Enhanced C implementation of Carbon relay, aggregator and rewriter
Apache License 2.0

High availability carbon-c-relay, how-to? #322

Closed costimuraru closed 6 years ago

costimuraru commented 6 years ago

We currently have a single carbon-c-relay which takes care of routing the metrics to the backend and performing the various aggregations. The problem with this is the single point of failure, where if the carbon-c-relay node goes down (either a real problem, or we want to upgrade and need a restart) we lose metrics. Having a cluster of two or more carbon-c-relays would solve the routing problem, but would break the aggregation part. We'd have multiple boxes each receiving a subset of the metrics, thus generating a subset of the aggregation. We'd therefore need a way to aggregate the sub-aggregations downstream somehow.

I was wondering what would be a recommendation to fix this and have HA carbon-c-relay. Any help/input is gladly appreciated. Thanks!

hadret commented 6 years ago

hi @costimuraru,

I can't really speak to best practices, but I have evaluated two scenarios so far and picked one for our needs (YMMV). We receive ~1,500,000 metrics/minute -- so not an awful lot -- and we do a simple "passthrough" of them (i.e. no aggregation or other fancy stuff) over both TCP and UDP.

The setup: two carbon-relay hosts with a floating virtual IP (managed via keepalived), configured in exactly the same way. They shard the metrics across two underlying whisper hosts using fnv1a_ch. Everything in our infra is configured to deliver metrics, via either TCP or UDP, to the floating IP. Whenever we need maintenance time on one of the carbon-relay hosts, we can easily switch the IP over, do our stuff, switch it back, and carry on with the second host, etc.
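For reference, the floating IP in a setup like this is typically defined as a keepalived VRRP instance along these lines (a minimal sketch, not hadret's actual config; the interface name, router ID, and address are placeholders):

```
vrrp_instance carbon_vip {
    state MASTER            # the second relay host uses state BACKUP
    interface eth0          # placeholder interface name
    virtual_router_id 51    # must match on both relay hosts
    priority 150            # use a lower priority (e.g. 100) on the backup
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24       # the floating IP that all senders target
    }
}
```

When the MASTER host goes down (or keepalived is stopped for maintenance), the BACKUP host takes over the address automatically.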

A different approach I can imagine being viable is to put HAProxy in front of the carbon-relay hosts. Here as well I would use a floating virtual IP, but this time HAProxy would handle the traffic and pass the metrics through to the given endpoints. The problem I had with HAProxy was its lack of UDP balancing, which we still rely on.
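For the TCP side, the HAProxy variant would look roughly like this (a sketch with placeholder addresses; as noted above, this does not cover the UDP traffic):

```
frontend carbon_in
    mode tcp
    bind 192.0.2.10:2003            # the floating virtual IP
    default_backend carbon_relays

backend carbon_relays
    mode tcp
    balance roundrobin
    server relay1 10.0.0.11:2003 check
    server relay2 10.0.0.12:2003 check
```

Health checks take a dead relay out of rotation, so senders keep delivering through the surviving one.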

Let me know whether you have any additional questions!

All the best Filip

grobian commented 6 years ago

Sorry, I didn't notice this issue so far.

The aggregators are indeed hard. One thing you could do is duplicate the data and send it to two aggregators with the same config. In theory, both aggregators should produce exactly the same output, although the produced metrics may arrive at slightly different times (to avoid the thundering herd problem). From there you could decide either to drop one aggregator's output (by firewall rules, a floating IP, or something similar) or to have both aggregators deliver to the carbon store. Since the metrics should be the same, the same value is simply delivered twice; it just doubles the traffic to the carbon store.
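The duplication step can be expressed directly in carbon-c-relay config (a sketch; the host names are placeholders). A forward cluster sends every metric to all of its members, so both aggregators see the full stream:

```
# frontend relay: duplicate all metrics to two identically-configured
# aggregator relays (hypothetical host names)
cluster aggregators
    forward
        aggregator-a.example.com:2003
        aggregator-b.example.com:2003
    ;

match *
    send to aggregators
    stop
    ;
```

Both aggregators run the same aggregate rules, so their outputs should be identical values for identical metric names.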

reyjrar commented 6 years ago

Here's the solution I came up with.

  1. carbon-c-relay is low overhead, so I install it on every host in my network. It uses the any_of operator to send all metrics received on localhost:2003 to either collector in each datacenter. I only use two, but you could have more.
  2. Collectors listening on 0.0.0.0:2003 send data to every storage node via the forward directive, though you could define two clusters, each with fnv1a_ch, and send all metrics to both clusters. In my setup these nodes all listen on 3002.
  3. The storage pushdown listener on 3002 uses carbon_ch to dispatch metrics to the carbon-cache.py instances listening on localhost. Those nodes also perform all the aggregations and dispatch them to the local carbon-caches.
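The three tiers above might be sketched in carbon-c-relay config roughly as follows (host names, ports, and the aggregate rule are illustrative placeholders, not reyjrar's actual config):

```
# tier 1 -- on every host: hand metrics off to either collector
cluster collectors
    any_of
        collector-a.example.com:2003
        collector-b.example.com:2003
    ;
match * send to collectors stop;

# tier 2 -- on the collectors: fan out to every storage node
cluster storage
    forward
        store-1.example.com:3002
        store-2.example.com:3002
    ;
match * send to storage stop;

# tier 3 -- on each storage node (listening on 3002): hash across the
# local carbon-cache.py instances and compute aggregations locally
cluster caches
    carbon_ch
        127.0.0.1:2103=a
        127.0.0.1:2203=b
    ;
aggregate
        ^servers\..*\.cpu\.usage$
    every 60 seconds
    expire after 75 seconds
    compute average write to
        aggregates.cpu.usage
    send to caches
    stop
    ;
match * send to caches stop;
```

Because every storage node receives every metric (tier 2 forwards to all of them), each node can compute the aggregations from the full stream on its own.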

This setup isn't perfect, and I'm going to evolve it a bit in the future. The only downside is that when a node is down, the aggregations are messed up, but I can usually resync those with carbonate pretty quickly. I recently performed OS/kernel/firmware updates on the cluster one node at a time, and despite each node being unreachable for more than 50 minutes, I didn't lose any metrics.

carbon-c-relay, FTW. 👍

grobian commented 6 years ago

thanks!