grobian / carbon-c-relay

Enhanced C implementation of Carbon relay, aggregator and rewriter
Apache License 2.0

carbon-c-relay topology #397

Closed mmueller101 closed 4 years ago

mmueller101 commented 4 years ago

Hi,

Thank you for writing a C version of the relay! Got a question about the best topology in terms of simplicity and scalability.

We would like to have 4 machines behind a load balancer without a dedicated relay between the load balancer and the 4 machines. Upon receiving a metric the idea is to randomly pick 2 of the 4 machines and send the metric to them. Those 2 machines would then store the metrics.

Is that possible to do with C-Relay without causing loops?

Thanks, Michael

grobian commented 4 years ago

You want to store metrics on a random host. This is probably not a good idea, unless those hosts share the same data storage or something. Anyway, there is "random" (any_of), and there is "replication", which can only be used with a consistent-hash target. So there is a bit of both, but perhaps you can explain what the 4 machines are and why you want x2 traffic. Perhaps you can get by with just sending each metric randomly to 2 clusters of 2 machines.
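
One possible reading of the "2 clusters of 2 machines" idea, as a minimal sketch: every metric is sent to both clusters, and any_of picks one member inside each. The cluster names, addresses and port are placeholders that do not come from this thread.

```
# sketch only: names, IPs and port are placeholders
cluster pair_a
    any_of
        10.0.0.1:2003
        10.0.0.2:2003
    ;

cluster pair_b
    any_of
        10.0.0.3:2003
        10.0.0.4:2003
    ;

# each metric goes to one member of pair_a and one member of pair_b
match *
    send to pair_a pair_b
    ;
```

As noted above, which member receives a given metric is not something a reader can rely on here, so this likely only makes sense when the hosts in a pair share their storage.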

gerases commented 4 years ago

Ok, I might have explained an important detail poorly. The 2 machines wouldn't be chosen randomly but using the consistent hashing algorithm. So the sequence would be this:

  1. The load balancer chooses 1 of the 4 hosts to send the metric to.
  2. The chosen machine uses consistent hashing to send the metric to two machines.

So somehow the machines need to be able to distinguish between an original message and a relayed (replicated) message. If the message is a relayed message, it needs to be saved. If the message is the original message, it needs to be relayed.

Does that make more sense?

P.S.: Michael and I are working on this design together, that's why I'm replying :)

grobian commented 4 years ago

ok, so I assume you pick something like haproxy's TCP loadbalancing as loadbalancer.

the four machines you mention are configured identically using fnv1a_ch replication 2, with the IPs for the cluster of machines they need to relay to.

If you want to use carbon-c-relay also as a loadbalancer (= probably a good idea), just use the any_of target for the cluster of 4 relays.
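
A minimal sketch of what such an any_of front relay could look like, assuming the four relays listen on port 2003. The cluster name `relays` and the addresses are placeholders, not values from this thread.

```
# front relay acting as the load balancer (sketch; names and IPs are placeholders)
cluster relays
    any_of
        10.0.0.1:2003
        10.0.0.2:2003
        10.0.0.3:2003
        10.0.0.4:2003
    ;

# hand every incoming metric to one of the relays
match *
    send to relays
    ;
```

Because the four relays behind it all use the same fnv1a_ch configuration, it should not matter which of them any_of picks: each would route a given metric to the same two destinations.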

mmueller101 commented 4 years ago

Yes, all four machines are configured identically. So the same machines that relay the metrics also run the caches that store them. In some cases a machine may receive a metric and then relay it to two (replication = 2) of the other three machines because the metric is not stored on that machine. The load balancer in front of the machines may have delivered the metric to a machine that is not one of the two that store it.

Is this type of setup possible and if so how would that compare with the configuration you suggested?

Thanks.

grobian commented 4 years ago

It seems to me you should replace your loadbalancers with a carbon-c-relay instance which forwards metrics to the carbon-caches using replication = 2.

I guess in your case, where you have a load balancer forwarding to the 4 storage nodes, each of them just needs carbon-c-relay with a cluster config that points to the full cluster (itself and the other nodes).

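# identical on all four nodes; each metric is hashed to 2 of the 4 caches
cluster mycluster fnv1a_ch replication 2
  1.1.1.1:2103
  1.1.1.2:2103
  1.1.1.3:2103
  1.1.1.4:2103;

# forward every incoming metric to the cluster
match * send to mycluster;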

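Run carbon-c-relay on the standard port 2003 and the caches on 2103, and your storage boxes should simply forward and deliver (locally and/or remotely). Since the relays only ever send to the cache port, relayed metrics never re-enter a relay, so there are no loops.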

mmueller101 commented 4 years ago

Ok great. Thank you for that. Would this type of design be able to handle scaling well?

poblahblahblah commented 4 years ago

@mmueller101 we have 15 million active timeseries, updating every 30 seconds or so, and we can handle that load with just 2 carbon-c-relay instances running on c5.4xlarge EC2 instances hitting about 40% CPU utilization.

If we need more breathing room we just add/remove additional instances as load demands.

gerases commented 4 years ago

@poblahblahblah, cool -- what's your topology? Do you have the c-relays behind a load balancer and a number of caches below the relays?

poblahblahblah commented 4 years ago

yeah, more or less...

ELB ---> carbon-c-relay (ingest) -----------------> carbon-cache
                             |
                       carbon-c-relay (aggregator) --------> carbon-cache
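
The ingest tier in the diagram above could be configured roughly as follows. This is only a guess at the shape: the actual configuration was not shared in this thread, and the cluster names, addresses, ports and the choice of fnv1a_ch for the caches are all assumptions.

```
# ingest relay (sketch; all names, IPs and ports are assumptions)
cluster caches
    fnv1a_ch
        10.0.1.1:2103
        10.0.1.2:2103
    ;

cluster aggregator
    forward
        10.0.2.1:2003
    ;

# raw metrics go to the caches, and a copy goes to the aggregator relay
match *
    send to caches aggregator
    ;
```

The aggregator relay would then carry its own configuration and deliver its results to carbon-cache, as in the diagram.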

gerases commented 4 years ago

@poblahblahblah, thank you. this is very, very valuable.

grobian commented 4 years ago

Is it ok to close this ticket? Or do you have additional questions?