Closed mmueller101 closed 4 years ago
You want to store metrics on a random host. This is probably not a good idea, unless those hosts share the same data storage or something. Anyway, there is "random" (any_of), and there is "replication", which can only be used with a consistent-hash target. So there is a bit of both, but perhaps you can explain what the 4 machines are and why you want x2 traffic. Perhaps you can make do with just sending each metric randomly to 2 clusters of 2 machines.
Ok, I might have explained an important detail poorly. The 2 machines wouldn't be chosen randomly but using the consistent hashing algorithm. So the sequence would be this:
So somehow the machines need to be able to distinguish between an original message and a relayed (replicated) message. If the message is a relayed message, it needs to be saved. If the message is the original message, it needs to be relayed.
Does that make more sense?
P.S.: Michael and I are working on this design together, that's why I'm replying :)
OK, so I assume you pick something like HAProxy's TCP load balancing as the load balancer.
The four machines you mention are configured identically using fnv1a_ch replication 2, with the IPs of the cluster of machines they need to relay to.
If you want to use carbon-c-relay also as a loadbalancer (= probably a good idea), just use the any_of target for the cluster of 4 relays.
Yes, all four machines are configured identically. So, the same machines being used to relay the metrics are configured with caches to store the metrics as well. In some instances a machine may receive a metric and then relay it back out to two (replication = 2) of the other three machines because it's not being stored on that machine. This can happen because the load balancer placed in front of the machines does not know which machines a metric hashes to.
Is this type of setup possible and if so how would that compare with the configuration you suggested?
Thanks.
It seems to me you should replace your loadbalancers with a carbon-c-relay instance which forwards metrics to the carbon-caches using replication = 2.
I guess in your case, where the load balancer forwards to the 4 storage nodes, each of them just needs carbon-c-relay with a cluster config that points to the full cluster (itself and the other nodes).
cluster mycluster fnv1a_ch replication 2
    1.1.1.1:2103
    1.1.1.2:2103
    1.1.1.3:2103
    1.1.1.4:2103;
match * send to mycluster;
Run carbon-c-relay on the standard port 2003 and the caches on 2103, and your storage boxes should simply forward and deliver (locally and/or remotely).
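To illustrate why this setup does not loop, here is a minimal Python sketch of consistent hashing with replication 2. It is not carbon-c-relay's actual ring layout; `HashRing`, `pick` and `points_per_node` are hypothetical names made up for this example. The point it shows is that identically configured relays compute the same two destinations for a metric, and since those destinations are the cache ports (2103), which only store, nothing needs to tell an original message from a relayed one.

```python
import bisect

FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a hash."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


class HashRing:
    """Toy consistent-hash ring (assumed layout, not carbon-c-relay's)."""

    def __init__(self, nodes, points_per_node=100):
        # Place several points per node on the ring to spread the load.
        self.ring = sorted(
            (fnv1a_32(f"{i}-{node}".encode()), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.keys = [pos for pos, _ in self.ring]

    def pick(self, metric, replication=2):
        """Walk the ring clockwise until `replication` distinct nodes are found."""
        start = bisect.bisect_left(self.keys, fnv1a_32(metric.encode()))
        chosen = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replication:
                    break
        return chosen


# The four cache endpoints from the config above.
NODES = ["1.1.1.1:2103", "1.1.1.2:2103", "1.1.1.3:2103", "1.1.1.4:2103"]

if __name__ == "__main__":
    # Every relay builds the same ring, so every relay picks the same caches.
    print(HashRing(NODES).pick("servers.web01.cpu.user"))
```

Whichever of the four boxes the load balancer hands the metric to, the relay there ends up with the same pair of caches, one of which may be its own local cache.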
Ok great. Thank you for that. Would this type of design be able to handle scaling well?
@mmueller101 we have 15 million active timeseries, updating every 30 seconds or so, and we can handle that load with just 2 carbon-c-relay instances running on c5.4xlarge EC2 instances hitting about 40% CPU utilization.
If we need more breathing room we just add/remove additional instances as load demands.
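For a rough sense of what those numbers mean per relay, here is a back-of-the-envelope calculation in Python. It assumes exactly one datapoint per series every 30 seconds and an even split across the 2 instances, neither of which is stated precisely above ("every 30 seconds or so"), so treat the result as an estimate only. The function name is made up for this example.

```python
def datapoints_per_second(active_series: int, interval_seconds: float) -> float:
    """Average ingest rate if every series reports once per interval."""
    return active_series / interval_seconds


# Figures quoted in the comment above; the even split is an assumption.
total_rate = datapoints_per_second(15_000_000, 30)
per_relay_rate = total_rate / 2

if __name__ == "__main__":
    print(f"total: {total_rate:,.0f} datapoints/s")
    print(f"per relay: {per_relay_rate:,.0f} datapoints/s")
```

Under those assumptions that is about 500,000 datapoints per second in total, or about 250,000 per relay instance.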
@poblahblahblah, cool -- what's your topology? Do you have the c-relays behind a load balancer and a number of caches below the relays?
yeah, more or less...
ELB ---> carbon-c-relay (ingest) -----------------> carbon-cache
                 |
         carbon-c-relay (aggregator) --------> carbon-cache
@poblahblahblah, thank you. this is very, very valuable.
Is it ok to close this ticket? Or do you have additional questions?
Hi,
Thank you for writing a C version of the relay! Got a question about the best topology in terms of simplicity and scalability.
We would like to have 4 machines behind a load balancer without a dedicated relay between the load balancer and the 4 machines. Upon receiving a metric the idea is to randomly pick 2 of the 4 machines and send the metric to them. Those 2 machines would then store the metrics.
Is that possible to do with C-Relay without causing loops?
Thanks, Michael