grobian / carbon-c-relay

Enhanced C implementation of Carbon relay, aggregator and rewriter
Apache License 2.0

Questions about metric distribution using fnv1a_ch #424

Closed loitho closed 3 years ago

loitho commented 3 years ago

Hi there, sorry to bother you; I have two questions.

cluster graphite_swarm
  fnv1a_ch replication 2
    graph01.cplus:2003=B38ED90F4361BEF124865C7
    graph02.cplus:2003=B38ED90F4361BEF124865C7
    graph03.cplus:2003=ba603c36342304ed77953f
    graph04.cplus:2003=ba603c36342304ed77953f
    graph05.cplus:2003=48CEE9C5AD06D81A90426
    graph06.cplus:2003=48CEE9C5AD06D81A90426
    ;

In this configuration, when a metric is sent, will it always go to either 01 and 02, or 03 and 04, or 05 and 06? Is that correct? I'm doing this because the nodes are spread across 3 different availability zones in AWS, and I don't want a metric's two copies to land on two servers in the same AZ.

I also didn't find much documentation about the output of the "print hashring" command. Running /usr/bin/carbon-c-relay -dtf conf.conf < /dev/null > test (with another configuration) yields:

# hash ring for graphite_swarm follows
  179@graph03.cplus:2003=b   179@graph03.cplus:2003=b   704@graph01.cplus:2003=B
  704@graph01.cplus:2003=B   930@graph05.cplus:2003=4   930@graph05.cplus:2003=4
  948@graph05.cplus:2003=4   948@graph05.cplus:2003=4  1024@graph01.cplus:2003=B
 1024@graph01.cplus:2003=B  1325@graph05.cplus:2003=4  1325@graph05.cplus:2003=4
 1574@graph05.cplus:2003=4  1574@graph05.cplus:2003=4  1727@graph05.cplus:2003=4

What does this mean? Am I supposed to check this output against something else?

Kind regards, loitho

deniszh commented 3 years ago

Hi @loitho

You can run carbon-c-relay with the -t flag to run it in test mode. If you feed it a metric name, it will print the routing decision, i.e. which node(s) the metric will be sent to.

grobian commented 3 years ago

What you do in your config is introduce forced collisions by using identical hash-keys.
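To illustrate the collision mechanism, here is a toy sketch of the FNV-1a hash only (not carbon-c-relay's exact ring construction): two servers configured with the same hash-key produce the same hash, so they occupy the same ring positions and receive the same metrics.

```python
# Toy illustration of forced collisions via identical hash-keys.
# This models only the FNV-1a 32-bit hash, not the relay's full ring layout.

def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5                                    # FNV-1a offset basis
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF       # XOR, then multiply by the FNV prime
    return h

# hash-keys taken from the cluster config above
key_graph01 = b"B38ED90F4361BEF124865C7"
key_graph02 = b"B38ED90F4361BEF124865C7"   # same key as graph01 -> forced collision
key_graph03 = b"ba603c36342304ed77953f"

assert fnv1a_32(key_graph01) == fnv1a_32(key_graph02)  # identical ring position
assert fnv1a_32(key_graph01) != fnv1a_32(key_graph03)  # distinct ring position
```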

Whether this does what you want, I'm not sure. If you have 3 different availability domains, then what you'd really want is a jump-hash over the three domains with replication 2, plus an individual per-availability-domain distribution over that domain's servers, either consistent or fault-tolerant depending on whether the data within an availability domain is shared or not.
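A rough sketch of what that two-level layout could look like, assuming one relay per availability domain chained behind a front relay (the relay-az-* hostnames are invented for illustration, and the relay-per-AZ chaining is an assumption, not something stated in this thread):

```
# --- front relay: jump-hash over the three AZ relays, two copies each ---
cluster az_domains
    jump_fnv1a_ch replication 2
        relay-az-a:2003
        relay-az-b:2003
        relay-az-c:2003
    ;
match * send to az_domains stop;

# --- relay inside AZ a: consistent hash over that AZ's stores only ---
cluster stores_az_a
    fnv1a_ch replication 1
        graph01.cplus:2003
        graph04.cplus:2003
    ;
match * send to stores_az_a stop;
```

Whether jump_fnv1a_ch needs explicit server instances to pin the ordering is worth double-checking against the relay's README before using this.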

As @deniszh wrote, the -t test mode can help you validate your config and check whether or not it is doing what you want.

loitho commented 3 years ago

Hi to you both, thank you for your replies! Indeed, using the -t flag with the proper setup works much better and yields the following:

/ # echo "my.super.metric.value" | /usr/bin/carbon-c-relay -f conf.conf -t

cluster graphite_swarm
    fnv1a_ch replication 2
        graph01.cplus:2003=B38ED90F4361BEF124865C7
        graph02.cplus:2003=B38ED90F4361BEF124865C7
        graph03.cplus:2003=ba603c36342304ed77953f
        graph04.cplus:2003=ba603c36342304ed77953f
        graph05.cplus:2003=48CEE9C5AD06D81A90426
        graph06.cplus:2003=48CEE9C5AD06D81A90426
    ;

match
    * -> my.super.metric.value
    fnv1a_ch(graphite_swarm)
        graph05.cplus:2003
        graph03.cplus:2003
    stop

Which means that this doesn't do what I want, and that the behavior differs from the "carbonate" tool (carbonate sent the metrics to the two servers with the same hash-key every time). But considering that we're not supposed to use the same hash-key on multiple servers, I guess it's normal that the behavior isn't the same. Would you mind if I opened a PR adding an example to the documentation on how to use the -t flag?

My goal is just to have a single duplicate of every metric on a node that's not in the same AZ. So say: graph01 => Zone A, graph02 => Zone B, graph03 => Zone C, graph04 => Zone A, graph05 => Zone B, graph06 => Zone C. I don't want my metric to arrive on node 01 and node 04, because if that AZ goes down, then I don't have any copy of the metric left.

If I want to do that, there isn't any way to achieve it with this configuration, is that correct? I should either use a replication factor of 3, or make two smaller clusters like this and send to both of them?

cluster graphite_swarm1
    fnv1a_ch replication 1
        graph01.cplus:2003=B38ED90F4361BEF124865C7
        graph03.cplus:2003=ba603c36342304ed77953f
        graph05.cplus:2003=48CEE9C5AD06D81A90426
    ;
cluster graphite_swarm
    fnv1a_ch replication 1
        graph02.cplus:2003=B38ED90F4361BEF124865C7
        graph04.cplus:2003=ba603c36342304ed77953f
        graph06.cplus:2003=48CEE9C5AD06D81A90426
    ;
grobian commented 3 years ago

Would you mind if I opened a PR adding an example to the documentation on how to use the -t flag?

not at all, please do!

My goal is just to have a single duplicate of every metric on a node that's not in the same AZ. So say: graph01 => Zone A, graph02 => Zone B, graph03 => Zone C, graph04 => Zone A, graph05 => Zone B, graph06 => Zone C. I don't want my metric to arrive on node 01 and node 04, because if that AZ goes down, then I don't have any copy of the metric left.

If I want to do that, there isn't any way to achieve it with this configuration, is that correct? I should either use a replication factor of 3, or make two smaller clusters like this and send to both of them?

cluster graphite_swarm1
    fnv1a_ch replication 1
        graph01.cplus:2003=B38ED90F4361BEF124865C7
        graph03.cplus:2003=ba603c36342304ed77953f
        graph05.cplus:2003=48CEE9C5AD06D81A90426
    ;
cluster graphite_swarm
    fnv1a_ch replication 1
        graph02.cplus:2003=B38ED90F4361BEF124865C7
        graph04.cplus:2003=ba603c36342304ed77953f
        graph06.cplus:2003=48CEE9C5AD06D81A90426
    ;

This will ensure that graph01 and graph02 (graph03 and graph04, graph05 and graph06) receive the same metrics. That's a smart way of using the same hash-keys within an fnv1a_ch cluster.