question about queue behavior

grobian / carbon-c-relay

Enhanced C implementation of Carbon relay, aggregator and rewriter

Apache License 2.0


question about queue behavior #422

Closed jbe-dw closed 3 years ago

jbe-dw commented 3 years ago

Hello,

I have two relays receiving metrics in round robin. Each is configured with a queue of 9 million objects, and the batch option is set to 2500 metrics. They both send data to two clusters with a replication factor of 1.

This weekend a Graphite carbon node in one cluster went down and stayed down until this morning. Can you tell me how carbon-c-relay behaves when the queue is full of this node's metrics? Since it batches metrics, does that mean regular metrics have been lost, or is it a separate queue?

Thanks

grobian commented 3 years ago

the batch option has no influence on how the queue is managed when it overflows
the queue is a FIFO, that is, when it is full, new entries push out the oldest in the queue

It is very likely that you've lost a number of metrics, but this wholly depends on how many metrics were sent to the fallen node over the weekend. If that count exceeds the 9 million you mention as your queue size, then you've lost the surplus beyond that number.
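To make the overflow behaviour concrete, here is a minimal sketch of a drop-oldest FIFO in Python. It is not carbon-c-relay's actual code (the relay is written in C); the class name `DropOldestQueue`, the helper `simulate_outage`, and the tiny capacity of 9 (standing in for 9 million) are all hypothetical and chosen only for illustration.

```python
from collections import deque


class DropOldestQueue:
    """Bounded FIFO that evicts the oldest entry when full (illustrative only)."""

    def __init__(self, capacity):
        # A deque with maxlen discards from the left (oldest) on append when full.
        self.entries = deque(maxlen=capacity)
        self.dropped = 0

    def enqueue(self, metric):
        # Count the eviction before it happens so the loss is visible.
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1
        self.entries.append(metric)


def simulate_outage(capacity, metrics_during_outage):
    """Enqueue metrics while the destination is down, so nothing is drained."""
    q = DropOldestQueue(capacity)
    for i in range(metrics_during_outage):
        q.enqueue(i)
    return q


# Assumed toy numbers: capacity 9, 12 metrics arrive during the outage.
q = simulate_outage(capacity=9, metrics_during_outage=12)
print(q.dropped)        # surplus beyond capacity: 12 - 9 = 3
print(list(q.entries))  # the 9 newest entries, 3 through 11
```

Under these assumptions the loss is simply the number of metrics received during the outage minus the queue size, and the metrics that survive are the most recent ones.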

jbe-dw commented 3 years ago

OK, so I've lost metrics for the fallen node, right? That's not an issue then, because I have a copy on the other cluster.

grobian commented 3 years ago

correct, each node has an associated queue, so delivery to available nodes isn't affected by nodes that are down
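As a rough illustration of why one queue per node isolates failures, the sketch below models two destinations that each own a bounded queue. This is an assumption-laden toy, not the relay's implementation: the `Destination` class, the node names, and `QUEUE_SIZE = 5` are hypothetical.

```python
from collections import deque

QUEUE_SIZE = 5  # tiny stand-in for the 9 million mentioned in the thread


class Destination:
    """A node with its own bounded drop-oldest queue (illustrative only)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.queue = deque(maxlen=QUEUE_SIZE)
        self.delivered = []

    def enqueue(self, metric):
        self.queue.append(metric)

    def flush(self):
        # A node that is down keeps its backlog; a healthy one drains fully.
        if not self.up:
            return
        while self.queue:
            self.delivered.append(self.queue.popleft())


cluster_a = Destination("cluster-a-node", up=False)  # the fallen node
cluster_b = Destination("cluster-b-node", up=True)   # the healthy copy

for i in range(8):
    metric = f"metric.{i}"
    # Every metric goes to both clusters, each through its own queue.
    for dest in (cluster_a, cluster_b):
        dest.enqueue(metric)
        dest.flush()

print(len(cluster_b.delivered))  # 8: the healthy node received everything
print(list(cluster_a.queue))     # only the 5 newest remain for the down node
```

Because the backlog for the down node lives in its own queue, it can overflow without delaying or dropping anything destined for the healthy node.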

jbe-dw commented 3 years ago

Thank you for your quick answers. I'd like to take this opportunity to thank you for your great work on this project.