jamiealquiza / polymur

A fast carbon-relay with live routing controls + https Graphite forwarder
MIT License

Add REPLICATION_FACTOR support #50

Open erez-rabih opened 7 years ago

erez-rabih commented 7 years ago

Hi,

I saw there are two modes: consistent-hashing and broadcast. How can I set the replication factor of a metric so that a single metric consistently arrives at two graphite backends?

jamiealquiza commented 7 years ago

Replication factor doesn't exist, but I could (and probably should) add it. Let me review this.

erez-rabih commented 7 years ago

Thanks for the fast reply. Also, I have thought about using polymur in production. How stable is it in your experience?

jamiealquiza commented 7 years ago

It's routed almost the entire production metrics traffic at FireEye for over a year, and I've also heard from some pretty well-known companies that have begun using it (although I didn't gather at what scale). From a stability standpoint, it's production-worthy and doesn't have any known/open stability-related bugs. Mostly just features.

erez-rabih commented 7 years ago

Nice. I would definitely switch my carbon-relays to polymur once replication factor is implemented. Looks like a great project.

jamiealquiza commented 7 years ago

Renamed and will use this for issue tracking. Notes for development:

With replication, a get_nodes is called repeatedly during key lookup until a set of REPLICATION_FACTOR length (server, instance) tuples is gathered. These are the routing targets.

Initial idea would be to specify replication factor ~in the destination string, e.g. polymur -destinations="10.0.5.20:2003 for a REPLICATION_FACTOR equivalent of 2~ * as a -replication-factor config. Unspecified should default to 1 to be backwards compatible with existing configuration.

*Replication factor has to be applied to the whole pool, so per-destination settings don't make sense.
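A minimal Go sketch of the lookup described in these notes: walk the hash ring from the key's position and collect distinct destinations until a set of replication-factor length is gathered. Everything here is an illustrative assumption rather than polymur's actual code: the `Ring` type, the `NewRing` and `GetNodes` names, the FNV hash, the virtual-node count, and the second and third destination addresses are all hypothetical. It also tracks plain destination strings instead of (server, instance) tuples.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringEntry is one point on the hash ring, owned by a destination.
type ringEntry struct {
	pos  uint32
	dest string
}

// Ring is a simplified consistent-hash ring (hypothetical type, not polymur's).
type Ring struct {
	entries []ringEntry
}

// hashKey maps a string to a ring position. FNV-1a is an arbitrary choice here.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places vnodes points per destination on the ring, sorted by position.
func NewRing(dests []string, vnodes int) *Ring {
	r := &Ring{}
	for _, d := range dests {
		for i := 0; i < vnodes; i++ {
			pos := hashKey(fmt.Sprintf("%s-%d", d, i))
			r.entries = append(r.entries, ringEntry{pos: pos, dest: d})
		}
	}
	sort.Slice(r.entries, func(i, j int) bool {
		return r.entries[i].pos < r.entries[j].pos
	})
	return r
}

// GetNodes walks the ring clockwise from the key's position and collects
// distinct destinations until rf of them are gathered, or until the whole
// ring has been visited (so rf is effectively capped at the pool size).
func (r *Ring) GetNodes(key string, rf int) []string {
	if len(r.entries) == 0 || rf < 1 {
		return nil
	}
	k := hashKey(key)
	start := sort.Search(len(r.entries), func(i int) bool {
		return r.entries[i].pos >= k
	})
	seen := make(map[string]bool)
	var targets []string
	for i := 0; i < len(r.entries) && len(targets) < rf; i++ {
		e := r.entries[(start+i)%len(r.entries)]
		if !seen[e.dest] {
			seen[e.dest] = true
			targets = append(targets, e.dest)
		}
	}
	return targets
}

func main() {
	// Only 10.0.5.20:2003 appears in the thread; the other addresses are made up.
	ring := NewRing([]string{"10.0.5.20:2003", "10.0.5.30:2003", "10.0.5.40:2003"}, 100)
	// With a replication factor of 2, each metric maps to two distinct destinations.
	fmt.Println(ring.GetNodes("servers.web01.cpu.user", 2))
}
```

With a replication factor of 1, `GetNodes` returns a single destination, which matches the backwards-compatible default proposed above.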

erez-rabih commented 7 years ago

I think replication factor should be an independent flag as it has no relation to a specific destination. Also, I see no use case for different replication factors on different destinations, so there's no reason to attach an RF (replication factor) to a specific host:ip.

jamiealquiza commented 7 years ago

Yeah, I just realized what I was doing and updated :)

erez-rabih commented 7 years ago

Also, RF should only be taken into account when consistent hashing is used since broadcast implicitly means RF = # Destinations

erez-rabih commented 7 years ago

Or if we really want to be smart about this - broadcast is just a specific case in which RF = #Destinations but I don't know the project well enough to decide if that's how you would like to implement this.

jamiealquiza commented 7 years ago

It should just be ignored in broadcast, since that's basically what broadcast is (send a copy of all metrics to all destinations in the list). Will probably just add a startup note: if broadcast is being used and a replication-factor is set, it lets users know the setting is being ignored / has no effect.
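The startup note could look something like the following Go sketch. It is an assumption-laden illustration, not polymur's code: `replicationWarning` is a hypothetical helper, and the `-distribution` flag name and its `broadcast` / `hash-route` values are assumed here. Only `-replication-factor` and its default of 1 come from this thread.

```go
package main

import (
	"flag"
	"log"
	"os"
)

// replicationWarning returns a startup note when a replication factor other
// than the default of 1 is combined with broadcast mode, where it has no
// effect. It returns an empty string when there is nothing to report.
func replicationWarning(distribution string, rf int) string {
	if distribution == "broadcast" && rf != 1 {
		return "replication-factor is ignored in broadcast mode: every destination receives every metric"
	}
	return ""
}

func main() {
	fs := flag.NewFlagSet("polymur-sketch", flag.ExitOnError)
	// Flag name and values below are assumptions for illustration.
	distribution := fs.String("distribution", "broadcast", "broadcast or hash-route")
	// Defaults to 1 so existing configurations behave as before.
	rf := fs.Int("replication-factor", 1, "number of destinations per metric when hash routing")
	fs.Parse(os.Args[1:])

	if msg := replicationWarning(*distribution, *rf); msg != "" {
		log.Println(msg)
	}
}
```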