jjneely / buckytools

Go implementation of useful tools for dealing with Graphite's Whisper DBs and Carbon hashing

buckytools locate does not agree with existing cluster placement #11

Closed rsommer closed 7 years ago

rsommer commented 7 years ago

I'm trying to migrate from carbonate to buckytools, but the two tools do not agree about the placement of metrics in our existing cluster. Is there any known incompatibility or pitfall when switching from carbonate to buckytools?

Example carbonate.conf

[test]
DESTINATIONS = test01:2003, test02:2003
REPLICATION_FACTOR = 1
SSH_USER = root

carbon-lookup yields:

statsd.disk.free1 => test02:2003:None
statsd.disk.free2 => test02:2003:None
statsd.disk.free3 => test01:2003:None

buckyd started with:

buckyd --node test01 --prefix=/var/lib/graphite/whisper test01:2003 test02:2003
buckyd --node test02 --prefix=/var/lib/graphite/whisper test01:2003 test02:2003

bucky locate yields:

statsd.disk.free1 => test02
statsd.disk.free2 => test01
statsd.disk.free3 => test02

jjneely commented 7 years ago

The third value in the location is an "instance" value, which does nothing more than add additional data to the hashing algorithm to build a better hash ring. I wonder if carbonate is substituting in "None" for the instance value since you are not using them. That would build a different hashring.

I'd try using a specific value for the instance and see if they don't build matching hash rings.
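A minimal sketch of why the instance value changes metric placement. It assumes a carbon-style ring in which each node is a (server, instance) tuple, each node contributes 100 entries, the hashed key is the tuple's text form plus a replica index, and the ring position is the first four hex digits of the MD5 digest. The function names and the replica count are illustrative assumptions, not code taken from carbon, carbonate, or buckytools.

```python
import bisect
import hashlib

REPLICAS = 100  # assumed number of ring entries per node


def ring_position(key):
    """Map a string to a 16-bit position (first 4 hex digits of its MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:4], 16)


def build_ring(nodes):
    """Return a list of (position, node) entries sorted by position."""
    entries = []
    for node in nodes:
        for i in range(REPLICAS):
            # The node tuple is formatted as text, so the instance value
            # becomes part of the string that gets hashed.
            entries.append((ring_position("%s:%d" % (node, i)), node))
    entries.sort(key=lambda entry: entry[0])
    return entries


def locate(ring, metric):
    """Return the node owning the first entry at or after the metric's position."""
    positions = [position for position, _ in ring]
    index = bisect.bisect_left(positions, ring_position(metric)) % len(ring)
    return ring[index][1]


# Same servers, different instance values: the hashed keys differ,
# so the two rings are built from different positions.
without_instance = build_ring([("test01", None), ("test02", None)])
with_instance = build_ring([("test01", "a"), ("test02", "a")])

for metric in ("statsd.disk.free1", "statsd.disk.free2", "statsd.disk.free3"):
    print(metric, locate(without_instance, metric), locate(with_instance, metric))
```

Two tools only agree on placement if they hash exactly the same key strings for every ring member, which is why a differing instance value is a plausible cause of the mismatch.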

rsommer commented 7 years ago

If I change the configuration and use a specific instance name, the hash rings do indeed seem to match. Adding "None" as the instance name just for buckyd does not solve the problem. I'll have to try to find a way around this without reconfiguring the existing carbon hash ring.

deniszh commented 7 years ago

@rsommer : Could you please show what DESTINATIONS looks like in your carbon.conf?

rsommer commented 7 years ago

@deniszh : we are using carbon-relay-ng and go-carbon. The node names have no instance parts. Example:

init = [
    'addRoute consistentHashing production  test01:2003 spool=true pickle=false  test02:2003 spool=true pickle=false',
]

The go-tools and carbonate do agree about metric placement.

deniszh commented 7 years ago

Yep, looks like a bug in buckytools then.

rsommer commented 7 years ago

I think the "problem" is that carbon.util.parseDestination explicitly sets the instance to the Python None type, which hashes differently from the string "None", so there is no way to produce the same hash ring in a non-Python tool when the instance part is not set.
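To make the None-versus-"None" distinction concrete, here is a small sketch. It assumes the ring key is produced by formatting a (server, instance) tuple as text together with a replica index; `ring_key` is a hypothetical helper, not a carbon function.

```python
def ring_key(server, instance, replica):
    """Format the string that would be hashed for one ring entry (assumed layout)."""
    node = (server, instance)
    return "%s:%d" % (node, replica)


print(ring_key("test01", None, 0))    # ('test01', None):0
print(ring_key("test01", "None", 0))  # ('test01', 'None'):0
```

The two keys differ only in the quotes around None, which is enough to produce different hashes. A tool that builds the key string itself can still emit the unquoted form.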

deniszh commented 7 years ago

I don't think that's the problem here. Carbon-c-relay is written in C and has no problem with an absent instance.

deniszh commented 7 years ago

See https://github.com/grobian/carbon-c-relay/blob/master/consistent-hash.c#L227-L232

jjneely commented 7 years ago

Definitely duplicated

$ ./buckyd --node test01 --prefix=/var/lib/graphite/whisper test01:2003 test02:2003

and

$ ./bucky locate statsd.disk.free1
2017/04/21 14:43:54 1 metrics assigned to test02
statsd.disk.free1 => test02
$ ./bucky locate statsd.disk.free2
2017/04/21 14:44:06 1 metrics assigned to test01
statsd.disk.free2 => test01
$ ./bucky locate statsd.disk.free3
2017/04/21 14:44:09 1 metrics assigned to test02
statsd.disk.free3 => test02

Where carbon-c-relay says:

statsd.disk.free1
match
    * -> statsd.disk.free1
    carbon_ch(test)
        test02:2003
statsd.disk.free2
match
    * -> statsd.disk.free2
    carbon_ch(test)
        test02:2003
statsd.disk.free3
match
    * -> statsd.disk.free3
    carbon_ch(test)
        test01:2003

jjneely commented 7 years ago

So, in the referenced commit I added a unit test for this case. It works. Would one of you be a second set of eyes on that code? Maybe I did something dumb.

Running bucky locate at the CLI returns the broken results as demonstrated above. Not sure how this is different yet.

jjneely commented 7 years ago

Oh drat....it is interpreting the port number as the instance.
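A short sketch of how this kind of misreading can happen. `parse_member` is a hypothetical parser that accepts SERVER or SERVER:INSTANCE; it is an illustration only, not the actual buckyd code.

```python
def parse_member(member):
    """Hypothetical parser: read SERVER or SERVER:INSTANCE into a (server, instance) tuple."""
    server, sep, instance = member.partition(":")
    return (server, instance if sep else None)


print(parse_member("test01:2003"))  # ('test01', '2003')
print(parse_member("test01"))       # ('test01', None)
```

With such a parser, a SERVER:PORT argument yields a node whose instance is "2003" rather than an absent instance, so it feeds a different key into the hash ring.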

jjneely commented 7 years ago

The hardest code to debug is your own.

This is a semantics issue, and I've updated the README.md and the help text blurb in buckyd to be more specific. The valid formats for specifying a hashring member to buckyd are:

You are doing SERVER:PORT instead and buckyd was interpreting the port as the instance value. So executing like this should fix the matter for you:

buckyd --node test01 --prefix=/var/lib/graphite/whisper test01 test02

rsommer commented 7 years ago

That means everyone running on non-standard ports without instance names is running into this. I'll try to move to instance-based naming. Thanks for investigating.

jjneely commented 7 years ago

It sounds like you are still having problems?

The port information isn't even used here. I believe that was just an effort to keep the configuration strings here the same as those used in carbon-relay.py's DESTINATIONS variable, rather than introduce yet another mutation of how to represent server/instance/port.

We're interested in building a hash ring here, which doesn't consider the port. It's also a bit ambiguous which port should be used at this point. (The bucky port, or one of the carbon line protos?)

rsommer commented 7 years ago

I'm able to use bucky this way (leaving out the port info). It's just a little confusing that we now have 3 different notations to get matching hashrings:

bucky - server
carbon-relay-ng - server:port
carbonate - server:port:None (which will be replaced by bucky if everything works out now)

Once you know it, everything works fine. Next time we'll use instance identifiers even if there is only one instance per node.

jjneely commented 7 years ago

Which, of course, I was trying to avoid, and ended up stepping right into.

sw0x2A commented 6 years ago

I am running into a similar issue with Buckytools (version 0.4.0) either reporting the cluster inconsistent or bucky locate giving wrong results.

My carbon-c-relay config looks like this:

cluster cache_cluster
    fnv1a_ch
        172.22.6.46:2103
        172.22.6.46:2203
        172.22.6.47:2103
        172.22.6.47:2203
        172.22.6.48:2103
        172.22.6.48:2203
        172.22.6.49:2103
        172.22.6.49:2203
        172.22.6.50:2103
        172.22.6.50:2203
    ;

Buckyd is started like

/usr/bin/buckyd \
    -node 172.22.6.46 \
    -hash fnv1a \
    -prefix /data/graphite/whisper \
    172.22.6.46:2103=None 172.22.6.46:2203=None 172.22.6.47:2103=None 172.22.6.47:2203=None 172.22.6.48:2103=None 172.22.6.48:2203=None 172.22.6.49:2103=None 172.22.6.49:2203=None 172.22.6.50:2103=None 172.22.6.50:2203=None

I tried running buckyd with every combination of 10x HOST[:PORT][=INSTANCE], where the instance is either None or a and b. In every case, bucky servers reports that the cluster is unhealthy and inconsistent. The only way to get a positive result (= cluster is healthy) from bucky servers is to define just the 5 unique IP addresses, but then the bucky locate result is wrong.