Closed by rsommer 7 years ago
The third value in the location is an "instance" value, which does nothing more than add additional data to the hashing algorithm to build a better hash ring. I wonder if carbonate is substituting in "None" for the instance value since you are not using them. That would build a different hashring.
I'd try using a specific value for the instance and see if they don't build matching hash rings.
If I change the configuration and use a specific instance name, the hash rings do indeed seem to match. Adding "None" as the instance name just for buckyd does not solve the problem. I'll have to try to find a way around this without reconfiguring the existing carbon hash ring.
@rsommer : Could you please show how your DESTINATIONS looks in your carbon.conf?
@deniszh : we are using carbon-relay-ng and go-carbon. The node names have no instance parts. Example:
init = [
'addRoute consistentHashing production test01:2003 spool=true pickle=false test02:2003 spool=true pickle=false',
]
The go-tools and carbonate do agree about metric placement.
Yep, looks like a bug in buckytools then.
I think the "problem" is that carbon.util.parseDestination explicitly sets the instance to the Python None type, which hashes differently than the string "None", so there is no way to produce the same hash ring in a non-Python tool when the instance part is not set.
I don't think that's the problem here. carbon-c-relay is written in C and has no problem with an absent instance.
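To make the None-versus-"None" point concrete, here is a minimal sketch. It assumes the ring key is built from the string form of a (server, instance) tuple plus a replica index and hashed with MD5; the helper names `ring_key` and `ring_position` are hypothetical and this is not carbon's actual code.

```python
import hashlib


def ring_key(server, instance, replica):
    # Assumption: the key is the string form of the (server, instance)
    # tuple followed by the replica index.
    node = (server, instance)
    return "%s:%d" % (node, replica)


def ring_position(key):
    # Assumption: the position is the first 4 hex digits of the MD5 digest,
    # giving a value between 0 and 65535.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest[:4], 16)


if __name__ == "__main__":
    for instance in (None, "None"):
        key = ring_key("test01", instance, 0)
        print(repr(key), ring_position(key))
```

Under these assumptions the Python None instance yields the key `('test01', None):0`, while the string "None" yields `('test01', 'None'):0`. The extra quotes change the hashed bytes, so the two nodes would generally land at different ring positions.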
Definitely duplicated
$ ./buckyd --node test01 --prefix=/var/lib/graphite/whisper test01:2003 test02:2003
and
$ ./bucky locate statsd.disk.free1
2017/04/21 14:43:54 1 metrics assigned to test02
statsd.disk.free1 => test02
$ ./bucky locate statsd.disk.free2
2017/04/21 14:44:06 1 metrics assigned to test01
statsd.disk.free2 => test01
$ ./bucky locate statsd.disk.free3
2017/04/21 14:44:09 1 metrics assigned to test02
statsd.disk.free3 => test02
Where carbon-c-relay says:
statsd.disk.free1
match
* -> statsd.disk.free1
carbon_ch(test)
test02:2003
statsd.disk.free2
match
* -> statsd.disk.free2
carbon_ch(test)
test02:2003
statsd.disk.free3
match
* -> statsd.disk.free3
carbon_ch(test)
test01:2003
So, in the referenced commit I added a unit test for this case. It works. Would one of you be a second set of eyes on that code? Maybe I did something dumb.
Running bucky locate at the CLI returns the broken results, as demonstrated above. Not sure how this is different yet.
Oh drat....it is interpreting the port number as the instance.
The hardest code to debug is your own.
This is a semantics issue, and I've updated the README.md and the help text blurb in buckyd to be more specific. The valid formats for specifying a hashring member to buckyd are:
SERVER
SERVER:INSTANCE
SERVER:PORT:INSTANCE
You are doing SERVER:PORT instead, and buckyd was interpreting the port as the instance value. So executing it like this should fix the matter for you:
buckyd --node test01 --prefix=/var/lib/graphite/whisper test01 test02
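The ambiguity can be illustrated with a small parser for the three formats listed above. This is only a sketch of the described behavior, not buckyd's actual implementation; the function name `parse_member` and its return shape are hypothetical.

```python
def parse_member(spec):
    """Split a hashring member string into (server, port, instance).

    Accepts SERVER, SERVER:INSTANCE, or SERVER:PORT:INSTANCE.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        # SERVER only: no port, no instance.
        return parts[0], None, None
    if len(parts) == 2:
        # Two fields are always read as SERVER:INSTANCE, so a port
        # written here ends up as the instance value.
        return parts[0], None, parts[1]
    if len(parts) == 3:
        # SERVER:PORT:INSTANCE.
        return parts[0], int(parts[1]), parts[2]
    raise ValueError("unrecognized hashring member: %r" % spec)


if __name__ == "__main__":
    for spec in ("test01", "test01:2003", "test01:2003:a"):
        print(spec, "->", parse_member(spec))
```

With a parser like this, `test01:2003` is read as server `test01` with instance `2003`, which is why the resulting ring differs from one built from the bare server name.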
That means everyone running on non-standard ports without instance names is running into this. I'll try to move to instance-based naming. Thanks for investigating.
It sounds like you are still having problems?
The port information isn't even used here. I believe that was just an effort to keep the configuration strings the same here as in carbon-relay.py's DESTINATIONS variable, rather than introducing yet another mutation of how to represent server/instance/port.
We're interested in building a hash ring here which doesn't consider the port. It's also a bit ambiguous which port should be used at this point. (The bucky port, or one of the carbon line protocol ports?)
I'm able to use bucky this way (leaving out the port info). It's just a little confusing that we now have 3 different notations to get matching hashrings:
bucky - server
carbon-relay-ng - server:port
carbonate - server:port:None (which will be replaced by bucky if everything works out now)
Once you know it, everything works fine. Next time we'll use instance identifiers even if there is only one instance per node.
Which, of course, I was trying to avoid, and ended up stepping right into.
I am running into a similar issue with Buckytools (version 0.4.0): either the cluster is reported as inconsistent or bucky locate gives wrong results.
My carbon-c-relay config looks like this:
cluster cache_cluster
fnv1a_ch
172.22.6.46:2103
172.22.6.46:2203
172.22.6.47:2103
172.22.6.47:2203
172.22.6.48:2103
172.22.6.48:2203
172.22.6.49:2103
172.22.6.49:2203
172.22.6.50:2103
172.22.6.50:2203
;
Buckyd is started like
/usr/bin/buckyd \
-node 172.22.6.46 \
-hash fnv1a \
-prefix /data/graphite/whisper \
172.22.6.46:2103=None 172.22.6.46:2203=None 172.22.6.47:2103=None 172.22.6.47:2203=None 172.22.6.48:2103=None 172.22.6.48:2203=None 172.22.6.49:2103=None 172.22.6.49:2203=None 172.22.6.50:2103=None 172.22.6.50:2203=None
I tried to run buckyd with every combination of 10x HOST[:PORT][=INSTANCE], where the instance is either None or a and b. In every case, bucky servers reports that the cluster is unhealthy and inconsistent. The only way to get a possible result (cluster is healthy) from bucky servers is to define just the 5 unique IP addresses, but then the bucky locate result is wrong.
I'm trying to migrate away from carbonate to buckytools, but the two do not agree about the placement of metrics in our existing cluster. Is there any known incompatibility or pitfall when switching from carbonate to buckytools?
Example carbonate.conf
carbon-lookup yields:
buckyd started with:
bucky locate yields: