jjneely / buckytools

Go implementation of useful tools for dealing with Graphite's Whisper DBs and Carbon hashing

Can I use buckytools to rebalance a fnv1a_ch cluster? #17

Closed: mwtzzz-zz closed this issue 7 years ago

mwtzzz-zz commented 7 years ago

The developers at carbon-c-relay mentioned that I could use this to rebalance a fnv1a_ch hashring. But when I run buckyd, I get the following message:

[root@ec2-xxx radar123 bin]$ setuidgid uuu ./buckyd -node ec2-xxx.compute-1.amazonaws.com -hash fnv1a_ch
2017/06/25 22:08:54 Invalide hash type. Supported types: [carbon jump_fnv1a]

Does buckytools support this type of hash? If not, do you know how I can rebalance my cluster upon adding a new cache host?

jjneely commented 7 years ago

I presently support the carbon_ch and jump_fnv1a hashing algorithms, in carbon-c-relay speak. Normally, yeah, this would definitely be the tool to use, but I don't have that hash type implemented.

However, it probably wouldn't take much code if you are interested. I've already got the fnv1a hashing function coded in and working as part of the jump hash. You'd just need to implement the hashing.HashRing interface, and plugging it into the buckyd and bucky commands should be fairly straightforward.
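To give an idea of the shape, here's a rough sketch of a generic FNV-1a consistent hash ring in Go. This is not the actual hashing.HashRing interface in buckytools, and the per-replica key it hashes is made up, so don't expect the placements to match carbon-c-relay as-is:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// node is one server in the ring, e.g. "radar-be-a:1905" with instance "a".
type node struct {
	server   string
	instance string
}

type ringPoint struct {
	pos uint32
	n   node
}

// fnv1aRing is a generic FNV-1a consistent hash ring: each node gets a
// number of virtual points (replicas) on a ring of uint32 positions, and
// a metric is owned by the first point at or after its own hash.
type fnv1aRing struct {
	points []ringPoint
}

func fnv1a32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// addNode places `replicas` virtual points for a node onto the ring.
// NOTE: the key hashed per replica here is invented; carbon-c-relay's
// fnv1a_ch has its own exact key format that would need to be matched.
func (r *fnv1aRing) addNode(n node, replicas int) {
	for i := 0; i < replicas; i++ {
		key := fmt.Sprintf("%s=%s:%d", n.server, n.instance, i)
		r.points = append(r.points, ringPoint{fnv1a32(key), n})
	}
	sort.Slice(r.points, func(a, b int) bool { return r.points[a].pos < r.points[b].pos })
}

// getNode returns the node owning the first ring position at or after the
// metric's hash, wrapping around at the end of the ring.
func (r *fnv1aRing) getNode(metric string) node {
	pos := fnv1a32(metric)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i].pos >= pos })
	if i == len(r.points) {
		i = 0
	}
	return r.points[i].n
}

func main() {
	var r fnv1aRing
	r.addNode(node{"radar-be-a:1905", "a"}, 100)
	r.addNode(node{"radar-be-b:1905", "b"}, 100)
	fmt.Println(r.getNode("some.metric.name").server)
}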

Glad to lend a hand getting some patches together.

mwtzzz-zz commented 7 years ago

I'm definitely interested. I've got a 12-node cache cluster that's hitting the ceiling in terms of throughput. I need to add a 13th node, but I can't do it unless I can rebalance the cluster.

Do you need anything from me in terms of putting together a patch?

mwtzzz-zz commented 7 years ago

I'd like to gently encourage some progress on this.... We just got notice that one of the instances in our relay cluster is scheduled for termination in the next week. We've got to get the data off it and onto a new node. Hopefully I can do this with buckytools if it's ready by then. Otherwise I'm going to have to use rsync or some similar brute-force method and cut over to the new host once most or all of the data has been transferred.

grobian commented 7 years ago

I wasn't aware of this bug; I gave it a try in PR #18.

mwtzzz-zz commented 7 years ago

I'll test this PR ...

mwtzzz-zz commented 7 years ago

@grobian your patch didn't work for me. When I run something like bucky list it fails to retrieve anything.

jjneely commented 7 years ago

Sorry for the delay here.... I should be able to spend some time on this in the coming week. Although, I know that cuts it close for that EC2 termination.

mwtzzz-zz commented 7 years ago

@jjneely I already completed the migration, but I definitely still need buckytools to support fnv1a_ch. For two reasons: (a) there's duplicate metrics spread around the cluster which need to be consolidated, and (b) if we ever need to scale horizontally I need to be able to add more hosts and rebalance the cluster.

grobian commented 7 years ago

@mwtzzz can you explain exactly how you set up bucky? Here is what I did to test this in a very simple manner.

on the server:

buckyd -node <graphite1> -prefix <path/to/carbon/whisper> -hash fnv1a -b :5678 <graphite1>

then from the client

env BUCKYHOST="<graphite1>:5678" bucky du -r '<tld>'

That returned something like the following on the client:

2017/08/11 15:41:15 Results from nut:5678 not available. Sleeping.
2017/08/11 15:41:15 Results from nut:5678 not available. Sleeping.
2017/08/11 15:41:16 nut:5678 returned 350 metrics
2017/08/11 15:41:16 Progress: 100/350 28.57%
2017/08/11 15:41:16 Progress: 200/350 57.14%
2017/08/11 15:41:16 Progress: 300/350 85.71%
2017/08/11 15:41:16 Du operation complete.
2017/08/11 15:41:16 912254000 Bytes
2017/08/11 15:41:16 869.99 MiB
2017/08/11 15:41:16 0.85 GiB

Does something like this work for you at all? I admit I don't fully understand the hostnames and how they are used by bucky, but it looks as if buckyd tells bucky where to connect to, so ensure buckyd has a correct list of hostnames for the hash-ring hosts.

jjneely commented 7 years ago

FNV1a support is now merged in as 0.4.0. Bug reports appreciated. Also note the change in how hashrings are specified: as a list of SERVER[:PORT][=INSTANCE] strings.
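For example, with hypothetical hostnames and ports, a buckyd invocation under the new syntax would look roughly like this:

buckyd -node graphite-a -hash fnv1a -p /var/lib/carbon/whisper graphite-a:2003=a graphite-b:2003=b graphite-c:2003=c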

mwtzzz-zz commented 6 years ago

Ah! ... Just getting around to seeing this (I got pulled away on other stuff at work)... Sorry I missed this earlier. Let me take a look at it today.

mwtzzz-zz commented 6 years ago

-bash-4.2$ ./bucky servers
Buckd daemons are using port: 4242
Hashing algorithm: [fnv1a: 12 nodes, 100 replicas, 1200 ring members radar-be-a:1905=a radar-be-b:1905=b ...]
Number of replicas: 100
Found these servers:
        radar-be-a
        radar-be-b
        ...
Is cluster healthy: false
2017/10/03 15:32:17 Cluster is inconsistent.

If I run any other command (list, inconsistent, etc), the following appears:

-bash-4.2$ ./bucky inconsistent
2017/10/03 15:34:35 Warning: Cluster is not healthy!
2017/10/03 15:34:35 Results from radar-be-c:4242 not available. Sleeping.
2017/10/03 15:34:35 Results from radar-be-k:4242 not available. Sleeping.

mwtzzz-zz commented 6 years ago

I see that file descriptor 5 of the buckyd process is iterating through the whisper files.... Perhaps it just takes time before buckyd has results ready?

On a different note, the bucky du -r command is working:

-bash-4.2$ ./bucky du -r '^niseidx\.adgroups\.multiplier\.1856660\.'
2017/10/03 15:53:11 radar-be-c:4242 returned 2 metrics
2017/10/03 15:53:12 radar-be-b:4242 returned 1 metrics
2017/10/03 15:53:12 radar-be-d:4242 returned 0 metrics
2017/10/03 15:53:12 radar-be-k:4242 returned 0 metrics
2017/10/03 15:53:12 radar-be-l:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-f:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-e:4242 returned 0 metrics
2017/10/03 15:53:13 radar-be-h:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-a:4242 returned 2 metrics
2017/10/03 15:53:13 radar-be-i:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-g:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-j:4242 returned 2 metrics
2017/10/03 15:53:13 Du operation complete.
2017/10/03 15:53:13 645600 Bytes
2017/10/03 15:53:13 0.62 MiB
2017/10/03 15:53:13 0.00 GiB

jjneely commented 6 years ago

If you are asking about the sleeping bits, yes, it takes a bit for buckyd to build a cache. The bucky CLI will wait for them.
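Conceptually the cache build is just a walk of the whisper tree that turns file paths back into dotted metric names, something along these lines (a simplified sketch, not the actual buckyd code):

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// listMetrics walks a whisper prefix (e.g. /media/ephemeral0/carbon/storage/whisper)
// and converts each .wsp file path back into a dotted metric name.
func listMetrics(prefix string) ([]string, error) {
	var metrics []string
	err := filepath.Walk(prefix, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.IsDir() || !strings.HasSuffix(path, ".wsp") {
			return nil
		}
		rel, err := filepath.Rel(prefix, path)
		if err != nil {
			return err
		}
		name := strings.TrimSuffix(rel, ".wsp")
		metrics = append(metrics, strings.ReplaceAll(name, string(os.PathSeparator), "."))
		return nil
	})
	return metrics, err
}

func main() {
	metrics, err := listMetrics("/media/ephemeral0/carbon/storage/whisper")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("found %d metrics\n", len(metrics))
}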

deniszh commented 6 years ago

@mwtzzz : could you please share your relay config and buckyd command line options?

mwtzzz-zz commented 6 years ago

-bash-4.2$ ./bucky inconsistent -v -h radar-be-c:4242 -s
2017/10/03 16:06:01 Warning: Cluster is not healthy!
2017/10/03 16:06:54 radar-be-c.mgmt.xad.com:4242 returned 12102076 metrics
Killed

This "Killed" is occurring automatically after a minute or so.

mwtzzz-zz commented 6 years ago

@deniszh My relay config looks like this with 12 nodes:

cluster radar-be
  fnv1a_ch
    radar-be-a:1905=a
    radar-be-b:1905=b
    ...
  ;

My buckyd command line (which I am running identically on each of the 12 hosts) looks like this (notice I'm not using -n):

/usr/local/src/buckyd -hash fnv1a -p /media/ephemeral0/carbon/storage/whisper radar-be-a:1905=a radar-be-b:1905=b ... <12 hosts total>

deniszh commented 6 years ago

Why not use -n?

mwtzzz-zz commented 6 years ago

@deniszh mostly because it's more work for me to include it and I thought it was optional (?). But if it's necessary, I'll definitely include it.... Should I put it in?

mwtzzz-zz commented 6 years ago

Ah, my bad. I just noticed in the documentation that -n defaults to whatever hostname -I says, which is not what I want.... I'll restart all the daemons with the right -n.

mwtzzz-zz commented 6 years ago

OK, this is looking much better. bucky servers now shows a healthy cluster.... I'll keep playing with the commands; I'm going to see if there are any inconsistencies and try to fix them. Assuming it's working, I will then add a 13th node to the cluster and do a rebalance.

azhiltsov commented 6 years ago

Be aware of #19 @mwtzzz

mwtzzz-zz commented 6 years ago

@azhiltsov I'm assuming the rebalance would make use of bucky-fill at some point, which could possibly corrupt some of my archive sums? .... It looks like Civil made a PR with his fix, I might just go ahead and merge that into my local copy.

mwtzzz-zz commented 6 years ago

I'm having an issue running bucky inconsistent. After several minutes it shows "Killed":

-bash-4.2$ ./bucky inconsistent -h radar-be-c:4242 
Killed

The output from the command shows a bunch of "Results from radar-be-x not available. Sleeping", then shows a metrics count for only six of the twelve nodes:

2017/10/04 07:17:42 radar-be-c:4242 returned 12139223 metrics
2017/10/04 07:21:34 radar-be-b:4242 returned 14213010 metrics
2017/10/04 07:23:49 radar-be-k:4242 returned 15511087 metrics
2017/10/04 07:23:54 radar-be-l:4242 returned 14627005 metrics
2017/10/04 07:23:57 radar-be-e:4242 returned 15568388 metrics
2017/10/04 07:25:03 radar-be-d:4242 returned 14510385 metrics

The buckyd log file doesn't show much, just some GET /metrics requests and this:

2017/10/04 06:55:19 172.17.33.75:59447 - - GET /metrics
2017/10/04 06:55:19 Scaning /media/ephemeral0/carbon/storage/whisper for metrics...
2017/10/04 07:16:02 172.17.33.75:18013 - - GET /metrics
2017/10/04 07:16:56 Scan complete.
2017/10/04 07:16:57 172.17.33.75:18259 - - GET /metrics

I ran it a second time. This time only 4 of the nodes returned metrics before "Killed."

deniszh commented 6 years ago

Check your syslog. It looks like the OOM killer, i.e. bucky consumes too much memory, which is entirely possible for 12-15 million metrics x 12 nodes....

jjneely commented 6 years ago

Was about to write the same thing. The client you are running the bucky CLI on doesn't have enough memory.

mwtzzz-zz commented 6 years ago

Ah, good suggestion. Indeed it was oom-killer that nuked it. It looks like bucky was consuming 20G+ of RAM on this 64GB system.

Increasing the RAM on these instances is not an option. Do you have any suggestions on how I can get it to work? Does bucky really need 20G+ of RAM?

deniszh commented 6 years ago

Spawn another instance with enough RAM? It should host bucky only, not buckyd or graphite.

mwtzzz-zz commented 6 years ago

OK, if I need to, I'll do that. But why does it need to put everything in RAM? Wouldn't it be better to have an option to "rate limit" the RAM usage and spill to disk temporarily as needed?

mwtzzz-zz commented 6 years ago

I got the RAM issue sorted out and am now successfully running bucky inconsistent.

Question: what exactly does bucky rebalance --delete do? Does it delete the metric data, or does it delete the original whisper file after the copy has been made on the new server? If the latter, then there's a problem, because the metrics being moved are losing their historical data. I'm running ./bucky rebalance --delete -w 10 -h radar-be-a:4242, and the logs look like this:

2017/10/04 12:55:16 DELETED: atlantic_direct.campaign.117535.supply.analyze46display411ws2.adCount.sum_all.hosts

and this metric no longer has any of the data it used to have.

jjneely commented 6 years ago

How much RAM is bucky (the CLI client) using for you? I can at least compare it to mine. There are definitely techniques we can use to reduce the memory requirements, but usually finding more RAM is faster.

The idea for the rebalance is to, more or less, atomically move a metric's file/data from one location to another. So we copy the data, write it to disk, sanity check it, and finally delete the source (when --delete is used). If a metric already exists at the target, the whisper-fill algorithm is used to merge the data.
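As a simplified, local-filesystem sketch of that flow (the real rebalance talks to buckyd over HTTP, and this is not the buckytools code):

package main

import (
	"fmt"
	"io"
	"os"
)

// moveWhisper mirrors the flow described above, but on a local filesystem:
// copy the whisper file, sanity-check the copy, and only then (when del is
// true) remove the source.
func moveWhisper(srcPath, dstPath string, del bool) error {
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()

	// NOTE: the real rebalance merges with a whisper-fill style backfill
	// when the target already exists; this sketch simply refuses to overwrite.
	if _, err := os.Stat(dstPath); err == nil {
		return fmt.Errorf("%s already exists; would need a whisper-fill merge", dstPath)
	}

	dst, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	written, err := io.Copy(dst, src)
	if cerr := dst.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}

	// Sanity check: the copy must be the same size as the source.
	info, err := os.Stat(srcPath)
	if err != nil {
		return err
	}
	if written != info.Size() {
		return fmt.Errorf("short copy: %d of %d bytes", written, info.Size())
	}

	if del {
		return os.Remove(srcPath) // delete the source only after the copy checks out
	}
	return nil
}

func main() {
	if err := moveWhisper("old/foo.wsp", "new/foo.wsp", false); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}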

For these cases I'd grab the new target metric file and figure out what's wrong with it. Clearly, this is what the tool is designed to protect against. Is the whisper file corrupt, or does it have an incorrect file size? Also, the logs for buckyd are fairly verbose; they may have some interesting details.

Without --delete, rebalance does the same operation, but the deletion of the source whisper files/locations is skipped.

There is bug #19, which I need to spend some time with (and work isn't letting me... go figure). In my usage, most of my whisper data is 60-second resolution for 25 months and I don't have extra rollup archives -- this may be why I've not noticed it in the past.

mwtzzz-zz commented 6 years ago

The bucky CLI uses about 360MB of RAM initially (for the first 20-30 minutes, during which I assume it's not doing much beyond waiting for the buckyd servers to scan their filesystems), then it jumps to 20GB, and once it starts hashing it jumps to 71GB on my system. Here's what pmap shows: total kB 71519320 54618572 47270884.

Let's look at one of the metrics from yesterday's rebalance. The metric name is atlantic_exchange/usersync/cookiepartner/TAPAD/syncs/sum_all/hosts.wsp. There are two hosts that this appears on (it should only be one):

So I think the problem is: bucky's idea of where the metrics should be is different from carbon-c-relay's idea.

EDIT: I don't know if it's relevant to the discussion, but bucky servers is reporting number of replicas as 100. I don't know where it's getting that from - it should be "1". We're using fnv1a_ch with a rep factor of 1.

Hashing algorithm: [fnv1a: 12 nodes, 100 replicas, 1200 ring members 
Number of replicas: 100

grobian commented 6 years ago

The 100 replicas are an internal implementation detail; carbon-c-relay does the same thing.

The misplacement of metrics is possible. Does bucky report atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all (with the missing .hosts) for real or was this a copy/paste error?

mwtzzz-zz commented 6 years ago

Copy/paste error. The full paste is: bucky inconsistent reports: radar-be-i:4242: atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts

(Note that "radar-be-f" is the shortened hostname; I remove the domain before posting.)

mwtzzz-zz commented 6 years ago

bucky rebalance --no-op shows this:

2017/10/05 13:49:18 [radar-be-i:4242] atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts => radar-be-f

mwtzzz-zz commented 6 years ago

Are there any tools we can use to see what carbon-relay is doing to calculate the hashring node, to see what bucky is doing, and to find out why they're giving different results? I see buckytools has a fnv1a_test.go but it seems to be missing a module.

grobian commented 6 years ago

Yes, if you launch your carbon-c-relay with the -t (test) flag, it will prompt for data input and show you how it would route the input (assuming it is a metric string). So, in your case, just start the relay with -t -f <conffile> and paste atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts. That will show you where the relay thinks it should be routed to. If this is different, then we need to examine what's going on in this case.

mwtzzz-zz commented 6 years ago

Thanks for letting me know about the test flag. Very useful. Now I know what's going on. First of all, it's not bucky's fault. Bucky is computing the correct placement of the metric based on the information it's given.

The problem is that we are rewriting the metric name on the backend right before it goes to the local cache on that node. Our front end relay is sending the metric with "agg" prepended to the name. The backend relay receives this metric and then removes "agg" before writing it to its local cache. Bucky doesn't know about this rewrite, so it thinks the metric is on the wrong node. Technically it is on the wrong node given the metric name. But it is on the right node if the name has "agg" prepended to it.

So my problem is: how to rebalance this cluster, placing metrics whose names look like xxxx.sum_all.hosts onto the node where they would go if the name were agg.xxxx.sum_all.hosts. Any thoughts?

Here are the details: atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum.yyy arrives at the front-end carbon-c-relay. An aggregate rule sums it and rewrites the metric name as agg.*.sum_all.hosts. This metric is then passed on to the back-end relay. As you can see, it is routed to the radar-be-i node:

agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
match
    ^agg\. [strncmp: agg.]
    -> agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    fnv1a_ch(radar-be)
        radar-be-i:1905

The metric arrives at the radar-be-i node, where it is summed again, "agg" is stripped from the metric name, and it is written to a local whisper file as atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts:

agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
aggregation
    ^(us-east-1\.)?agg\.(.+)$ (regex) -> agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    sum(\2) -> atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts 
atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
match
    * -> atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    fnv1a_ch(cache)
        127.0.0.1:2107
    stop

The culprit here is the following set of rules in the relay conf file:

aggregate
    ^(us-east-1\.)?agg\.(.+)$
  compute sum write to \2
  ;
match ^agg\. send to blackhole stop;
match * send to cache stop;

It rewrites the name to \2 and then sends it to the local cache. I suppose that right before the last rule I could insert a new rule that sends anything with "sum_all.hosts" back to the relay so that it gets routed to the correct host according to the hash. This is the only thing I can think of, unless bucky has (or could have?) a way to balance a cluster based on some rewrite rules.
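Something like this, I think, inserted just before the final match rule (untested sketch; it assumes the back-end relay also has the radar-be cluster defined, as in the front-end config above):

match \.sum_all\.hosts$ send to radar-be stop;
match * send to cache stop;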

deniszh commented 6 years ago

It rewrites the name to \2 and then sends it to the local cache. I suppose that right before the last rule I could insert a new rule that sends anything with "sum_all.hosts" back to the relay so that it gets routed to the correct host according to the hash. This is the only thing I can think of, unless bucky has (or could have?) a way to balance a cluster based on some rewrite rules.

Indeed. You should send the new metric back to the relay and not to the local cache. Or you could have separate whisper storage for local metrics, but that's quite ugly IMO.

mwtzzz-zz commented 6 years ago

The good news is that bucky is doing things correctly. I'm looking forward to being able to add more nodes to the cluster and use it to rebalance.

mwtzzz-zz commented 6 years ago

In testing this out, I came across a new, unrelated issue. My carbon-cache instances write their own metrics to /media/ephemeral0/carbon/storage/whisper/carbon.radar702 as per a directive I set in the carbon-cache config file: CARBON_METRIC_PREFIX = carbon.radar702. carbon-cache appears to write its own metrics directly to disk, bypassing the relay (correct me if I'm wrong). Unfortunately, bucky looks at them and thinks they belong on a different node:

2017/10/06 16:28:39 [radar-be-a:4242] carbon.radar702.agents.ip-172-22-17-20-2105.errors => radar-be-b

Is there a way to deal with this?

jjneely commented 6 years ago

You are correct about carbon-cache.py. It writes its own metrics directly to disk and they cannot go through the relay. Usually, these are prefixed with carbon.agents. and bucky has an exception for them in the rebalance code.
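So moving your prefix under carbon.agents. should be enough; in carbon.conf that would be something along these lines (section name assumed from a stock config):

[cache]
CARBON_METRIC_PREFIX = carbon.agents.radar702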

mwtzzz-zz commented 6 years ago

I made the change from carbon.radar702 to carbon.agents.radar702 and it works perfectly.

Excellent tool. I used it on our small QA cluster and it rebalanced 45,000 out of 320,000 metrics in a matter of a couple of seconds.

jjneely commented 6 years ago

Okay, so what issues remain here? The rebalance and corruption?

mwtzzz-zz commented 6 years ago

No issues remaining. It seems to be working correctly. Thanks for your help on this, much appreciated!

mwtzzz-zz commented 6 years ago

@jjneely I've noticed a new issue. I completed a rebalance on our main production cluster. Everything went great, except there are a handful (about 700) of metrics that the relays are putting on node "radar-be-k" while bucky thinks they should be on node "radar-be-i". The curious thing is that this is only happening on the one node. The other eleven nodes don't have this discrepancy.

I ran some of the metric names through carbon-c-relay -t -f on both the front-end and back-end relays for testing, and they always hash to radar-be-k. So the relay is putting the metrics where it thinks they should go.

In this case, it seems bucky is incorrect about the placement.

grobian commented 6 years ago

We'd need the exact metric name, so we can debug the hash on both c-relay and bucky.

mwtzzz-zz commented 6 years ago

That's what I figured. The metric names include our EC2 instance hostnames. Can I private-message you directly?

grobian commented 6 years ago

Yes, of course. Email is fine too.

mwtzzz-zz commented 6 years ago

@grobian I just sent you an email from my gmail account.