question: migrate from old cluster to new

jjneely / buckytools

Go implementation of useful tools for dealing with Graphite's Whisper DBs and Carbon hashing

question: migrate from old cluster to new #9

Closed howdoicomputer closed 7 years ago

howdoicomputer commented 7 years ago

@jjneely

I have some possibly stupid questions coming your way, Mr. Neely.

Right now, I'm in charge of maintaining a very large, overly provisioned, hand-grown, monster of a Graphite cluster.

I recently built out a brand new cluster. It's completely automated, easy to create, performs super well and I'm happy with it.

However, I'm trying to figure out a good migration strategy. What I was thinking was to switch out clusters at the DNS level and, while clients are sending their traffic to the new cluster, use `bucky rebalance --no-delete` to take metrics out of the old cluster and shove them into the new one.

Am I going down the right path here?

jdblack commented 7 years ago

I'm in the exact same situation and testing rebalance --no-delete as we speak. How did it work for you?

howdoicomputer commented 7 years ago

@jdblack I'll let you know when I try it! Haha. I mean, it should work. My automated cluster is going into production next Tuesday and then the old cluster will go into a read only mode and then I'll try draining metrics from each individual cache node from the old cluster with the --no-delete option.

jdblack commented 7 years ago

I tested tonight by doing partial syncs and wasn't convinced. I actually saw puts on both sides, leading me to believe it may be trying to keep half the dataset on each side.

I tried the tar option, but that seems to only allow metrics on the command line (e.g. no -f or redirect for a metric list).

jjneely commented 7 years ago

Well, the first point is that the code base, as it is today, doesn't have direct support for doing a migration from one cluster to another.

What I've done before is use carbon-c-relay to direct data into both clusters and then slowly attempt to backfill old data into the new cluster. But I'm coming around to this problem again as well, so I may add more direct support.
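As a rough illustration of the dual-write idea, a carbon-c-relay config along these lines could send every incoming metric to both clusters while the backfill happens. This is only a sketch: the cluster names, hostnames, port 2003, the file path, and the hash type chosen for each cluster are all assumptions and would need to match the real setup.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical carbon-c-relay config that duplicates
# all metrics to an old and a new cluster. Every hostname, port, and hash
# type below is a placeholder, not taken from a real deployment.
RELAY_CONF="${RELAY_CONF:-./relay-dual-write.conf}"

cat > "$RELAY_CONF" <<'EOF'
# Old cluster (assumed to use the classic carbon_ch hash).
cluster old
    carbon_ch
        old-server-000:2003
        old-server-001:2003
        old-server-002:2003
    ;

# New cluster (assumed to use jump_fnv1a_ch).
cluster new
    jump_fnv1a_ch
        new-server-000:2003
        new-server-001:2003
    ;

# Send every metric to both clusters.
match *
    send to
        old
        new
    stop
    ;
EOF

echo "wrote $RELAY_CONF"
```

With something like this in front of both clusters, new data lands on each side, so only the historical data still has to be copied over.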

The "rebalance" command does give one the ability to specify a list of additional servers not in the current hashring. This was meant more for pulling all data off a spare node or a removed node and rebalancing it into the cluster. You could use that technique for the actual migration.

bucky rebalance -h new-cluster:1234 old-server-000:1234 old-server-001:1234 old-server-002:1234
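For anyone scripting this, the command above can be assembled from a list of old nodes and printed for review before anything is run. The sketch below assumes the `--no-delete` flag mentioned earlier in this thread can be combined with the extra-server arguments, which is not confirmed here; the host names and ports are the placeholders from the example, and `build_rebalance_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: prints a bucky rebalance command instead of running it.
# NEW_HOST and the old-server names are placeholders from the example above.
NEW_HOST="${NEW_HOST:-new-cluster:1234}"

# Build the command line for the given old nodes.
# Assumption: --no-delete (discussed earlier in the thread) leaves the
# source metrics in place on the old nodes and can be used together with
# the list of extra servers.
build_rebalance_cmd() {
    printf 'bucky rebalance --no-delete -h %s' "$NEW_HOST"
    for node in "$@"; do
        printf ' %s' "$node"
    done
    printf '\n'
}

# Print the command so it can be checked before being run by hand.
build_rebalance_cmd old-server-000:1234 old-server-001:1234 old-server-002:1234
```

Printing rather than executing keeps the script harmless on a machine that has no access to either cluster.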

ra-dft commented 7 years ago

@jjneely so do I understand correctly that, with the above example, bucky will pull data from the old servers via graphite-web's API or buckyd? I'm just trying to wrap my head around the requirements as I prepare for the same operation. However, we want to take advantage of feeding the old data back through our new carbon-c-relays, which are going to be set up using fnv1a_ch instead of carbon_ch.

deniszh commented 7 years ago

@ra-dft: bucky will pull data through buckyd, not graphite-web. But IIRC there's no support for the fnv1a_ch hash in bucky, only jump_fnv1a_ch, and vice versa: graphite-web supports only fnv1a_ch but not jump...

jjneely commented 7 years ago

0.10.x has fnv1a_ch now? Lovely....

But yes, deniszh is correct.

jjneely commented 7 years ago

I think this is taken care of at this point as far as accessing data from a graphite storage node outside of the defined cluster ring. If not please re-open.