justicel / puppet-couchbase

Puppet couchbase module for auto-scaling of couchbase with puppet

Split brain issue #34

Open Fodoj opened 8 years ago

Fodoj commented 8 years ago

If I were to simultaneously start 20 nodes, each applying this module with the same cluster name, is there a chance that I would get a split-cluster issue? After going through the source code it seems like nothing would stop Couchbase from doing it.

justicel commented 8 years ago

Hi, sorry for the long response time; I've had some personal stuff recently. In theory, if you could start literally 20 nodes at once, that could result in a split brain in the configuration, yes. At the same time, though, I don't know of a way that would happen in practice. As long as you have existing nodes in the cluster, they will pick up the new members, add them, and migrate, but that won't happen all at the same time simply because Puppet would not be able to run them all with the same timing.

Have you run into an issue specifically with this? I can also try to test something myself.

Fodoj commented 8 years ago

I tested it myself and split brain happens in 99% of cases :(

justicel commented 8 years ago

Huh. Weird. I'll look into it some more.

dfairhurst commented 8 years ago

I think this is the key point:

As long as you have existing nodes in the cluster

In the case of spawning a completely new cluster (rather than adding to an existing one that already has nodes) with 20 new VMs, this is very likely to happen, since the VMs all come up simultaneously.

justicel commented 8 years ago

@dfairhurst Fair enough. I'll work on engineering a solution for that particular problem.

rdev5 commented 7 years ago

Thoughts on waiting a random T seconds (e.g. sleep $(/usr/bin/shuf -i 1000-10000 -n 1)) in the module before starting/joining the cluster?

justicel commented 7 years ago

Good idea! I'll consider how best to implement this in a way that won't fall afoul of timeouts for exec, etc.
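
For what it's worth, a minimal sketch of what the rendered join script could look like with such a delay (hypothetical, not the module's current template output; the sleep range is chosen to stay well under Puppet's default 300-second exec timeout):

#!/bin/bash
# Hypothetical sketch, not the actual couchbasenode.erb output.
# Stagger nodes that boot at the same moment so they don't all try to
# initialise a brand-new cluster simultaneously. Keep the upper bound well
# below Puppet's default 300-second exec timeout.
/usr/bin/sleep $(/usr/bin/shuf -i 1-120 -n 1)

# ...followed by the usual cluster init / server-add / rebalance commands
# that the template already renders.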

rdev5 commented 7 years ago

Well, assuming this actually works, what about adding it to the couchbasenode.erb template such that subsequent entries would render as follows:

#!/bin/bash

touch /opt/couchbase/var/.installed

#Server node configurations below
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)

/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase02.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)

rdev5 commented 7 years ago

Also worth noting:

DEPRECATED: Adding server from the rebalance command is deprecated and will be removed in future release, use the server-add command to add servers instead.

I was originally looking for a --wait option like they have for bucket-create, but now I'm curious whether server-add behaves any differently in mitigating this same issue more natively.
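
For comparison, a sketch of what the non-deprecated form might look like (assuming the same host and credentials as in the template above): the node is added with server-add first, and the rebalance is then issued as a separate step:

/opt/couchbase/bin/couchbase-cli server-add -c localhost -u couchbase -p 'password' --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password'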