Fodoj opened this issue 8 years ago

If I simultaneously start 20 nodes, each applying this module with the same cluster name, is there a chance that I will get a split-cluster issue? After going through the source code, it seems like nothing would stop Couchbase from doing it.
Hi, sorry for the long response time. I've had some personal stuff recently. In theory, if you could start literally 20 nodes at once, that could result in a split brain in the configuration, yes. At the same time, though, I don't know of a way that would happen in practice. As long as you have existing nodes in the cluster, they will pick up the new members, add them, and migrate, but that won't all happen at the same time, simply because Puppet would not run them all with the same timing.
Have you run into an issue specifically with this? I can also try to test something myself.
I tested it myself and split brain happens in 99% of cases :(
Huh. Weird. I'll look into it some more.
I think this is the key point:

> As long as you have existing nodes in the cluster

In the case of spawning a completely new cluster (not adding to an existing one that already has nodes) with 20 new VMs, this is very likely to happen, since the VMs come up simultaneously.
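To make the race concrete, here is a hedged sketch of the failure mode, not the module's actual logic; the hostname, credentials, and cluster-init flags are placeholders and should be checked against the installed couchbase-cli:

```bash
#!/bin/bash
# Sketch of the first-boot race. Each new node checks whether an
# existing cluster answers; if not, it initializes its own.
SEED=couchbase01.example.com   # hypothetical seed node
if ! /opt/couchbase/bin/couchbase-cli server-list -c "${SEED}:8091" \
    -u couchbase -p 'password' >/dev/null 2>&1; then
  # No existing cluster answered. Twenty VMs hitting this branch at
  # the same moment can each bootstrap their own one-node "cluster".
  /opt/couchbase/bin/couchbase-cli cluster-init -c localhost:8091 \
    --cluster-init-username=couchbase --cluster-init-password='password' \
    --cluster-init-ramsize=256
fi
```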
@dfairhurst Fair enough. I'll work on engineering a solution for that particular problem.
Thoughts on waiting a random T seconds (e.g. `sleep $(/usr/bin/shuf -i 1000-10000 -n 1)`) in the module before starting/joining the cluster?
Good idea! I'll consider how best to implement this so that it won't fall afoul of timeouts for exec, etc.
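On the timeout concern: GNU `sleep` takes its argument in seconds, so `shuf -i 1000-10000` would mean a delay of roughly 17 minutes to almost 3 hours, well past Puppet's default 300-second `exec` timeout. A minimal sketch, assuming GNU coreutils, that keeps the jitter inside that window (the 1-120 range is an assumption, not a module default):

```bash
# Random jitter bounded below Puppet's default 300 s exec timeout.
/bin/sleep "$(/usr/bin/shuf -i 1-120 -n 1)"
```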
Well, assuming this actually works, what about adding it to the `couchbasenode.erb` template, such that subsequent entries would render as follows:
```bash
#!/bin/bash
touch /opt/couchbase/var/.installed
# Server node configurations below
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase02.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)
```
Also worth noting:

```
DEPRECATED: Adding server from the rebalance command is deprecated and will be removed in future release, use the server-add command to add servers instead.
```
I was originally looking for a `--wait` option like they have for `bucket-create`, but now I'm curious whether `server-add` behaves any differently in mitigating this same issue more natively.
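For illustration, a hedged sketch of what the deprecation notice points toward (same placeholder hostnames and credentials as above; verify the exact flags against the installed couchbase-cli): register each node with `server-add`, then run a single rebalance at the end rather than one per node:

```bash
#!/bin/bash
# Sketch only: add all peers first, then rebalance once.
/usr/bin/sleep "$(/usr/bin/shuf -i 1-120 -n 1)"   # jitter, as proposed above
/opt/couchbase/bin/couchbase-cli server-add -c localhost -u couchbase -p 'password' \
  --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/opt/couchbase/bin/couchbase-cli server-add -c localhost -u couchbase -p 'password' \
  --server-add=couchbase02.example.com --server-add-username=couchbase --server-add-password='password'
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password'
```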