hectcastro / docker-riak

A Docker project to bring up a local Riak cluster.
https://registry.hub.docker.com/u/hectcastro/riak/
Apache License 2.0

Ring ownership after start #36

Open kakoni opened 9 years ago

kakoni commented 9 years ago

So, starting a Riak cluster as per the documentation: `DOCKER_RIAK_AUTOMATIC_CLUSTERING=1 DOCKER_RIAK_CLUSTER_SIZE=5 DOCKER_RIAK_BACKEND=leveldb make start-cluster`

After stabilization I check ring ownership: "ring_ownership": "[{'riak@172.17.0.64',64}]",

That's uncool: a single node owns all 64 partitions.

So I docker-enter into one of the nodes to see what `riak-admin cluster plan` shows:

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      20.3%     20.3%    'riak@172.17.0.64'
valid      20.3%     20.3%    'riak@172.17.0.65'
valid      20.3%     20.3%    'riak@172.17.0.66'
valid      20.3%     20.3%    'riak@172.17.0.67'
valid      18.8%     18.8%    'riak@172.17.0.68'
-------------------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Outstanding changes. I'll commit them and check the status again:

root@5ec57bbf8295:~# riak-admin cluster commit
Cluster changes committed
root@5ec57bbf8295:~# riak-admin cluster plan
There are no staged changes

All good. And ring ownership is obviously fine now too: "ring_ownership": "[{'riak@172.17.0.64',13},\n {'riak@172.17.0.65',13},\n {'riak@172.17.0.66',13},\n {'riak@172.17.0.67',13},\n {'riak@172.17.0.68',12}]",
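
As a sanity check, those counts are what an even split of a 64-partition ring over 5 nodes looks like: 64 / 5 gives 12 with a remainder of 4, so four nodes get 13 (13/64 ≈ 20.3%) and one gets 12 (12/64 ≈ 18.8%), matching the plan output. A small sketch of that arithmetic follows; `expected_ownership` is a hypothetical helper name, and this is only the even-split calculation, not Riak's actual claim algorithm.

```shell
# Hypothetical helper: print how many partitions each node would own
# if ring_size partitions were split as evenly as possible.
# This is plain arithmetic, NOT Riak's real claim algorithm.
expected_ownership() {
  ring_size="$1"
  nodes="$2"
  base=$((ring_size / nodes))    # partitions every node gets
  extra=$((ring_size % nodes))   # this many nodes get one more
  i=1
  out=""
  while [ "$i" -le "$nodes" ]; do
    if [ "$i" -le "$extra" ]; then
      n=$((base + 1))
    else
      n=$base
    fi
    out="${out}${out:+ }${n}"
    i=$((i + 1))
  done
  echo "$out"
}

expected_ownership 64 5   # prints: 13 13 13 13 12
```

Anything far from this, such as one node holding all 64, suggests the join plan was staged but never committed.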

So something with automatic_clustering fails here.

kazarena commented 9 years ago

@kakoni, I had similar issues with unfinished cluster configuration. After digging into it, I figured out that automatic_clustering.sh is sometimes executed too soon, and the `cluster join` command then returns a "Node not found!" message. I wasn't able to come up with a quick fix for this, so I chose an alternative: instead of joining the cluster from inside the container, I do it explicitly in start-cluster.sh; see bdb49dd14746b08c27ef3993ee7645f5c3b73d72 and 83ff81f5c5448218b2bbc59d4ba30978bdbc732a.

These changes give me consistently stable behaviour.
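
For anyone who would rather keep the join inside the container, a join that runs too soon could in principle be wrapped in a retry loop. This is only a minimal sketch and not what the commits above do: `retry` is a hypothetical helper, `SEED_NODE` is a placeholder, and it assumes `riak-admin cluster join` exits non-zero when it prints "Node not found!" (not verified here; if it exits 0, the output would have to be checked with grep instead).

```shell
# Hypothetical helper: run a command until it succeeds,
# giving up after max_attempts tries.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example call (commented out; SEED_NODE is an assumed placeholder
# for the address of the first node in the cluster):
# retry 10 3 riak-admin cluster join "riak@${SEED_NODE}"
```

The staged changes would still need a `riak-admin cluster plan` and `riak-admin cluster commit` once every node has joined.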

Hope this helps.