ideawu / ssdb

SSDB - A fast NoSQL database, an alternative to Redis
http://ssdb.io/
BSD 3-Clause "New" or "Revised" License
8.2k stars 1.4k forks

replication and HA #1289

Open saveriocastellano opened 5 years ago

saveriocastellano commented 5 years ago

Hello, I'm trying to understand better how replication in SSDB works.

Basically I have an SSDB cluster (made of several master nodes) and I have implemented sharding on the SSDB client side, so data is automatically sharded across many SSDB instances with a simple hash-key / mod algorithm. My last goal is to make every master node fully redundant by coupling it with a slave. So I would like to have a slave for each master, and I would like the slave and master to be as synced as possible, so that if there is a problem with one of the masters I can switch to its slave.
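For illustration, the hash-key / mod sharding described here can be sketched in Python; the endpoint names and the choice of CRC32 as the hash are my own assumptions, not anything prescribed by SSDB:

```python
import binascii

# Hypothetical master endpoints; replace with your real SSDB nodes.
MASTERS = [
    ("ssdb-master-1", 8888),
    ("ssdb-master-2", 8888),
    ("ssdb-master-3", 8888),
]

def shard_for(key: str):
    """Map a key to one master with a simple hash % N scheme (CRC32 here)."""
    h = binascii.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return MASTERS[h % len(MASTERS)]
```

Note that a given key always maps to the same master, but adding or removing a master remaps most keys; if the cluster size may change, consistent hashing is the usual alternative.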

I have the following questions:

1) when I use "slaveof" to connect the slave to a master, what is the difference between the "sync" and "mirror" types? Can you explain?

2) I have been using a slave for each master and I came across the following two problems:

a) after 2-3 days of normal operation (with data copied from master to slave regularly) the master becomes unresponsive (commands take a long time to complete), and in the log I see the same "broken pipe" sync error that is reported here:

 https://github.com/ideawu/ssdb/issues/1105

b) if what is described in a) doesn't happen, then after a few days of operation the master becomes very slow to respond

I'm pretty sure that both problems a) and b) are caused by master/slave synchronization, because if I shut down the slave and leave it off, everything works well all the time. I'm using slaves configured with type=sync; could my problems be related to this? Shall I try setting type=mirror?

3) is it possible to add a slave to a master even after the master has been active for some time and has written a lot of data (> 1GB)... I mean, will the slave still be able to sync? Will the sync cause performance issues on the master, or impact the operation of the master at all?

4) on your documentation page describing HA (http://ssdb.io/docs/ha.html) you wrote the following:

Your applications should only invoke write requests to the Master node, and read request to the Master or Slave node(s).

does this mean that the application should ONLY write to the master and should NOT read from the master? Is it OK to always read/write to the master? This is what I'm currently doing, and I hope it is OK.

5) when a master is down, I can switch to the slave and promote it to master... should I then change its configuration so that it is no longer "slaveof" the failed master? When the failed master comes back up, shall I add "slaveof" to it and point it at the slave which has now become the master? Or shall I never touch the configuration, and once the master is back up simply resume writing to it instead of to the slave?

ideawu commented 5 years ago

when I use "slaveof" to connect the slave to a master, what is the difference between the "sync" and "mirror" types? Can you explain?

sync is for master-slave replication, mirror is for master-master replication.
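For reference, the choice shows up in the replication section of the slave's ssdb.conf. The fragment below is a sketch based on the stock ssdb.conf layout; host and port are placeholders:

```
replication:
	binlog: yes
	slaveof:
		# id uniquely identifies this replication link
		id: svc_1
		# sync: master-slave replication (this node is a copy of the master)
		# mirror: master-master replication (this node also accepts writes)
		type: sync
		host: 127.0.0.1
		port: 8888
```

Note that ssdb.conf is indentation-sensitive and expects tabs, not spaces.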

is it possible to add a slave to a master even after the master has been active for some time and has written a lot of data (> 1GB)...

A new slave can be connected to a master at any time, so it is fine even if the master already has some or a lot of data.

Is it OK to read/write to the master always?

Yes, it is OK to always read/write to the master.

when a master is down, I can switch to the slave and promote it as master... should I actually change its configuration so that it is no longer "slaveof" the master that failed?

Promote a Slave - In this case, you must disable the 'slaveof' configuration of the slave (which has been promoted to master); that requires a restart of the process after ssdb.conf has been updated.

When the failed master comes up again, shall I add "slaveof" to it and let it point to the slave which has now become the master?

Erase a Master - After one of the slaves is promoted to master, the previously failed master must have its data deleted (remove the data and meta folders: rm -rf data meta) before it is started again for any purpose. After removing all data, it becomes a completely empty node that can be made a slave.
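Putting the two steps above together, a hedged sketch of the failover sequence (hostnames and ids are placeholders):

```
# 1) On the promoted slave: comment out or delete its slaveof block
#    in ssdb.conf, then restart ssdb-server so it runs as a master.
#
# 2) On the recovered old master: wipe its data first (rm -rf data meta),
#    then configure it as a slave of the new master before starting it:
replication:
	binlog: yes
	slaveof:
		id: svc_1
		type: sync
		host: new-master-host
		port: 8888
```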

Or shall I never touch the configuration, and when the master is up I shall just restart writing to it instead of writing to the slave?

If a failed master can be restarted, I would recommend restarting it.

saveriocastellano commented 5 years ago

hello,

first of all, thank you very much for your replies, which are already quite helpful.

Now I have one doubt, to scale a ssdb setup I seem to have two options:

1) shard data across multiple masters (using a key-hash algorithm at the SSDB client level to split data across the masters), and give each master a slave (slaveof/type=sync)

2) have many mirrored masters... in this case the SSDB client will read/write to any of the available masters each time, and the masters will stay in sync with each other using slaveof/type=mirror.

Which of the two options above do you think is best? I understand that with option 1) the data will be sharded across the masters, and if I use one slave per master the data will be duplicated only once. With option 2), the data will be duplicated N times if I have N masters.

Is there any difference in terms of performance and efficiency when syncing data with type sync versus mirror?

What about the "broken pipe" error I reported (and that other people have reported as well)? Do you know what its cause could be? Do you think it is something that does not happen, or has a smaller chance of happening, with type=mirror instead of sync? I'm asking because I need to decide whether to switch from option 1 to option 2.

Thanks

saveriocastellano commented 5 years ago

hello, could you please answer my latest questions above?

Thanks