saveriocastellano opened 4 years ago
Maybe we need to add `add_slave` and `del_slave` commands.
Or just a `reset_meta` command that makes the server rediscover the slave(s) status, equivalent to deleting the `meta/` directory?
I'm adding this method to `SSDBServer`. I have been looking at the code and this seems to be the right approach:
```cpp
void SSDBServer::resetSync() {
    log_info("resetting sync state...");
    delete backend_sync;
    backend_sync = new BackendSync(this->ssdb, this->sync_speed);
    std::vector<Slave *>::iterator it;
    for(it = slaves.begin(); it != slaves.end(); it++){
        Slave *slave = *it;
        slave->stop();
        slave->last_seq = 0;
        slave->last_key = "";
        slave->save_status();
        slave->start();
    }
    log_info("sync state reset");
}
```
Then in `proc_sys.cpp` I added this handler, bound to a new `syncreset` command I defined:
```cpp
int proc_syncreset(NetworkServer *net, Link *link, const Request &req, Response *resp) {
    SSDBServer *serv = (SSDBServer *)net->data;
    CHECK_NUM_PARAMS(0);
    serv->resetSync();
    return 0;
}
```
I'm now going to test it and I will write here how it goes.
@ideawu if you see anything wrong in my code, or you think this is just not the right way, it would help me to know. Thanks
`stop()`, so you should delete the old one and create a new one. Take a look at https://github.com/ideawu/ssdb/blob/0e93e2aaa018abd2332478002f0664098049b23a/src/proc_sys.cpp#L261

Alright, so as you said in your previous post this is just a matter of adding a `del_slave` command. For `add_slave` there is no need to add a new command, right? I can just use the existing `slaveof` command, correct?
No, the `slaveof` command does not support mirror replication.
But then I don't understand why you say:
"Execute del_slave and then add_slave operation on both A and B"
If I'm going to shut down B and delete its `meta` and `data` directories, why do I need to delete and add the slave on B as well? Node B will already have Node A defined in its configuration file, so it will connect to it when it starts.
If you shut down B and delete its `meta` and `data` folders, then you don't need to invoke the delete and add slave commands on it.
Thanks! That's what I thought.
I have implemented the command and tested it. So far it seems to work well. Here is my code, please tell me what you think.
In `serv.cpp` I added this:
```cpp
int SSDBServer::resetslave(const std::string &id) {
    Slave *slave = NULL;
    std::vector<Slave *>::iterator it;
    for(it = slaves.begin(); it != slaves.end(); it++){
        if ((*it)->get_id() == id) {
            slave = *it;
            slaves.erase(it);
            break;
        }
    }
    if (slave) {
        log_info("resetting slave...");
        delete slave;
        this->slaveof(slave->get_id(), slave->get_host(), slave->get_port(),
                std::string("")/*auth*/, 0/*last_seq*/, std::string("")/*last_key*/,
                slave->get_is_mirror(), 0);
        slave->start();
        slaves.push_back(slave);
        log_info("slave reset.");
        return 0;
    } else {
        return -1;
    }
}
```
In proc_sys.cpp I added this:
```cpp
int proc_resetslave(NetworkServer *net, Link *link, const Request &req, Response *resp) {
    SSDBServer *serv = (SSDBServer *)net->data;
    CHECK_NUM_PARAMS(1);
    std::string id = req[1].String();
    int res = serv->resetslave(id);
    if (res < 0) {
        resp->push_back("not_found");
    } else {
        resp->push_back("ok");
    }
    return 0;
}
```
The above code seems to work; however, in the logs of MasterA (the master on which I executed the new `resetslave` command) I get this:
Strangely, after restarting MasterB I see that the two nodes are in sync, and data written to MasterA does show up in MasterB.
Do you know what could be the reason for the error I get in the log?
Oh... I think it's because I forgot to call `slave->stop();` to stop the slave's thread!
Here is the updated code:
```cpp
int SSDBServer::resetslave(const std::string &id) {
    Slave *slave = NULL;
    std::vector<Slave *>::iterator it;
    for(it = slaves.begin(); it != slaves.end(); it++){
        if ((*it)->get_id() == id) {
            slave = *it;
            slaves.erase(it);
            break;
        }
    }
    if (slave) {
        log_info("resetting slave...");
        Slave *newSlave = new Slave(ssdb, meta, slave->get_host().c_str(),
                slave->get_port(), slave->get_is_mirror());
        slave->stop();
        delete slave;
        newSlave->save_status();
        newSlave->start();
        slaves.push_back(newSlave);
        log_info("slave reset.");
        return 0;
    } else {
        return -1;
    }
}
```
Hi all,
I managed to get this working using my initial method, which consists of deleting and recreating `BackendSync`.
I have been using it in production for the past month and it is working very well: no need to restart nodes when they go OUT_OF_SYNC now.
Here is my code: https://github.com/saveriocastellano/ssdb
Very useful idea 👍
If someone is interested in this useful function, I've patched the code a bit so it runs with the latest code changes in the master branch: https://github.com/rhessing/ssdb
Still testing it, but a standalone version does work without issues when running the ssdb benchmark.
@saveriocastellano please let me know if you would like to have a pull request :-)
@rhessing very good, thanks for letting me know
@ideawu I have two MasterA-MasterB SSDBs with sync=mirror, and in case they go OUT_OF_SYNC I'd like to ask if there is any possibility of avoiding shutting down at least one of the nodes. Because I'm only actively using MasterA, when MasterB goes out of sync with it, instead of having to shut down both instances (and removing `data` and `meta` on MasterB and `meta` on MasterA), I wonder if it would be possible to do the following instead:
a) shut down ONLY MasterB and delete its `meta` and `data` directories
b) write specific keys in the `meta` of MasterA to reset the status of MasterB, so that MasterA will accept MasterB syncing from scratch
Regarding b), I had a look at slave.cpp and I see that the status is stored in `meta` under these keys:
name of hash key = `"slave.status." + this->id_`; values: `last_key`, `last_seq`
So I was wondering whether writing the right key, setting `last_seq` to zero and `last_key` to empty, would let MasterA accept MasterB syncing from scratch without having to restart MasterA.