imzhenyu / rDSN

Robust Distributed System Nucleus (rDSN) is an open framework for quickly building and managing high performance and robust distributed systems.
MIT License

Potential secondary learns too slowly to catch up with primary #528

Open qinzuoyan opened 8 years ago

qinzuoyan commented 8 years ago

In our test, we found that the potential secondary learns too slowly:

log.201.txt:D12:14:41.839 (1470744881839169361 bc6c) replica5.replica_long3.080400030000001d: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 3754 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (894646 => 897430), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897431, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:44.532 (1470744884532517850 bc6a) replica5.replica_long1.0804000300000372: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 6448 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897430 => 897690), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897691, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:47.068 (1470744887068260243 bc69) replica5.replica_long0.080400030000057d: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 8983 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897690 => 897820), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897821, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
...
log.211.txt:D12:38:55.960 (1470746335960825825 bc69) replica5.replica_long0.08040003000571ba: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1457876 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991185 => 991475), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991476, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.286 (1470746340286794584 bc69) replica5.replica_long0.080400030005764f: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462202 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991475 => 991780), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991781, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.377 (1470746340377718938 bc69) replica5.replica_long0.0804000300057af4: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462293 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991780 => 992088), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 992089, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare

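As you can see, the learning process lasts for tens of minutes: learn_duration grows from 3754 ms to 1462293 ms (about 24 minutes), and the learner is still in LearningWithoutPrepare while remote_committed_decree keeps advancing.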
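To make the pattern in the log easier to see, here is a minimal sketch that extracts the per-round numbers from the first three lines. The regular expression and the helper names (`parse_rounds`, `summarize`) are my own for illustration and are not part of rDSN; the strings are substrings copied from the log above.

```python
import re

# Substrings copied verbatim from the first three log lines above.
LOG_LINES = [
    "learn_duration = 3754 ms, apply checkpoint/log done, err = ERR_OK, "
    "app_committed_decree = (894646 => 897430), app_durable_decree = (869234 => 869234), "
    "local_committed_decree = 894646, remote_committed_decree = 897431",
    "learn_duration = 6448 ms, apply checkpoint/log done, err = ERR_OK, "
    "app_committed_decree = (897430 => 897690), app_durable_decree = (869234 => 869234), "
    "local_committed_decree = 894646, remote_committed_decree = 897691",
    "learn_duration = 8983 ms, apply checkpoint/log done, err = ERR_OK, "
    "app_committed_decree = (897690 => 897820), app_durable_decree = (869234 => 869234), "
    "local_committed_decree = 894646, remote_committed_decree = 897821",
]

# Hypothetical pattern matching only the fields used below.
PATTERN = re.compile(
    r"learn_duration = (\d+) ms.*?"
    r"app_committed_decree = \((\d+) => (\d+)\).*?"
    r"remote_committed_decree = (\d+)"
)


def parse_rounds(lines):
    """Return (duration_ms, committed_from, committed_to, remote_committed) per line."""
    rounds = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            rounds.append(tuple(int(g) for g in match.groups()))
    return rounds


def summarize(rounds):
    """Return (round_time_ms, decrees_learned, gap_to_remote) for each round."""
    summary = []
    previous_duration = 0  # learn_duration is cumulative since learning started
    for duration, start, end, remote in rounds:
        summary.append((duration - previous_duration, end - start, remote - end))
        previous_duration = duration
    return summary


ROUNDS = summarize(parse_rounds(LOG_LINES))
for round_time, learned, gap in ROUNDS:
    print(f"round took {round_time} ms, learned {learned} decrees, gap to remote = {gap}")
```

In these three rounds the learner ends each round one decree behind the remote_committed_decree reported for that round, yet the next round again finds a few hundred new decrees to learn, because the primary keeps committing while the round is in progress.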

That is:

The reason is:

Improvement suggestion:

qinzuoyan commented 8 years ago

@imzhenyu, do you have any better suggestions?

imzhenyu commented 8 years ago

There is a configuration option for the mutation cache size. You can enlarge the cache so that, in many cases, learning the private log is avoided as well.
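As a rough illustration of why a larger cache helps, here is a toy model, not rDSN code. It assumes the cache is a bounded, in-memory window of the most recent decrees, and that a learner can be served from it only if the window still contains the first decree the learner needs. The class `MutationCache` and its `covers` method are hypothetical names, and the capacities are arbitrary.

```python
from collections import deque


class MutationCache:
    """Toy bounded cache holding only the most recent decrees (hypothetical)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry when full.
        self._decrees = deque(maxlen=capacity)

    def append(self, decree):
        self._decrees.append(decree)

    def covers(self, start_decree):
        """True if the oldest cached decree is not newer than start_decree."""
        return bool(self._decrees) and self._decrees[0] <= start_decree


def can_learn_from_cache(capacity, first_decree, last_decree, learner_start):
    """Replay the primary's commits into the cache, then check the learner's need."""
    cache = MutationCache(capacity)
    for decree in range(first_decree, last_decree + 1):
        cache.append(decree)
    return cache.covers(learner_start)


# The learner has committed 897430 and needs 897431 onward;
# the primary has meanwhile committed up to 897691 (values from the log above).
print(can_learn_from_cache(100, 897000, 897691, 897431))
print(can_learn_from_cache(512, 897000, 897691, 897431))
```

Under these assumptions the small window has already evicted decree 897431, so the learner would have to fall back to the log, while the larger window still holds it.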

imzhenyu commented 8 years ago

Another way is to use RPC + AIO to pipeline reading the log with applying the private log, which avoids writing the log unnecessarily to the potential secondary's disk.
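The pipelining idea can be sketched in a few lines. This is only an illustration using Python threads and a bounded queue; it does not use rDSN's RPC or AIO interfaces, and `fetch_blocks`, `apply_blocks`, and `pipelined_learn` are hypothetical names. The in-memory `blocks` list stands in for log blocks read from the learnee.

```python
import queue
import threading

_END = None  # sentinel marking the end of the log stream


def fetch_blocks(blocks, out_q):
    """Producer: stands in for reading log blocks from the learnee over RPC."""
    for block in blocks:
        out_q.put(block)  # blocks when the queue is full (back-pressure)
    out_q.put(_END)


def apply_blocks(in_q, state):
    """Consumer: applies each block straight from memory, with no local log file."""
    while True:
        block = in_q.get()
        if block is _END:
            break
        for decree, value in block:
            state[decree] = value


def pipelined_learn(blocks, depth=2):
    """Overlap fetching and applying with at most `depth` blocks in flight."""
    q = queue.Queue(maxsize=depth)
    state = {}
    reader = threading.Thread(target=fetch_blocks, args=(blocks, q))
    reader.start()
    apply_blocks(q, state)
    reader.join()
    return state


if __name__ == "__main__":
    sample = [[(1, "a"), (2, "b")], [(3, "c")]]
    print(pipelined_learn(sample))
```

The bounded queue keeps memory use fixed: the reader stalls when the applier falls behind, instead of spilling the fetched log to disk first.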