potential secondary learn too slow to catch up with primary

qinzuoyan commented 8 years ago

In our test, we found that the potential secondary learn too slow:

log.201.txt:D12:14:41.839 (1470744881839169361 bc6c) replica5.replica_long3.080400030000001d: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 3754 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (894646 => 897430), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897431, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:44.532 (1470744884532517850 bc6a) replica5.replica_long1.0804000300000372: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 6448 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897430 => 897690), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897691, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.201.txt:D12:14:47.068 (1470744887068260243 bc69) replica5.replica_long0.080400030000057d: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 8983 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (897690 => 897820), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 897821, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
...
log.211.txt:D12:38:55.960 (1470746335960825825 bc69) replica5.replica_long0.08040003000571ba: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1457876 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991185 => 991475), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991476, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.286 (1470746340286794584 bc69) replica5.replica_long0.080400030005764f: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462202 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991475 => 991780), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 991781, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare
log.211.txt:D12:39:00.377 (1470746340377718938 bc69) replica5.replica_long0.0804000300057af4: replica.learn:843:on_copy_remote_state_completed(): 1.4@10.108.187.19:34805: on_copy_remote_state_completed[0000025200000002]: learnee = 10.108.187.19:34803, learn_duration = 1462293 ms, apply checkpoint/log done, err = ERR_OK, app_committed_decree = (991780 => 992088), app_durable_decree = (869234 => 869234), local_committed_decree = 894646, remote_committed_decree = 992089, prepare_start_decree = -1, current_learning_status = replication::learner_status::LearningWithoutPrepare

As you can see, the learning process lasts for tens of minutes.

That is:

on each learning round, PotentialSecondary copies a private log from Primary and applies it, which learns hundreds of mutations and takes 2~3 seconds; but during the the time, Primary commits another hundreds of mutations, and the private log grows.
then, PotentialSecondary need to start another round of learning

The reason is:

on each round, PotentialSecondary need to copy the whole latest private log file, only to learn the last several mutations
the private log will be switched (create new private log) only if its size exceed a limit (which is 32MB in our config)
so we copy lots of unnecessary data on each learning round

Improvement suggestion:

if Primary detects that the private log is in learning, it can switch the private log immediately to reduce copying size of private log.

qinzuoyan commented 8 years ago

@imzhenyu , do you have better improvements?

imzhenyu commented 8 years ago

There is a configuration for the mutation cache size. You may enlarge the cache size to avoid learning the private log as well in many cases.

imzhenyu commented 8 years ago

Another way is to use RPC + AIO to pipeline log read + private log apply to avoid unnecessary log written on potential secondary's disk.

imzhenyu / rDSN

potential secondary learn too slow to catch up with primary #528