Replication is not correct (online/offline), master-master

vukitoso commented 4 years ago

Hi. Versions: tarantool 2.4.3.0.g5180d98f1-1, tarantool-memcached built from git (https://github.com/tarantool/memcached).

The same report, originally written in Russian, is at the bottom of this message.

Replication does not work correctly with memcached. I have two virtual machines, s1 and s2, in a master-master setup.

1) STATE: s1 - online s2 - online

ACTION: s2 -> offline

2) STATE: s1 - online s2 - offline

ACTION: s1:

telnet 0 11211
set q1 0 0 2
11
set q2 0 0 2
22

s1 -> offline s2 -> online

3) STATE: s1 - offline s2 - online

ACTION: s2:

telnet 0 11211
set q1 0 0 5
12345

s1 -> online

4) STATE: s1 - online s2 - online

ACTION: s1:

telnet 0 11211
get q1 q2
VALUE q1 0 5
12345             <- GOOD
VALUE q2 0 2
22

s2:

telnet 0 11211
get q1 q2
VALUE q1 0 2
11                <- BAD
VALUE q2 0 2
22

Is 'q1' on 's2' wrong? On s2, q1 should be 12345, because that is the most recently written value, and every master and replica should converge to 12345.

master.lua on s1:

-- instance file for the master

box.cfg {
  listen = 3301;
--  replication = {'replicator:password@192.168.56.101:3301',  -- master URI
--                 'replicator:password@192.168.56.102:3301'}; -- replica URI
  replication = {'replicator:password@192.168.56.102:3301'}; -- master URI
  read_only = false;

  instance_uuid='111fec43-18a9-4e12-a684-a42b716fc001';
  replicaset_uuid='00003d13-508b-4b8e-82e6-806f088e0000';

  replication_timeout = 1; -- default: 1

  replication_connect_timeout = 5; -- default: 30

  replication_sync_timeout = 10; -- default: 300

  replication_sync_lag = nil; -- default: 10

  replication_connect_quorum = 0;
}

box.once("schema", function()
  box.schema.user.create('replicator', {password = 'password'})
  box.schema.user.grant('replicator', 'replication') -- grant the replication role
  box.schema.space.create("test")
  box.space.test:create_index("primary")
  print('box.once executed on master #1')
end)

local memcached = require('memcached')
local instance = memcached.create('my_instance', '0.0.0.0:11211')

master.lua on s2:

-- instance file for the master

box.cfg {
  listen = 3301;
--  replication = {'replicator:password@192.168.56.101:3301',  -- master URI
--                 'replicator:password@192.168.56.102:3301'}; -- replica URI
  replication = {'replicator:password@192.168.56.101:3301'}; -- master URI
  read_only = false;

  instance_uuid='222fec43-18a9-4e12-a684-a42b716fc002';
  replicaset_uuid='00003d13-508b-4b8e-82e6-806f088e0000';

  replication_timeout = 1; -- default: 1

  replication_connect_timeout = 5; -- default: 30

  replication_sync_timeout = 10; -- default: 300

  replication_sync_lag = nil; -- default: 10

  replication_connect_quorum = 0;
}

box.once("schema", function()
  box.schema.user.create('replicator', {password = 'password'})
  box.schema.user.grant('replicator', 'replication') -- grant the replication role
  box.schema.space.create("test")
  box.space.test:create_index("primary")
  print('box.once executed on master #2')
end)

local memcached = require('memcached')
local instance = memcached.create('my_instance', '0.0.0.0:11211')

ORIGINAL RUSSIAN TEXT (translated):

Hello. I noticed that data added to memcached is replicated incorrectly when Tarantool runs as master-master. Logically, the value that should win is the one that was added most recently. Example: we have two servers, s1 and s2. Both servers are ONLINE. Disconnect s2 from the network. On s1, add two variables: q1 = 11, q2 = 22. Disconnect s1 from the network. Reconnect s2. On s2, add the variable q1 = 12345. Reconnect s1. The servers replicate. Read the variables from both servers. On s1: q1 = 12345 (correct), q2 = 22. On s2: q1 = 11 (NOT correct), q2 = 22.

Since q1 = 12345 on s2 was added last, that is the value that should be replicated to all servers. s1 took the q1 value from s2, but s2 gave priority to the value from s1, which is already STALE! That is logically wrong. Am I right that the replication logic is incorrect, or is my reasoning mistaken?

If possible, I would prefer to continue this discussion in Russian. Thank you.

Totktonada commented 4 years ago

NB: @vebmaster noted in a chat that repcached offers the behaviour he wants. The problem looks quite common for bidirectional replication with a single writable instance (master). I would propose the following procedure.

Switching masters requires waiting until all operations from the previous master have been received. If the wait times out, the old master must not rejoin the replica set and should be rebootstrapped (with all snaps and xlogs deleted).

In effect, this is a manual failover. What about automatic failover?
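The waiting step could be sketched roughly as follows, run on the instance that is about to become the new master. This is only an illustration under several assumptions: the new master is started with `read_only = true`, the old master's instance id and its last LSN are known (for example, read from `box.info.lsn` on the old master before it went away), and the names `wait_for_old_master`, `old_master_id` and `target_lsn` are hypothetical, not part of Tarantool or the memcached module.

```lua
local fiber = require('fiber')

-- Hypothetical helper: wait until this instance has applied everything
-- the old master wrote, up to target_lsn. Returns true on success and
-- false if the timeout (in seconds) expires first.
local function wait_for_old_master(old_master_id, target_lsn, timeout)
    local deadline = fiber.clock() + timeout
    -- box.info.vclock maps instance id -> last LSN applied from that instance
    while (box.info.vclock[old_master_id] or 0) < target_lsn do
        if fiber.clock() > deadline then
            return false
        end
        fiber.sleep(0.1)
    end
    return true
end

-- Illustrative values only: old master has id 1 and stopped at LSN 42.
if wait_for_old_master(1, 42, 10) then
    -- Everything from the old master has arrived: safe to accept writes.
    box.cfg{read_only = false}
else
    -- Timed out: the old master may hold operations we never received.
    -- Per the procedure above, it must be rebootstrapped before rejoining.
    print('old master must be rebootstrapped before it rejoins')
end
```

The point of the check is that only one instance accepts writes at a time, so a stale `set` from the old master can never be applied on top of a newer one.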

Tarantool 2.5.1 offers quorum-based synchronous replication.
Tarantool 2.6.1 offers a Raft-based leader election algorithm.
The cartridge framework has a stateful failover implementation.

I don't have all the necessary context, but it seems that qsync + Raft should work as an automatic failover. I guess asynchronous replication + Raft will not (what manual actions would be required in that case?). Cartridge should work, but I don't know whether it is easy or hard to run the memcached module under it.
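As a rough idea of what the qsync + Raft variant might look like in an instance file, here is an untested sketch for Tarantool 2.6.1 or newer. It makes several assumptions that are not confirmed in this thread: a third instance at 192.168.56.103 is added (a hypothetical address, since a majority cannot survive the loss of one node in a two-node set), and the memcached module stores its data in a space named `__mc_my_instance`.

```lua
box.cfg {
  listen = 3301;
  replication = {'replicator:password@192.168.56.101:3301',
                 'replicator:password@192.168.56.102:3301',
                 'replicator:password@192.168.56.103:3301'}; -- hypothetical third node

  election_mode = 'candidate';      -- take part in Raft leader election
  replication_synchro_quorum = 2;   -- a write needs confirmation from 2 of 3 nodes
  replication_synchro_timeout = 5;  -- seconds to wait for the quorum
}

-- Assumption: the memcached space is called '__mc_my_instance'.
-- Marking it synchronous makes its writes go through the quorum.
-- The alter has to run on the elected leader, the only writable node.
local mc_space = box.space['__mc_my_instance']
if mc_space ~= nil and not box.info.ro then
  mc_space:alter({is_sync = true})
end
```

With such a setup only the elected leader accepts writes, so the scenario from the report (both nodes taking a `set` for the same key while disconnected) should not arise; the cost is that writes fail while no quorum is reachable.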

I would summon replication sages: @sergos, @sergepetrenko, @Gerold103 and @Mons to correct me or propose something else.

// We can discuss it in Russian in the Russian Telegram chat, but I would prefer English for discussions around the code and issues to make them available for everyone.

vukitoso commented 4 years ago

@Totktonada thx. Yes, I am also discussing this in the Telegram channel https://t.me/tarantoolru. I was advised to look at this solution: https://www.tarantool.io/en/doc/2.4/book/replication/repl_problem_solving/ and https://github.com/tarantool/expirationd

Totktonada commented 4 years ago

The memcached module has its own expiration, so there is no need to use expirationd for record eviction.

tarantool / memcached

Replication is not correct (online/offline), master-master #64