vitalif / vitastor

Simplified distributed block and file storage with strong consistency, like in Ceph (repository mirror)
https://vitastor.io

[vitastor-mon] After etcd master switch, the etcd data of vitastor is not synchronized #16

Closed lnsyyj closed 2 years ago

lnsyyj commented 3 years ago

Hi @vitalif,

I have 3 machines. I deployed an etcd cluster across them and started vitastor-mon on each one. Initial environment:

debian-1 192.168.122.201
debian-2 192.168.122.202
debian-3 192.168.122.203

root@debian-2:/usr/lib/vitastor/mon# ps -ef | grep etcd
etcd      120717       1  1 22:42 ?        00:00:55 /usr/local/bin/etcd -name etcd1 --data-dir /var/lib/etcd1.etcd --advertise-client-urls http://192.168.122.202:2379 --listen-client-urls http://192.168.122.202:2379 --initial-advertise-peer-urls http://192.168.122.202:2380 --listen-peer-urls http://192.168.122.202:2380 --initial-cluster-token vitastor-etcd-1 --initial-cluster etcd0=http://192.168.122.201:2380,etcd1=http://192.168.122.202:2380,etcd2=http://192.168.122.203:2380 --initial-cluster-state new --max-txn-ops=100000 --max-request-bytes=104857600 --auto-compaction-retention=10 --auto-compaction-mode=revision
vitastor  129012       1  0 23:26 ?        00:00:02 node /usr/lib/vitastor/mon/mon-main.js --etcd_url http://192.168.122.201:2379,http://192.168.122.202:2379,http://192.168.122.203:2379 --etcd_prefix /vitastor --etcd_start_timeout 5

root@debian-1:~# etcdctl --endpoints 192.168.122.202:2379 get "" --prefix
/vitastor/config/pgs
{"hash":"6ea319e831e1085b45bc25e164b4ab4d6d63095c"}
/vitastor/mon/master
{"ip":["192.168.122.201"]}

When I stopped the etcd service on the master machine (debian-1), the data in etcd did not seem to change: /vitastor/mon/master still pointed to the IP of the machine where etcd had been stopped.

root@debian-1:~# systemctl status etcd
● etcd.service - etcd for vitastor
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

Jul 27 23:25:31 debian-1 etcd[124808]: peer 263f76c692e97c7c became inactive (message send to peer failed)
Jul 27 23:25:31 debian-1 etcd[124808]: stopped streaming with peer 263f76c692e97c7c (stream Message reader)
Jul 27 23:25:31 debian-1 etcd[124808]: stopped peer 263f76c692e97c7c
Jul 27 23:25:31 debian-1 etcd[124808]: failed to find member dd427e761e03dc4 in cluster d42fce0aa68ba65
Jul 27 23:25:31 debian-1 etcd[124808]: failed to find member dd427e761e03dc4 in cluster d42fce0aa68ba65
Jul 27 23:25:31 debian-1 etcd[124808]: failed to find member 263f76c692e97c7c in cluster d42fce0aa68ba65
Jul 27 23:25:31 debian-1 etcd[124808]: failed to find member 263f76c692e97c7c in cluster d42fce0aa68ba65
Jul 27 23:25:31 debian-1 systemd[1]: etcd.service: Succeeded.
Jul 27 23:25:31 debian-1 systemd[1]: Stopped etcd for vitastor.
Jul 27 23:25:31 debian-1 systemd[1]: etcd.service: Consumed 47.950s CPU time.
root@debian-1:~# systemctl status vitastor-mon.service 
● vitastor-mon.service - Vitastor monitor
     Loaded: loaded (/etc/systemd/system/vitastor-mon.service; disabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-07-27 22:44:04 EDT; 44min ago
   Main PID: 124903 (node)
      Tasks: 7
     Memory: 40.8M
        CPU: 7.851s
     CGroup: /system.slice/vitastor-mon.service
             └─124903 node /usr/lib/vitastor/mon/mon-main.js --etcd_url http://192.168.122.201:2379,http://192.168.122.202:2379,http://192.168.122.203:2379 --etcd_prefix /vitastor --etcd_start_timeout 5

Jul 27 22:44:04 debian-1 systemd[1]: Started Vitastor monitor.
Jul 27 22:44:05 debian-1 node[124903]: Became master
Jul 27 22:44:05 debian-1 node[124903]: PG configuration successfully changed
root@debian-1:~# etcdctl --endpoints 192.168.122.201:2379 get "" --prefix
^C
root@debian-1:~# etcdctl --endpoints 192.168.122.202:2379 get "" --prefix
/vitastor/config/pgs
{"hash":"6ea319e831e1085b45bc25e164b4ab4d6d63095c"}
/vitastor/mon/master
{"ip":["192.168.122.201"]}

vitalif commented 3 years ago

It should change after the etcd lease times out. The lease is acquired for etcd_mon_ttl + etcd_mon_timeout*etcd_mon_retries seconds.
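
As a rough illustration of that formula, here is a minimal shell sketch that computes how long the lease would last. The function name `lease_seconds` and the example values are hypothetical and are not taken from this thread or from Vitastor's defaults. All three inputs are assumed to be in seconds, as the formula above is stated; check the units of your actual monitor settings.

```shell
#!/bin/sh
# Hypothetical helper: expected lease duration, following the formula
# etcd_mon_ttl + etcd_mon_timeout * etcd_mon_retries.
# Assumption: all three arguments are given in seconds.
lease_seconds() {
    ttl=$1
    timeout=$2
    retries=$3
    echo $(( ttl + timeout * retries ))
}

# Example values only, chosen for illustration.
lease_seconds 30 1 5
```

Once that many seconds have passed without the lease being renewed, re-running the `etcdctl --endpoints 192.168.122.202:2379 get "" --prefix` command from the report should show whether /vitastor/mon/master has changed.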

vitalif commented 2 years ago

I'll close this for now. Reopen it if you have more questions.