Howdy @guessmyname, thank you so much for bringing this to our attention. I have a few questions to solidify my understanding of the issue.
Is the problem/bug you're focused on mostly that moving a client to another datacenter with the same datacenter name causes it to share information about the old cluster, or rather that you cannot perform a snapshot after recovering?
If it is focused on the snapshot aspect: is the peers.json you are restoring with from the original datacenter or the new datacenter?
Any further reproduction steps, logs, or links to GitHub repositories are appreciated. :)
Hey there, This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one :+1:
Hey there,
This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.
If you are still experiencing problems, or still have questions, feel free to open a new one :+1:.
Overview of the Issue
We inadvertently joined a new cluster to an existing cluster, which wound up causing the existing cluster to fail. We were able to recover using a peers.json file. After the recovery we are no longer able to perform snapshots of the existing cluster; we get the following error:
```
snapshot: Failed to get meta data to open snapshot: open /etc/consul.d/raft/snapshots/6-69538013-1571760497270/meta.json: no such file or directory
```
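The directory named in the error is gone from disk by the time the save request tries to open it; the same snapshot ID also shows up in a "reaping snapshot" line in the log fragments further down. A minimal way to check on the server, assuming our data directory layout under /etc/consul.d/raft:
```
# List the raft snapshot directories and look for the ID from the error
ls -l /etc/consul.d/raft/snapshots/
# Check whether the metadata file the snapshot endpoint wants is present
ls -l /etc/consul.d/raft/snapshots/6-69538013-1571760497270/meta.json
```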
Reproduction Steps
1. Change a client that is using an existing cluster to point to a new cluster with the same datacenter name. This causes the new cluster to become aware of the old cluster and attempt to connect, which is what brought the existing cluster down.
2. Use peers.json to recover the old cluster (a sketch of the file we used follows the command below).
3. Once recovered, try to take a snapshot with:

```
consul snapshot save backup.snap
```
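For completeness, this is roughly the peers.json we used for the recovery. It is a sketch only: the node IDs and addresses are the ones from the raft latest_configuration shown in the consul info output below, and since we run Raft protocol 3 the id/address form applies. The file goes into the raft directory of every server while the agents are stopped:
```
# With the consul agents stopped on all five servers, place this file on each one:
cat > /etc/consul.d/raft/peers.json <<'EOF'
[
  { "id": "a6ad78c2-79d4-4472-242a-fe01382ca52c", "address": "10.19.88.163:8300" },
  { "id": "05b8c3a7-5fa3-16f8-688e-986cd1e36266", "address": "10.19.41.179:8300" },
  { "id": "558d6953-3104-1122-35d5-021526a2cea1", "address": "10.19.0.168:8300" },
  { "id": "05d20dfe-8454-6513-de2c-1279bcfc6f7b", "address": "10.19.42.133:8300" },
  { "id": "2ebe1869-8c13-9875-a771-de3b02de7c90", "address": "10.19.66.1:8300" }
]
EOF
# Restart the agents; consul ingests the file on startup and removes it afterwards.
```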
Consul info for both Client and Server
Client info
```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 9a494b5f
	version = 1.0.6
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.19.0.168:8300
	server = true
raft:
	applied_index = 69559359
	commit_index = 69559359
	fsm_pending = 0
	last_contact = 58.379511ms
	last_log_index = 69559359
	last_log_term = 6
	last_snapshot_index = 69554429
	last_snapshot_term = 6
	latest_configuration = [{Suffrage:Voter ID:a6ad78c2-79d4-4472-242a-fe01382ca52c Address:10.19.88.163:8300} {Suffrage:Voter ID:05b8c3a7-5fa3-16f8-688e-986cd1e36266 Address:10.19.41.179:8300} {Suffrage:Voter ID:558d6953-3104-1122-35d5-021526a2cea1 Address:10.19.0.168:8300} {Suffrage:Voter ID:05d20dfe-8454-6513-de2c-1279bcfc6f7b Address:10.19.42.133:8300} {Suffrage:Voter ID:2ebe1869-8c13-9875-a771-de3b02de7c90 Address:10.19.66.1:8300}]
	latest_configuration_index = 68927085
	num_peers = 4
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 6
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 946
	max_procs = 2
	os = linux
	version = go1.9.3
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 293
	failed = 258
	health_score = 3
	intent_queue = 0
	left = 96
	member_time = 2263245
	members = 565
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 787
	members = 5
	query_queue = 0
	query_time = 1
```
Server info
```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 9a494b5f
	version = 1.0.6
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.19.0.168:8300
	server = true
raft:
	applied_index = 69559142
	commit_index = 69559142
	fsm_pending = 0
	last_contact = 49.017142ms
	last_log_index = 69559143
	last_log_term = 6
	last_snapshot_index = 69554429
	last_snapshot_term = 6
	latest_configuration = [{Suffrage:Voter ID:a6ad78c2-79d4-4472-242a-fe01382ca52c Address:10.19.88.163:8300} {Suffrage:Voter ID:05b8c3a7-5fa3-16f8-688e-986cd1e36266 Address:10.19.41.179:8300} {Suffrage:Voter ID:558d6953-3104-1122-35d5-021526a2cea1 Address:10.19.0.168:8300} {Suffrage:Voter ID:05d20dfe-8454-6513-de2c-1279bcfc6f7b Address:10.19.42.133:8300} {Suffrage:Voter ID:2ebe1869-8c13-9875-a771-de3b02de7c90 Address:10.19.66.1:8300}]
	latest_configuration_index = 68927085
	num_peers = 4
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 6
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 908
	max_procs = 2
	os = linux
	version = go1.9.3
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 293
	failed = 266
	health_score = 0
	intent_queue = 0
	left = 115
	member_time = 2263229
	members = 565
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 787
	members = 5
	query_queue = 0
	query_time = 1
```
Operating system and Environment details
5-node cluster running Red Hat Enterprise Linux Server release 7.2 (Maipo).
Log Fragments
```
2019/10/22 12:08:17 [INFO] consul.fsm: snapshot created in 24.758µs
2019/10/22 12:08:17 [INFO] raft: Starting snapshot up to 69538013
2019/10/22 12:08:17 [INFO] snapshot: Creating new snapshot at /etc/consul.d/raft/snapshots/6-69538013-1571760497270.tmp
2019/10/22 12:08:17 [INFO] snapshot: reaping snapshot /etc/consul.d/raft/snapshots/6-69538013-1571760497270
2019/10/22 12:08:17 [INFO] raft: Compacting logs from 69526779 to 69527774
2019/10/22 12:08:17 [INFO] raft: Snapshot to 69538013 complete
2019/10/22 12:08:17 [ERR] snapshot: Failed to get meta data to open snapshot: open /etc/consul.d/raft/snapshots/6-69538013-1571760497270/meta.json: no such file or directory
2019/10/22 12:08:17 [ERR] http: Request GET /v1/snapshot?stale=, error: failed to open snapshot: open /etc/consul.d/raft/snapshots/6-69538013-1571760497270/meta.json: no such file or directory: from=10.19.21.14:56174
```
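In case it helps with triage: the failing request includes the stale flag on /v1/snapshot, so it is served locally by the server the CLI talks to (a follower here) rather than being forwarded to the leader. For reference, the stale and non-stale forms of the command look roughly like this; the leader's RPC address comes from the consul info above, but its HTTP port (8500) is an assumption on my part:
```
# Stale variant: the snapshot is taken from whichever server answers the request
consul snapshot save -stale backup.snap

# Non-stale variant aimed at the leader's HTTP API (10.19.0.168 per consul info;
# port 8500 is assumed to be the default HTTP port)
consul snapshot save -http-addr=10.19.0.168:8500 backup.snap
```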