hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

snapshot restore only recovers one of several auth methods of the same type #9784

Closed kkitai closed 3 years ago

kkitai commented 3 years ago

Overview of the Issue

My Consul cluster has two auth methods. Both are of type jwt.

$ consul acl auth-method list
method-1:
   Type:         jwt
   Description:  
method-2:
   Type:         jwt
   Description:  

Then I take a snapshot with snapshot save and restore the data from it with snapshot restore.

$ consul snapshot save backup
Saved and verified snapshot to index 1815667
$ consul snapshot restore backup
Restored snapshot

It seems that only one of them is restored.

$ consul acl auth-method list
corporate-idp:
   Type:         jwt
   Description:  

Reproduction Steps

Run the commands above.
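The reproduction above could be scripted roughly as follows. This is a minimal sketch, not something posted in the issue: the helper names (`parse_auth_method_names`, `run_consul`, `missing_after_restore`) are hypothetical, and it assumes the `consul` binary is on PATH with the HTTP address and ACL token supplied through the environment. It only uses the three commands shown above and parses the plain-text listing format shown above. A restore overwrites cluster state, so this should only be pointed at a disposable cluster.

```python
import subprocess
from typing import Callable, List, Set


def parse_auth_method_names(listing: str) -> List[str]:
    """Extract method names from `consul acl auth-method list` text output.

    Names are the unindented lines ending in ':'; the indented lines
    (Type, Description) are attributes of the preceding method.
    """
    names = []
    for line in listing.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            names.append(line.rstrip()[:-1])
    return names


def run_consul(*args: str) -> str:
    """Run the consul CLI and return its stdout.

    Assumption: `consul` is on PATH and the address/token come from the
    environment. Raises CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        ["consul", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def missing_after_restore(
    run: Callable[..., str] = run_consul, snapshot_path: str = "backup"
) -> Set[str]:
    """Save and restore a snapshot, then report auth methods that vanished."""
    before = set(parse_auth_method_names(run("acl", "auth-method", "list")))
    run("snapshot", "save", snapshot_path)
    run("snapshot", "restore", snapshot_path)
    after = set(parse_auth_method_names(run("acl", "auth-method", "list")))
    return before - after
```

Calling `missing_after_restore()` against an affected cluster would be expected to return a non-empty set. The `run` parameter exists so the logic can be exercised with a fake runner instead of a live cluster.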

Consul info for both Client and Server

Client info
```
$ consul info
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease =
    revision = 12b16df3
    version = 1.8.4
consul:
    acl = enabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 203.216.239.18:8300
    server = true
raft:
    applied_index = 1815718
    commit_index = 1815718
    fsm_pending = 0
    last_contact = 0
    last_log_index = 1815718
    last_log_term = 202
    last_snapshot_index = 1815669
    last_snapshot_term = 202
    latest_configuration = [{Suffrage:Voter ID:b05a33e9-77d0-414c-4c87-29af0465d07d Address:203.216.239.29:8300} {Suffrage:Voter ID:eca87dbb-ea5d-656d-7bf2-b5ba157edfb1 Address:203.216.239.19:8300} {Suffrage:Voter ID:04cc26f7-4ef7-2064-9172-2a8e08b9b361 Address:203.216.239.18:8300} {Suffrage:Voter ID:2d24eb72-4315-3d03-0c66-05bbab3d3d81 Address:203.216.239.22:8300}]
    latest_configuration_index = 0
    num_peers = 3
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 202
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 134
    max_procs = 2
    os = linux
    version = go1.14.6
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 96
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 375
    members = 4
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 154
    members = 4
    query_queue = 0
    query_time = 1
```
Server info
```
# consul info
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease =
    revision = 12b16df3
    version = 1.8.4
consul:
    acl = enabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 203.216.239.18:8300
    server = true
raft:
    applied_index = 1815744
    commit_index = 1815744
    fsm_pending = 0
    last_contact = 0
    last_log_index = 1815744
    last_log_term = 202
    last_snapshot_index = 1815669
    last_snapshot_term = 202
    latest_configuration = [{Suffrage:Voter ID:b05a33e9-77d0-414c-4c87-29af0465d07d Address:203.216.239.29:8300} {Suffrage:Voter ID:eca87dbb-ea5d-656d-7bf2-b5ba157edfb1 Address:203.216.239.19:8300} {Suffrage:Voter ID:04cc26f7-4ef7-2064-9172-2a8e08b9b361 Address:203.216.239.18:8300} {Suffrage:Voter ID:2d24eb72-4315-3d03-0c66-05bbab3d3d81 Address:203.216.239.22:8300}]
    latest_configuration_index = 0
    num_peers = 3
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 202
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 133
    max_procs = 2
    os = linux
    version = go1.14.6
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 96
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 375
    members = 4
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 154
    members = 4
    query_queue = 0
    query_time = 1
```

Operating system and Environment details

OS, Architecture, and any other information you can provide about the environment.

Log Fragments

Include appropriate Client or Server log fragments. If the log is longer than a few dozen lines, please include the URL to the gist of the log instead of posting it in the issue. Use -log-level=TRACE on the client and server to capture the maximum log detail.

dnephin commented 3 years ago

Thank you for the bug report! I took a quick look and didn't see anything obviously wrong in the snapshot or restore for this type.

What happens if you run consul snapshot inspect (https://www.consul.io/commands/snapshot/inspect) on the snapshot? Is the count for auth methods 1 or 2? I'm hoping this information will tell us if the problem is with the snapshot or the restore.

kkitai commented 3 years ago

@dnephin

Thank you for the reply. I ran consul snapshot inspect and got the following result:

$ consul snapshot inspect backup.snap.20210325-134843 
ID           41-3692810-1616647723172
Size         4112073
Index        3692810
Term         41
Version      1

I can't see any information about auth methods. Is it usually printed?

dnephin commented 3 years ago

Oh, we added the more detailed output in 1.9.0 I think. If you use the CLI for Consul 1.9.x it should give you that extra detail.

kkitai commented 3 years ago

Thank you. I updated Consul. I think it says the snapshot has only one auth method.

$ consul snapshot inspect backup.snap.20210325-134843
ID           41-3692810-1616647723172
Size         4112073
Index        3692810
Term         41
Version      1

 Type                       Count      Size        
 ----                       ----       ----        
 KVS                        160        3.9MB       
 Register                   24         15.3KB      
 ACLBindingRule             50         10.8KB      
 ACLToken                   20         9.2KB       
 ACLPolicy                  28         8.7KB       
 ACLRole                    25         7.1KB       
 CoordinateBatchUpdate      8          1.5KB       
 Index                      16         469B        
 ACLAuthMethod              1          266B        
 Autopilot                  1          199B        
 FederationState            1          152B        
 ChunkingState              1          12B         
 ----                       ----       ----        
 Total                                 3.9MB

But I can still see both auth methods in the running cluster:

$ consul acl auth-method list -http-addr http://localhost:80 -token xxxxx-xxxxx-xxxxx
method-1:
   Type:         jwt
   Description:  
method-2:
   Type:         jwt
   Description:  

dnephin commented 3 years ago

Ya, the snapshot appears to only contain one ACLAuthMethod. I can't see anything in the code for saving the snapshot that could cause this problem. Is there any chance that maybe the newer snapshot is saved somewhere else and this snapshot is an older one?
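
The comparison being made here, between the ACLAuthMethod count reported by consul snapshot inspect and the number of auth methods the running cluster lists, can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and it assumes the plain-text output formats shown in this thread (the 1.9.x inspect table and the auth-method list layout), which could differ in other versions.

```python
from typing import Dict


def parse_inspect_counts(inspect_output: str) -> Dict[str, int]:
    """Map each record type in the inspect table to its count.

    Data rows have exactly three whitespace-separated fields with a
    numeric second field, e.g. "ACLAuthMethod  1  266B". Header, rule,
    and Total rows do not match that shape and are skipped.
    """
    counts = {}
    for line in inspect_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def count_listed_methods(listing: str) -> int:
    """Count unindented 'name:' lines in `consul acl auth-method list` output."""
    return sum(
        1
        for line in listing.splitlines()
        if line and not line[0].isspace() and line.rstrip().endswith(":")
    )


def auth_method_gap(inspect_output: str, listing: str) -> int:
    """Return how many listed auth methods are absent from the snapshot."""
    saved = parse_inspect_counts(inspect_output).get("ACLAuthMethod", 0)
    return count_listed_methods(listing) - saved
```

A positive result from `auth_method_gap` would point at the save side rather than the restore side, which is the distinction the inspect output was meant to settle.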

kkitai commented 3 years ago

OK.

I tried adding one more auth method:

$ consul acl auth-method list 
method-1:
   Type:         jwt
   Description:  
method-2:
   Type:         jwt
   Description:  
method-3:
   Type:         jwt
   Description:  

And then:

$ consul snapshot inspect backup
ID           303-290374-1616983765520
Size         56156
Index        290374
Term         303
Version      1

 Type                       Count      Size
 ----                       ----       ----
 KVS                        1          26.9KB
 Register                   24         15.2KB
 ACLPolicy                  10         3.3KB
 ACLBindingRule             14         3.2KB
 ACLRole                    7          2KB
 CoordinateBatchUpdate      8          1.5KB
 ACLToken                   4          1.5KB
 Index                      16         459B
 ACLAuthMethod              1          261B
 Tombstone                  2          232B
 Autopilot                  1          199B
 FederationState            1          139B
 ChunkingState              1          12B
 ----                       ----       ----
 Total                                 54.8KB

I can't understand what happened, though. If I upgrade the Consul server (1.8 to 1.9), might something change?

dnephin commented 3 years ago

Thank you again for reporting this bug! #10025 has a fix which should be in the next releases.