Open kemko opened 1 year ago
Given the large gap between 1.9.x and 1.14.x, I wonder whether upgrading in smaller steps, for example 1.9.x to 1.10.x first, would help narrow down the root cause.
https://developer.hashicorp.com/consul/docs/upgrading/instructions
Same error here: replication does not work between 1.12.8 and 1.13.7. If the primary and the secondary both run 1.13.7, everything works, but if one of them runs 1.12.8, policy replication fails with an error.
Policy replication does not work with any 1.13.x version if one of the DCs is below 1.13.x.
Is there any solution to this problem? We use 8 datacenters and have always performed the update according to the following instructions: "Upgrade the Consul agents in all DCs to version 1.x.x by following our General Upgrade Process. This should be done one DC at a time, leaving the primary DC for last"
But this scheme does not work when upgrading from 1.12.8 to 1.13.7. After the first DC is updated, ACL replication on it breaks. I set up a test environment with 3 datacenters and found the following: ACL synchronization between 1.12.8 (or 1.12.9) and 1.13.x does not work at all. If you update the primary DC to 1.13.x, replication fails on all other DCs still running 1.12.8; if you update only one of the secondary DCs, replication fails on that DC.
The only option I see is to update Consul in all DCs at the same time, but that would affect more critical services, which I would like to avoid. Is this behavior intended, or is it a bug? The Specific Version Details page does not mention any change to the replication system in 1.13.x.
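The behavior reported above (a secondary DC loses ACL replication whenever it and the primary sit on opposite sides of the 1.13 boundary) can be written down as a small pre-upgrade check. This is only a sketch of the observation from this thread, not documented Consul behavior; the function names and the `(1, 13)` boundary are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.12.8'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def affected_secondaries(
    primary: str,
    secondaries: Dict[str, str],
    boundary: Tuple[int, int] = (1, 13),  # assumption taken from this thread
) -> List[str]:
    """Return secondary DCs expected to lose ACL replication.

    A secondary is flagged when it and the primary are on opposite
    sides of the boundary version, as reported by the commenters.
    """
    primary_upgraded = major_minor(primary) >= boundary
    return sorted(
        dc
        for dc, version in secondaries.items()
        if (major_minor(version) >= boundary) != primary_upgraded
    )


# Primary upgraded first: every 1.12.8 secondary is flagged.
print(affected_secondaries("1.13.7", {"dc2": "1.12.8", "dc3": "1.12.8"}))
# One secondary upgraded first: only that secondary is flagged.
print(affected_secondaries("1.12.8", {"dc2": "1.13.7", "dc3": "1.12.8"}))
```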
Same here. We upgraded from 1.11.4 -> 1.12.x -> 1.13.x -> 1.14.x -> 1.15.x -> 1.16.x -> 1.17.x -> 1.19.x. Partway through the upgrade sequence, replication stopped working and all keys were deleted in a secondary DC.
Overview of the Issue
After upgrading Consul from 1.9.5 to 1.14.3, ACL replication breaks. It's fixed by some rather strange actions. We decided to file a bug report since we could not find any notes about this behavior in the documentation.
Reproduction Steps
After that, the replication error will disappear for all datacenters and replication will work as expected.
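To confirm that replication has actually recovered in each datacenter, one option is to read the ACL replication status endpoint (`/v1/acl/replication`) on the leader of every secondary DC. The sketch below is a hedged illustration: `base_url` and `token` are placeholders, the `LastErrorMessage` field may not be present on older versions, and the timestamp comparison assumes both values are UTC and only compares to the second.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_replication_status(base_url: str, token: str) -> Dict[str, Any]:
    """GET /v1/acl/replication from one server (base_url and token are placeholders)."""
    request = urllib.request.Request(
        f"{base_url}/v1/acl/replication",
        headers={"X-Consul-Token": token},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


def replication_problems(status: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a replication status document."""
    problems = []
    if not status.get("Enabled"):
        problems.append("replication is not enabled")
    # Assumption: the status was read from the leader; other servers
    # may legitimately report Running=false.
    if not status.get("Running"):
        problems.append("replication is not running")
    # Compare only 'YYYY-MM-DDTHH:MM:SS', assuming both timestamps are UTC.
    last_success = str(status.get("LastSuccess", ""))[:19]
    last_error = str(status.get("LastError", ""))[:19]
    if last_error > last_success:
        message = status.get("LastErrorMessage", "")
        detail = f": {message}" if message else ""
        problems.append(f"last error is newer than last success{detail}")
    return problems
```

A typical use would be to call `replication_problems(fetch_replication_status(url, token))` once per secondary datacenter and treat an empty list as healthy.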
Operating system and Environment details
Ubuntu 20.04.5 LTS, x86_64 GNU/Linux
Log Fragments
Log 1
```log
2023-02-15T12:40:01.391+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T12:40:01.391+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T12:40:01.391+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=1
2023-02-15T12:40:01.391+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - downloaded updates: amount=1
2023-02-15T12:40:01.391+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - performing updates
2023-02-15T12:40:01.391+0300 [WARN] agent.server.replication.acl.policy: ACL replication error (will retry if still leader): error="failed to update local ACL policies: Failed to apply policy upserts: node is not the leader"
```
Log 2
```log
2023-02-15T13:10:29.486+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:10:29.486+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:10:29.487+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:10:29.487+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:12:44.368+0300 [INFO] agent.server.replication.acl.policy: started ACL Policy replication
2023-02-15T13:12:44.373+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:12:44.373+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:12:44.373+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:12:44.373+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:15:40.920+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:17:53.696+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:17:53.696+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:17:53.696+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:17:53.696+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:23:00.541+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:23:00.541+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:23:00.541+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:23:00.541+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:28:01.043+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:28:01.043+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:28:01.043+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:28:01.043+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:33:11.701+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:33:11.701+0300 [WARN] agent.server.replication.acl.policy: ACL replication remote index moved backwards, forcing a full ACL sync: from=1867938962 to=1692767365
2023-02-15T13:33:11.701+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:33:11.701+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=1
2023-02-15T13:33:11.705+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - downloaded updates: amount=1
2023-02-15T13:33:11.706+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - performing updates
2023-02-15T13:33:11.713+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - upserted batch: number_upserted=1 batch_size=497
2023-02-15T13:33:11.713+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - finished updates
2023-02-15T13:33:11.713+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1692767365
2023-02-15T13:33:11.718+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:33:11.718+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:33:11.718+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:33:11.718+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:38:26.839+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:38:26.839+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:38:26.839+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=0
2023-02-15T13:38:26.839+0300 [DEBUG] agent.server.replication.acl.policy: ACL replication completed through remote index: index=1867938962
2023-02-15T13:43:32.062+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27
2023-02-15T13:43:32.062+0300 [WARN] agent.server.replication.acl.policy: ACL replication remote index moved backwards, forcing a full ACL sync: from=1867938962 to=1692767365
2023-02-15T13:43:32.062+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: local=27 remote=27
2023-02-15T13:43:32.062+0300 [DEBUG] agent.server.replication.acl.policy: acl replication: deletions=0 updates=1
2023-02-15T13:43:32.067+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - downloaded updates: amount=1
2023-02-15T13:43:32.067+0300 [DEBUG] agent.server.replication.acl.policy: acl replication - performing updates
2023-02-15T13:43:32.083+0300 [WARN] agent.server.replication.acl.policy: ACL replication error (will retry if still leader): error="failed to update local ACL policies: Failed to apply policy upserts: Changing the Rules for the builtin global-management policy is not permitted"
```
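In Log 2 the replicated index alternates between 1867938962 and 1692767365, and each "remote index moved backwards" warning is followed by a forced full sync. For anyone trying to spot the same pattern in their own server logs, a small parser along the following lines may help. It is a sketch that assumes the log line layout shown above; the function and pattern names are made up for this example.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "remote index moved backwards" warning as it appears in Log 2.
INDEX_REGRESSION = re.compile(
    r"^(?P<ts>\S+)\s+\[WARN\]\s+agent\.server\.replication\.acl\.(?P<kind>\w+): "
    r"ACL replication remote index moved backwards, forcing a full ACL sync: "
    r"from=(?P<old>\d+) to=(?P<new>\d+)"
)


def find_index_regressions(lines: Iterable[str]) -> List[Tuple[str, str, int, int]]:
    """Return (timestamp, replication type, old index, new index) per warning."""
    found = []
    for line in lines:
        match = INDEX_REGRESSION.match(line.strip())
        if match:
            found.append(
                (
                    match.group("ts"),
                    match.group("kind"),
                    int(match.group("old")),
                    int(match.group("new")),
                )
            )
    return found


sample = [
    "2023-02-15T13:33:11.701+0300 [DEBUG] agent.server.replication.acl.policy: finished fetching acls: amount=27",
    "2023-02-15T13:33:11.701+0300 [WARN] agent.server.replication.acl.policy: ACL replication remote index moved backwards, forcing a full ACL sync: from=1867938962 to=1692767365",
]
for ts, kind, old, new in find_index_regressions(sample):
    print(f"{ts} {kind}: index went back by {old - new}")
```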