hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Token policy intermittently fails to synchronize to the secondary datacenter with Consul v1.10.0 #14284

Open DawnOf1996 opened 2 years ago

DawnOf1996 commented 2 years ago

Overview of the Issue

There are two Kubernetes clusters running version 1.19, federated with Consul.

We have run into a problem. After adding a policy for tokens in the primary datacenter through the web UI, the policy is sometimes not synchronized to the other datacenter. Is this a known problem with Consul version 1.10.0?

Reproduction Steps

1. Add a policy in the primary datacenter with the Consul UI.
2. Query the primary datacenter and confirm that the policy has been created.
3. After a few minutes, the policy still cannot be queried from the other datacenter.

This happens only intermittently, which leaves us without a clue. We have pasted log fragments below and the Consul chart values in datacenter-values.yaml. We hope you can help us find the answer. Thank you.
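The check in steps 2 and 3 can be scripted so that an intermittent failure is caught when it happens. The following is a minimal sketch, not an official tool: it assumes each datacenter exposes a reachable Consul HTTP endpoint, that the token may read ACL policies, and that a replicated policy carries the same `Hash` in both datacenters. The names `fetch_policies` and `diff_policies` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_policies(base_url, token):
    """List ACL policies from one Consul HTTP endpoint (GET /v1/acl/policies)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/acl/policies",
        headers={"X-Consul-Token": token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def diff_policies(primary, secondary):
    """Compare two policy listings by ID.

    Returns (missing, stale): names of policies absent from the secondary,
    and names present in both but with a different Hash (assumption: the
    Hash matches once replication has caught up).
    """
    secondary_by_id = {p["ID"]: p for p in secondary}
    missing, stale = [], []
    for policy in primary:
        other = secondary_by_id.get(policy["ID"])
        if other is None:
            missing.append(policy["Name"])
        elif other.get("Hash") != policy.get("Hash"):
            stale.append(policy["Name"])
    return sorted(missing), sorted(stale)
```

Calling `diff_policies(fetch_policies(primary_url, token), fetch_policies(secondary_url, token))` a few minutes after creating a policy would show whether it reached the secondary datacenter; the two URLs and the token are placeholders for your own environment.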

Operating system and Environment details

Operating system: Red Hat Enterprise Linux CoreOS 46.82.202012051820-0
Consul: v1.10.0
Kubernetes: v1.19
Charts: v0.32.1

Log Fragments

Primary Datacenter

consul server logs:

2022-08-18 13:59:22 consul-server-1 xxx stdout [WARN]agent:grpc:addrConn.createTransport failed to connect to {xx.xx.xx.xx:8300 0 consul-server-3.master },Err:cinnection error:desc="transport:Error while dialing dial tcp xx.xx.xx.xx:8300:operation was canceled".Reconnecting...
2022-08-18 13:58:44 consul-server-1 xxx stdout [WARN]agent.server.catalog:no terminating-gateway or ingress-gateway associated with this gateway:gateway=terminating-gateway
2022-08-18 13:57:23 consul-server-3 xxx stdout [WARN]agent.server.catalog:no terminating-gateway or ingress-gateway associated with this gateway:gateway=terminating-gateway
2022-08-18 13:56:56 consul-server-3 xxx stdout [WARN]agent.server.catalog:no terminating-gateway or ingress-gateway associated with this gateway:gateway=ingress-gateway

Another Datacenter

consul server logs:

2022-08-18 13:57:29 consul-server-1 xxx stdout [ERROR]agent.server.memberlist.wan:memberlist:Failed to send ping:write tcp xx.xx.xx.xx:53050 -> xx.xx.xx.xx:8443:write:broken pipe
2022-08-18 13:56:56 consul-server-4 xxx stdout [WARN]agent.server.catalog:no terminating-gateway or ingress-gateway associated with this gateway:gateway=terminating-gateway
2022-08-18 13:56:24 consul-server-1 xxx stdout [WARN]agent.server.catalog:no terminating-gateway or ingress-gateway associated with this gateway:gateway=ingress-gateway
2022-08-18 13:56:14 consul-server-0 xxx stdout [ERROR]agent.server.memberlist.wan:memberlist:Failed to send ping:write tcp xx.xx.xx.xx:44652 -> xx.xx.xx.xx:8443:write:broken pipe

mikemorris commented 2 years ago

This looks like it might be a similar issue to https://github.com/hashicorp/consul/issues/10356 and https://github.com/hashicorp/consul/issues/9319. That problem was partially fixed in https://github.com/hashicorp/consul/pull/12307, which was included in the v1.10.10 patch release.
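Before upgrading, it may help to check whether ACL replication in the secondary datacenter has stalled. Consul reports this at the `/v1/acl/replication` endpoint. The sketch below only interprets an already-fetched status document; `replication_problems` is a hypothetical helper, and comparing the timestamps at one-second precision is a simplifying assumption.

```python
def replication_problems(status):
    """Return human-readable warnings for a /v1/acl/replication response.

    `status` is the decoded JSON object. Timestamps are RFC 3339 strings in
    UTC, so their first 19 characters (YYYY-MM-DDTHH:MM:SS) sort
    chronologically when compared as plain strings.
    """
    problems = []
    if not status.get("Enabled"):
        problems.append("ACL replication is not enabled")
    if not status.get("Running"):
        problems.append("ACL replication is not running")
    last_ok = status.get("LastSuccess", "")[:19]
    last_err = status.get("LastError", "")[:19]
    if last_err > last_ok:
        problems.append(
            f"last error ({last_err}) is newer than last success ({last_ok})"
        )
    return problems
```

An empty list suggests replication is healthy at the moment of the query; because the failure reported here is intermittent, polling it alongside the policy listing is more informative than a single check.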