Closed shellfu closed 1 year ago
Hi @kong62,
The issue is expected to still be present in 1.14.2. It will be fixed as of the next set of patch releases (1.14.4, 1.13.6, and 1.12.9).
thank you
Can confirm that after upgrading to v1.14.4
I did not experience these messages during normal operation for one whole day now. Will continue monitoring.
Thanks will go ahead and close this issue as it looks like this is fixed for 1.14.4 and later. Note that you may continue to encounter some WARNs on agent startup and on very infrequent occasions. This is a related but separate issue https://github.com/hashicorp/consul/issues/15821
I've installed 1.12.9 and I can still see this messages. Not that frequent but still
2023-02-06T05:50:55.844+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.5:8300 consul-3 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.5:8300: operation was canceled". Reconnecting...
2023-02-06T05:57:43.781+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.3:8300 consul-1.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.3:8300: operation was canceled". Reconnecting...
2023-02-06T05:58:10.466+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T06:50:09.474+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T07:13:16.970+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T07:18:54.830+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-06T07:36:53.408+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.5:8300 consul-3 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.5:8300: operation was canceled". Reconnecting...
v1.14.4 This one is still reproducible:
agent: [core][Channel #1 SubChannel #21] grpc: addrConn.createTransport failed to connect to {
"Addr": "xx.xx.xx.xx",
"ServerName": "xx.xx.xx.xx",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp
I have same problem on my Consul Cluster v1.14.4
[WARN] agent: [core][Channel #1 SubChannel #15] grpc: addrConn.createTransport failed to connect to {
"Addr": "xx.xx.xx.xx",
"ServerName": "xx.xx",
"Attributes": null,
"BalancerAttributes": null,
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp
@urosgruber @webant @mexdevops Do those warns occur regularly? We still expect to see those logs as an agent starts up but they should become infrequent over time. https://github.com/hashicorp/consul/issues/15821 for reference
@urosgruber @webant @mexdevops Do those warns occur regularly? We still expect to see those logs as an agent starts up but they should become infrequent over time. https://github.com/hashicorp/consul/issues/15821 for reference
Yep, i have this messages from time to time and i can't fix this. Also, i don't know anything about this problem...
It appears every 2-3 days for each agent. Not related to any startup, quite constant flow of those with several hundreds of agents with ~100 of those errors every day.
@urosgruber @webant @mexdevops Do those warns occur regularly? We still expect to see those logs as an agent starts up but they should become infrequent over time. #15821 for reference
With 3 servers, no agents connected, fresh install about 10 messages a day on random.
Could you provide log fragments with lines before and after you see these WARNs?
I'd like to confirm that they are not caused by this issue (server shuffling every ~2 mins) and would like to move the investigation to either #15821 or a new issue if necessary.
Do you need INFO level? I believe there is nothing else in case I set it to WARN
Do you need INFO level? I believe there is nothing else in case I set it to WARN
Yes, INFO would be appreciated.
2023-02-22T11:14:22.693+0100 [INFO] agent: (LAN) joining: lan_addresses=[10.192.0.3, 10.192.0.4, 10.192.0.5]
2023-02-22T11:14:22.698+0100 [INFO] agent: started state syncer
2023-02-22T11:14:22.699+0100 [INFO] agent: Consul agent running!
2023-02-22T11:14:22.699+0100 [INFO] agent: (LAN) joined: number_of_nodes=3
2023-02-22T11:14:22.699+0100 [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=3
2023-02-22T11:14:22.720+0100 [WARN] agent.server.raft: failed to get previous log: previous-index=48302 last-index=48300 error="log not found"
2023-02-22T11:14:23.024+0100 [INFO] agent: Synced node info
2023-02-22T11:16:22.675+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.3:8300 consul-1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.3:8300: operation was canceled". Reconnecting...
2023-02-22T12:21:01.609+0100 [WARN] agent.server.raft: skipping application of old log: index=48795
2023-02-22T12:21:47.875+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.3:8300 consul-1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.3:8300: operation was canceled". Reconnecting...
2023-02-22T13:09:58.024+0100 [INFO] agent.fsm: snapshot created: duration=236.985µs
2023-02-22T13:09:58.025+0100 [INFO] agent.server.raft: starting snapshot up to: index=49157
2023-02-22T13:09:58.025+0100 [INFO] agent.server.snapshot: creating new snapshot: path=/var/db/consul/raft/snapshots/2-49157-1677067798025.tmp
2023-02-22T13:09:58.030+0100 [INFO] agent.server.snapshot: reaping snapshot: path=/var/db/consul/raft/snapshots/2-16387-1676800266639
2023-02-22T13:09:58.031+0100 [INFO] agent.server.raft: compacting logs: from=22534 to=38917
2023-02-22T13:09:58.056+0100 [INFO] agent.server.raft: snapshot complete up to: index=49157
2023-02-22T13:46:17.188+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
2023-02-22T14:15:21.288+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.3:8300 consul-1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.3:8300: operation was canceled". Reconnecting...
2023-02-22T15:04:57.369+0100 [WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.192.0.4:8300 consul-2.dc1 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.192.0.4:8300: operation was canceled". Reconnecting...
From my side, no info logs around those for the whole cluster. A single agent impacted at a time.
can we confirm if this is fixed, it's marked as fixed but there are still reports in this thread of the problem
Hello everyone,
I created issue #17842 to track the problem which I consider separate from this one. Although they are related, it is helpful for us to triage the issue separately given that the WARN logs are no longer frequent or regular.
If you are still seeing WARN logs ending with operation was canceled
, please help us investigate by providing DEBUG logs in #17842
Note from @lkysow: I'm moving this to hashicorp/consul because the discuss post shows a user on EC2 also saw this error.
Overview of the Issue
New 1.10.0 on New K8s Cluster results in
[WARN] agent: grpc: addrConn.createTransport failed to connect to {10.200.65.16:8300 0 consul-server-2.primary <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.200.65.16:8300: operation was canceled". Reconnecting...
These WARNS appear in both the server and clients.
Reproduction Steps
values.yml
:Expected behavior
WARNS should not be flooding the log and connections should be over 8301 not 8300
Environment details
If not already included, please provide the following:
consul-k8s
version:0 .26.0consul-helm
version: 0.32.1values.yaml
used to deploy the helm chart: see aboveAdditional Context
It seems others are experiencing the same problem. https://discuss.hashicorp.com/t/grpc-warning-on-consul-1-10-0/26237