Open xolott opened 1 week ago
It seems the stream is getting torn down every few minutes. Can you tell how often that happens, and whether the interval is consistent? Besides the log entries, is this causing changes to policy resources (Server, HTTPRoute, AuthorizationPolicy, etc.) to go undetected?
The errors are logged every 5 or 10 minutes (one or the other, not a range) for about an hour. Sometimes all the resource watchers fail around the same time. Sometimes they take turns: the HTTPRoute watcher fails for an hour, then the MeshTLSAuthentication watcher fails for another hour.
The last log entry was six hours ago, though.
I don't directly use those resources; we only have the default resources created during installation. Any way to test this out?
What is the issue?
The policy controller fails when it tries to watch some k8s resources (I think all of them). According to Cilium, not a single packet is dropped (I used Hubble to check this), but the controller says the connection was dropped. Using curl within the container, I can make the same GET request to the API server and get a response, so the CNI is not dropping this connection.
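A single curl GET returns quickly, while the controller holds a long-lived watch, so the two are not quite the same test. A minimal sketch of holding one watch open from inside a pod and timing how long it survives is below. It assumes the standard in-cluster service account mount and the KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT variables; the API version v1beta1 for httproutes and the helper names watch_url and watch_once are assumptions for illustration, not taken from the controller.

```python
import json
import os
import ssl
import time
import urllib.request
from urllib.parse import urlencode

# Standard in-cluster service account mount.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def watch_url(host, port, group, version, plural, timeout_seconds=300):
    """Build a cluster-wide watch URL for a custom resource (hypothetical helper)."""
    query = urlencode({"watch": "true", "timeoutSeconds": timeout_seconds})
    return f"https://{host}:{port}/apis/{group}/{version}/{plural}?{query}"


def watch_once(url, token, cafile):
    """Hold one watch open and return (seconds_open, events_seen, error)."""
    ctx = ssl.create_default_context(cafile=cafile)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    started = time.monotonic()
    events, error = 0, None
    try:
        with urllib.request.urlopen(req, context=ctx) as resp:
            # The API server streams one JSON watch event per line.
            for line in resp:
                if line.strip():
                    json.loads(line)
                    events += 1
    except Exception as exc:  # record how the stream ended instead of crashing
        error = repr(exc)
    return time.monotonic() - started, events, error


if __name__ == "__main__":
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    url = watch_url(
        os.environ["KUBERNETES_SERVICE_HOST"],
        os.environ["KUBERNETES_SERVICE_PORT"],
        "policy.linkerd.io",
        "v1beta1",  # assumption: adjust to the version served by your cluster
        "httproutes",
    )
    print(watch_once(url, token, os.path.join(SA_DIR, "ca.crt")))
```

If the watch ends with an error well before timeoutSeconds elapses, that would point at something between the pod and the API server closing the connection rather than at the controller itself.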
How can it be reproduced?
Using Terraform:
values-ha.yaml
Logs, error output, etc
Every few minutes, the policy controller logs:
Running it with a debug log level, I can see this:
Output of linkerd check -o short
Environment
Possible solution
No response
Additional context
I modified the policy controller container image by adding ls, wget, sh, and curl.
Would you like to work on fixing this bug?
None