Closed MattAxel closed 1 year ago
Would you be able to provide the rke2 server args/config file that you used?
Are you using one of the pre-configured RKE2 CIS profiles?
Are you expecting external connectivity to be available for the Windows services?
Do you have your internal DNS servers (assuming you have at least one due to VMware vSphere) configured in the coredns config map?
Thanks for your reply.
From the RKE2 server (/etc/rancher/rke2/config.yaml):
tls-san:
- safeperfkubl1
- safeperfkubl1.infra.local
- safeperfkubcl.infra.local
- 172.17.93.211
disable: rke2-ingress-nginx
cni:
- calico
No, I have not specified any CIS profile.
Yes, I am expecting external connectivity from the Windows services, and it works fine in a few pods. I cannot see any pattern, other than that it always seems to work until a pod is set to ready. After that it works in at most one instance of each deployment type.
Resolving the names does not seem like a problem. Works fine even in pods without external connectivity.
{
"Corefile": ".:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus 0.0.0.0:9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}"
}
I guess it forwards to /etc/resolv.conf, which points at 127.0.0.53 (systemd-resolved). I changed it to:
{
...
forward . 172.17.93.2
...
}
(Did not make any difference unfortunately)
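For anyone repeating the Corefile change above, here is a minimal Python sketch of the same edit applied to the Corefile text. The function name `set_forward_upstream` is hypothetical, and the sketch assumes the `forward` line has no trailing option block (`{ ... }`), as in the config map shown in this thread. It only rewrites a string; applying the result to the cluster is left out.

```python
import re

# Hypothetical helper: point every `forward . <targets>` line at one upstream.
# Assumes the forward line carries no `{ ... }` option block.
_FORWARD_LINE = re.compile(r"^([ \t]*forward[ \t]+\.[ \t]+).*$", re.MULTILINE)


def set_forward_upstream(corefile: str, upstream: str) -> str:
    """Return the Corefile text with the forward target replaced by `upstream`."""
    return _FORWARD_LINE.sub(lambda m: m.group(1) + upstream, corefile)


# Example input trimmed from the Corefile quoted in this thread.
sample = (
    ".:53 {\n"
    "    errors\n"
    "    forward . /etc/resolv.conf\n"
    "    cache 30\n"
    "}\n"
)
updated = set_forward_upstream(sample, "172.17.93.2")
print(updated)
```

As reported above, this change did not affect the connectivity problem; the sketch only documents what was tried.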
I created a new cluster with one control plane node and two Windows workers, one running Windows Server 2019 and one running Windows Server 2022. It worked perfectly on the 2019 node, and I got the same issue described above on the 2022 node.
@MattAxel thanks for the update and the additional information.
Closing this due to age and inactivity.
Pods with services cannot reach outside the cluster network. Standalone pods work fine. This happens on Windows nodes with Calico.
Environmental Info: RKE2 Version: rke2.exe version v1.22.5+rke2r1 (ce3e572376cbb1d8157f46e2ae29d7d7834067f1) go version go1.16.10b7
Node(s) CPU architecture, OS, and Version:
Caption: Microsoft Windows Server 2022 Datacenter
CSName: SAFEPERFKUBW1
Version: 10.0.20348
BuildType: Multiprocessor Free
OSArchitecture: 64-bit
Cluster Configuration: 3 Ubuntu 20.04 servers, 2 Windows agents. Calico CNI plugin. Running on VMware vSphere.
Describe the bug: Some pods on the Windows nodes cannot reach external IP addresses. This only applies if a service is created for the deployment; a pod without any service works fine. The service type does not seem to matter. Most of the time, one instance of each deployment can reach out externally. For example, when 3 pods of the same deployment are running on the same node, only the most recently created one is able to reach out externally. This is not always the case, but it is most of the time. When a new pod starts up, external access always works until the pod's status changes to ready. My guess is that this is when kube-proxy is updated. I am not sure what to look for in the kube-proxy logs, but I cannot find any errors. On the Linux nodes it works perfectly fine.
Steps To Reproduce: Installed RKE2 using the quick start guide, with the Calico CNI since Windows agents are used. Exec a curl command in a pod.
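The curl check in the reproduction step can be repeated across all pods of a deployment to see which replicas have external connectivity. The following is a rough Python sketch, not the exact commands used in this report: the function names, the pod names, the namespace, and the target URL are all hypothetical, and it assumes `kubectl` is on the PATH and that `curl` is available inside the container image.

```python
import subprocess
from typing import Callable, Dict, List


def build_curl_command(pod: str, namespace: str, url: str,
                       timeout_s: int = 5) -> List[str]:
    """Build a `kubectl exec` command that runs curl inside the given pod."""
    return [
        "kubectl", "exec", pod, "-n", namespace, "--",
        "curl", "--silent", "--show-error", "--head",
        "--max-time", str(timeout_s), url,
    ]


def probe_pods(pods: List[str], namespace: str, url: str,
               run: Callable = subprocess.run) -> Dict[str, bool]:
    """Return {pod name: True if curl exited with status 0}.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    results = {}
    for pod in pods:
        proc = run(build_curl_command(pod, namespace, url),
                   capture_output=True, text=True)
        results[pod] = proc.returncode == 0
    return results
```

Against a real cluster, a call such as `probe_pods(["web-1", "web-2", "web-3"], "default", "http://example.com")` (hypothetical names) would show whether only one replica per deployment can reach out, which is the pattern described in the bug report.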
Expected behavior: Pods should always be able to reach external networks
Actual behavior: Pods are not able to reach outside the cluster network if there is a service connected to the deployment (most of the time, see description).
Additional context / logs: