/sig node
/sig network
/sig cluster-lifecycle
The address specified to kubelet seems to have been lost from the description in this issue.
Anyway, I tested in my dual-stack cluster with default family IPv6 and I could not reproduce the problem; log/exec works:
# kubectl version --short
Client Version: v1.23.3
Server Version: v1.23.3
# kubectl get svc kubernetes # (kubernetes service is ipv6)
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP fd00:4000::1 <none> 443/TCP 6m36s
# netstat -putln | grep 10250
tcp6 0 0 1000::1:c0a8:102:10250 :::* LISTEN 391/kubelet
# ps ...
391 root 1603m S kubelet --address=1000::1:192.168.1.2 --container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock --image-service-endpoint=unix:///var/run/crio/crio.sock --node-ip=1000::1:192.168.1.2,192.168.1.2 --register-node=true --kubeconfig /etc/kubernetes/kubeconfig.token --feature-gates IPv6DualStack=true --cluster-dns=1000::1:192.168.1.2 --cluster-domain=xcluster --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
# kubectl logs mserver-4xdn4 # (works)
Container starting...
2022/03/13 17:48:38 Listen on address; [::]:5003
2022/03/13 17:48:38 Listen on address; [::]:5001
2022/03/13 17:48:38 Listen on UDP address; [::]:5001
2022/03/13 17:48:38 Listen on UDP address; [::]:5003
Start; sctpt server --addr=11.0.1.3,1100::103 --log=5 --port=6000
Executing [tail -f /dev/null]
# kubectl exec mserver-4xdn4 -- ifconfig # (works)
eth0 Link encap:Ethernet HWaddr 26:FC:8A:61:2D:49
inet addr:11.0.1.3 Bcast:11.0.1.255 Mask:255.255.255.0
inet6 addr: fe80::24fc:8aff:fe61:2d49/64 Scope:Link
inet6 addr: 1100::103/120 Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:57 errors:0 dropped:0 overruns:0 frame:0
TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5042 (4.9 KiB) TX bytes:1304 (1.2 KiB)
...
Hello Lars,
There is one point that I missed mentioning: kubelet must be configured with --cloud-provider=external. With this flag set, we observe the communication failure.
Then this is not a K8s bug. My guess is that the cloud-provider doesn't set the node IPs correctly, or that the address you specify to kubelet is not a node IP.
Seems to be a duplicate of https://github.com/kubernetes/kubernetes/issues/107743
Similar, but in https://github.com/kubernetes/kubernetes/issues/107743 exec worked, right?
That makes https://github.com/kubernetes/kubernetes/issues/107743 a harder problem. This issue seems to be a bug in the cloud-provider.
Yeah, I was hoping we could find the pattern; it seems neither of us was able to reproduce it :shrug:
Maybe (?). The cloud-provider sets the IPv4 address first and the IPv6 address second in the node's list of InternalIPs (the IPv6 one being what kubelet uses). During a "kubectl exec", the kube-apiserver takes the first address from the list of InternalIPs, gets the IPv4 address, and tries to connect to it, whereas kubelet is configured to listen on the IPv6 InternalIP.
Is there such a possibility? (From an initial look at the code it seems so, but my knowledge of the kube-apiserver code may be zero, so I could be wrong.)
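If it helps, here is a quick way to test that hypothesis (a sketch on my side; <node> and SSH access to it are placeholders/assumptions):
# kubectl get node <node> -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' # (order the apiserver sees)
# ssh <node> -- ss -tlnp | grep 10250 # (address kubelet actually listens on)
If the first InternalIP differs from the listen address, exec/logs should fail exactly as described.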
The cloud providers are not consistent about IP assignment to nodes; it seems this is being fixed in https://github.com/kubernetes/kubernetes/pull/107750
maybe related?
what is this cloud provider? maybe this is specific to only one cloud provider
Nah, I don't think the order matters in this case. In my cluster:
# ps ...
396 root 1602m S kubelet --address=:: --container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock --image-service-endpoint=unix:///var/run/crio/crio.sock --node-ip=192.168.1.2,1000::1:192.168.1.2 --register-node=true --kubeconfig /etc/kubernetes/kubeconfig.token --feature-gates IPv6DualStack=true --cluster-dns=1000::1:192.168.1.2 --cluster-domain=xcluster --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
# kubectl get node vm-002 -o json | jq .status.addresses
[
{
"address": "192.168.1.2",
"type": "InternalIP"
},
{
"address": "1000::1:c0a8:102",
"type": "InternalIP"
},
{
"address": "vm-002",
"type": "Hostname"
}
]
IPv4 is before IPv6 both in --node-ip and in the node status, and it still works.
In our scenario, I see --node-ip:
PFA snippet.
(Maybe not, from an initial look; taking a deeper look at the code.)
From the "ps" above --node-ip=192.168.1.2,1000::1:192.168.1.2
. "192.168.1.2" is ipv4
One more observation: if we configure metrics-server with --kubelet-preferred-address-types set to Hostname only (i.e. --kubelet-preferred-address-types=Hostname), kubectl exec/log works fine.
Here is output which shows the cloud provider assigned the IPv6 address (if I am not wrong):
eccd@worker-pool1-f0a48xt3-santosh-eisnans-stack01:~> ss -a | grep 10250
tcp   LISTEN  0  4096  [dead::2]:10250  [::]:*
eccd@worker-pool1-f0a48xt3-santosh-eisnans-stack01:~> ps -lef | grep kubelet
4 S root      6458   6327  0  80   0 - 179194 -      Feb22 ?        00:00:50 /csi-node-driver-registrar --v=4 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/cinder.csi.openstack.org/csi.sock
4 S sles     24842  24709  0  80   0 - 373565 -      Feb22 ?        01:28:13 /metrics-server --secure-port=4443 --cert-dir=/tmp --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --kubelet-use-node-status-port --metric-resolution=15s --tls-cert-file=/etc/tls-cert/tls.crt --tls-private-key-file=/etc/tls-cert/tls.key
4 S root     45745      1  2  80   0 - 505547 -      Mar09 ?        02:42:42 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --runtime-cgroups=/system.slice/containerd.service --cert-dir=/var/lib/kubelet/pki --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --register-node --node-labels=ccd/version=2.21.0 --node-ip=dead::2 --cloud-provider=external --pod-infra-container-image=registry.eccd.local:5000/pause:3.6-1-a01d506b
0 S eccd    125494 124999  0  80   0 -   2170 -      12:21 pts/0    00:00:00 grep --color=auto kubelet
eccd@director-0-santosh-eisnans-stack01:~> kubectl describe nodes worker-pool1-f0a48xt3-santosh-eisnans-stack01
Name: worker-pool1-f0a48xt3-santosh-eisnans-stack01
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=ccd.worker
beta.kubernetes.io/os=linux
ccd/version=2.21.0
failure-domain.beta.kubernetes.io/region=RegionOne
failure-domain.beta.kubernetes.io/zone=nova
kubernetes.io/arch=amd64
kubernetes.io/hostname=worker-pool1-f0a48xt3-santosh-eisnans-stack01
kubernetes.io/os=linux
node-index=1
node-pool=pool1
node-role.kubernetes.io/worker=
node.kubernetes.io/instance-type=ccd.worker
node.uuid=dbc10028-f56c-41d1-afe9-bfb20ed003f3
node.uuid_source=heat
topology.cinder.csi.openstack.org/zone=nova
topology.kubernetes.io/region=RegionOne
topology.kubernetes.io/zone=nova
Annotations: alpha.kubernetes.io/provided-node-ip: dead::2
csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"dbc10028-f56c-41d1-afe9-bfb20ed003f3"}
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 22 Feb 2022 04:44:45 +0000
Taints:
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 22 Feb 2022 04:45:02 +0000   Tue, 22 Feb 2022 04:45:02 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 14 Mar 2022 12:20:51 +0000   Wed, 09 Mar 2022 14:22:23 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 14 Mar 2022 12:20:51 +0000   Wed, 09 Mar 2022 14:22:23 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 14 Mar 2022 12:20:51 +0000   Wed, 09 Mar 2022 14:22:23 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 14 Mar 2022 12:20:51 +0000   Wed, 09 Mar 2022 14:22:23 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.16.11
  InternalIP:  dead::2
  Hostname:    worker-pool1-f0a48xt3-santosh-eisnans-stack01
Capacity:
  cpu:                4
  ephemeral-storage:  33507308Ki
...
eccd@director-0-santosh-eisnans-stack01:~> kubectl logs node-local-dns-dz2n7 -n kube-system
Error from server: Get "https://10.0.16.17:10250/containerLogs/kube-system/node-local-dns-dz2n7/node-cache": dial tcp 10.0.16.17:10250: connect: connection refused
Looks like kubelet is doing the right thing; it uses the address dead::2 as it's told. dead::2 may be a legit IPv6 address, but it looks to me like someone is trying to tell us that something is wrong.
Again, this does not seem to be a K8s problem. Please try to get help from OpenStack.
Sure, I will try to check with the OpenStack team (by collecting more logs). If you have any clues/suggestions about a set of commands that can be used for debugging OpenStack, I will start from there and go forward.
One more piece of info would help: can you share the output of "describe node"?
I notice that in the describe output above (please post JSON, btw!) the addresses are:
Addresses:
InternalIP: 10.0.16.11
InternalIP: dead::2
but kubelet is listening only on the second of these. I'm not 100% clear on the details here, but won't apiserver try to connect to the first one?
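A quick way to check this from the control-plane host (a sketch using the addresses from the describe output above; any HTTP response, even a 401, means the port is reachable, while a timeout or refusal reproduces the failure):
# curl -sk --max-time 5 https://10.0.16.11:10250/pods; echo rc=$? # (first InternalIP)
# curl -sk --max-time 5 'https://[dead::2]:10250/pods'; echo rc=$? # (address kubelet listens on)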
The problem we're trying to address in #107750 is actually multiple networks, but this looks like the same issue to me. With that patch in place, the cloud-provider would honour the provided-node-ip annotation and put the IPv6 address first. I believe this would result in the apiserver correctly connecting to the IPv6 address.
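For reference, a way to compare that annotation against the published address order (a sketch; <node> is a placeholder):
# kubectl get node <node> -o jsonpath='{.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip}' # (what kubelet asked for, visible in the describe above)
# kubectl get node <node> -o json | jq .status.addresses # (order the cloud-provider actually published)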
The apiserver is connecting to the first one (10.0.16.11), but kubelet is configured to listen on the IPv6 address ([dead::2]:10250) via the address parameter in the kubelet configuration.
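As a possible interim mitigation (my assumption, not something verified in this thread): kube-apiserver has a --kubelet-preferred-address-types flag analogous to the metrics-server one mentioned earlier, so preferring Hostname should make exec/logs resolve the node name instead of dialing the first InternalIP:
# kube-apiserver ... --kubelet-preferred-address-types=Hostname,InternalIP,ExternalIP # (in the apiserver manifest)
This requires that node hostnames resolve to reachable addresses from the control plane; otherwise exec/logs will still fail.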
@esiperu-est If you are able to test with #107750 applied I'd be interested to know if it resolves your issue. I suspect it will.
I will not be able to give feedback soon, as the issue is observed in a customer deployment (to which I don't have direct access, and it requires a cloud-provider like OpenStack).
/triage accepted
Seems like https://github.com/kubernetes/kubernetes/pull/107750 will solve the issue with a bit of config on the user's part.
/assign mdbooth
/assign uablrek
Sorry, I can't test with --cloud-provider=external or OpenStack, so I unassign.
https://github.com/kubernetes/kubernetes/pull/107750 merged.
Given that, can we close this issue?
Closing; please re-open if 107750 doesn't solve the issue with a bit more config. /close
@dcbw: Closing this issue.
What happened?
In a dual-stack deployment with preferred-ipaddress-family: ipv6, kubelet is configured to listen on a specific address on port 10250 via the address setting in the kubelet configuration.
Commands like kubectl exec/logs fail with a connection timeout.
What did you expect to happen?
kubectl commands should work fine.
How can we reproduce it (as minimally and precisely as possible)?
1. In a dual-stack environment with preferred-addressfamily: ipv6, update the address: field in /var/lib/kubelet/config.yaml.
2. Restart the kubelet service.
3. Verify the listener: ss -anlp | grep 10250
4. Execute kubectl exec/logs commands (see the sketch after these steps).
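A minimal sketch of these steps, assuming the node's IPv6 InternalIP is fd00::2 (a placeholder) and kubelet reads /var/lib/kubelet/config.yaml:
# sed -i 's/^address:.*/address: "fd00::2"/' /var/lib/kubelet/config.yaml # (pin kubelet to one address)
# systemctl restart kubelet
# ss -anlp | grep 10250 # (expect only [fd00::2]:10250 listening)
# kubectl logs <any-pod> # (times out if the apiserver dials the IPv4 InternalIP)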
Anything else we need to know?
The reason for configuring a specific IP address for kubelet is to avoid a security risk (listening on all interfaces might open the port to attacks via secondary network interfaces).
I tried configuring kubelet with both IPv4 and IPv6 addresses, but that is not allowed: from the code, the address parameter is expected to take only one address.
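For completeness, my reading of the two knobs (an assumption from the code, not confirmed in this thread): the KubeletConfiguration address field is a single bind address, while --node-ip is the parameter that accepts a dual-stack, comma-separated pair (as in the earlier "ps" output from the cluster without an external cloud-provider):
# address: "192.168.1.2,fd00::2" # (rejected: 'address' in config.yaml takes exactly one IP)
# kubelet --node-ip=192.168.1.2,fd00::2 --address=:: ... # (dual-stack node IPs go to --node-ip instead)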
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"6abac1505370282a9583329dd35622792b60e449", GitTreeState:"clean", BuildDate:"2022-01-10T12:53:58Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"6abac1505370282a9583329dd35622792b60e449", GitTreeState:"clean", BuildDate:"2022-01-10T12:44:17Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)