Closed iblancasa closed 3 months ago
I can provide more information but I'm not sure where to look.
Everything works properly when I start cloud-provider-kind. When I start istio version 1.22.0 with istioctl install -y, the system freezes. After that, I have to reboot.
... does it freeze without cloud provider kind? this seems like an istio <> kind issue rather than cloud-provider-kind
The Istio documentation for kind points to Cloud Provider KIND to set up MetalLB.
The pointer here links to a page in the kind docs that used to have a MetalLB install but currently covers cloud-provider-kind. The Istio docs should be corrected: if MetalLB is still desired, some other link will be needed; if not, we should drop the reference to MetalLB.
cc @howardjohn
I think @craigbox or @danehans was looking into ^ already
... does it freeze without cloud provider kind? this seems like an istio <> kind issue rather than cloud-provider-kind
It doesn't freeze if I'm just using kind + cloud provider kind. As soon as Istio is started... everything freezes.
If I use Istio + kind (no cloud provider kind) everything goes well.
It only fails if I have kind + cloud provider kind + Istio.
I tried:
- start cloud provider kind, then start Istio
- start Istio, then start cloud provider kind

Same result.
The pointer here links to a page in the kind docs that used to have a MetalLB install but currently covers cloud-provider-kind. The Istio docs should be corrected: if MetalLB is still desired, some other link will be needed; if not, we should drop the reference to MetalLB.
I reached that website from this: https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/.
I can see how EXTERNAL-IP is <pending> when running kubectl get svc "$INGRESS_NAME" -n "$INGRESS_NS". I'm running that command in a different terminal using watch. When I start cloud-provider-kind I can see how the status goes from <pending> to an IP. But later... everything freezes.
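For reference, that check in the second terminal can be sketched like this (INGRESS_NAME and INGRESS_NS are the variables the Istio ingress task exports; the default ingress gateway name here is an assumption):

```shell
# Variables as used in the Istio ingress-control task (assuming a default
# istio-ingressgateway install).
INGRESS_NAME=istio-ingressgateway
INGRESS_NS=istio-system
# In a real session you would run:
#   watch kubectl get svc "$INGRESS_NAME" -n "$INGRESS_NS"
# EXTERNAL-IP stays <pending> until cloud-provider-kind provisions the
# load balancer, then flips to an address such as 172.18.0.x.
echo "watch kubectl get svc \"$INGRESS_NAME\" -n \"$INGRESS_NS\""
```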
I was planning to use kind for a workshop but I'll need to switch to minikube or something else.
Anyway, let me know if there is any extra information I can provide to help fix the issue.
Thanks!
Can you elaborate on "everything freezes"?
cloud-provider-kind runs on your local machine. If your host system freezes and has to reboot, then it sounds like either a bug in Docker (unlikely) or a bug in cloud-provider-kind. I don't immediately see how it could be a bug in kind or Istio.
Installing Istio (or to be clear, a Gateway) is probably the first time that cloud-provider-kind sees a LoadBalancer that it needs to provision.
(It just so happens I tried cloud-provider-kind last week and it worked for me; yes, we are still referring to old MetalLB docs which were changed from underneath us, and I'll update the links next week.)
Can you elaborate on "everything freezes"?
I can do nothing with my machine. The system becomes unresponsive. The only thing I can do is to push the power button and wait until the laptop is forced to shut down.
If your host system freezes and has to reboot, then it sounds like either a bug in Docker (unlikely) or a bug in cloud-provider-kind. I don't immediately see how it could be a bug in kind or Istio.
I don't think it is a bug in Istio or kind, but that was the combination where I experienced the issue.
This is the information from my docker environment:
```
$ docker version
Client: Docker Engine - Community
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.21.10
 Git commit:        b72abbb
 Built:             Thu May 16 08:35:25 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.3
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       8e96db1
  Built:            Thu May 16 08:33:42 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
Can you log the cloud-provider-kind stdout to a file so you can get it after the reboot and attach it here?
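One way to do that (a sketch; the file name and the need for sudo are assumptions):

```shell
# Duplicate the process output to both the terminal and a file so the tail
# survives a forced reboot. In a real session:
#   sudo cloud-provider-kind 2>&1 | tee cloud-provider-kind.log
# The same pattern, demonstrated with a stand-in function instead of the
# real binary:
fake_cpk() { echo "I0530 13:24:40 controller.go:167] probe HTTP address"; }
fake_cpk 2>&1 | tee cloud-provider-kind.log
```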
```
I0530 13:24:40.077190 10025 controller.go:167] probe HTTP address https://127.0.0.1:41009
I0530 13:24:40.079794 10025 controller.go:88] Creating new cloud provider for cluster workshop
I0530 13:24:40.084009 10025 controller.go:95] Starting cloud controller for cluster workshop
I0530 13:24:40.084023 10025 node_controller.go:165] Sending events to api server.
I0530 13:24:40.084187 10025 controller.go:231] Starting service controller
I0530 13:24:40.084239 10025 shared_informer.go:311] Waiting for caches to sync for service
I0530 13:24:40.084440 10025 node_controller.go:174] Waiting for informer caches to sync
I0530 13:24:40.085664 10025 reflector.go:351] Caches populated for *v1.Node from k8s.io/client-go/informers/factory.go:159
I0530 13:24:40.085719 10025 reflector.go:351] Caches populated for *v1.Service from k8s.io/client-go/informers/factory.go:159
I0530 13:24:40.184494 10025 shared_informer.go:318] Caches are synced for service
I0530 13:24:40.184581 10025 controller.go:733] Syncing backends for all LB services.
I0530 13:24:40.184601 10025 controller.go:737] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I0530 13:24:40.184613 10025 instances.go:47] Check instance metadata for workshop-control-plane
I0530 13:24:40.184710 10025 controller.go:398] Ensuring load balancer for service istio-system/istio-ingressgateway
I0530 13:24:40.184794 10025 controller.go:954] Adding finalizer to service istio-system/istio-ingressgateway
I0530 13:24:40.184955 10025 event.go:376] "Event occurred" object="istio-system/istio-ingressgateway" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0530 13:24:40.203753 10025 loadbalancer.go:28] Ensure LoadBalancer cluster: workshop service: istio-ingressgateway
I0530 13:24:40.204765 10025 instances.go:75] instance metadata for workshop-control-plane: &cloudprovider.InstanceMetadata{ProviderID:"kind://workshop/kind/workshop-control-plane", InstanceType:"kind-node", NodeAddresses:[]v1.NodeAddress{v1.NodeAddress{Type:"Hostname", Address:"workshop-control-plane"}, v1.NodeAddress{Type:"InternalIP", Address:"172.18.0.3"}, v1.NodeAddress{Type:"InternalIP", Address:"fc00:f853:ccd:e793::3"}}, Zone:"", Region:""}
I0530 13:24:40.217251 10025 node_controller.go:267] Update 1 nodes status took 32.689405ms.
I0530 13:24:40.222627 10025 server.go:100] updating loadbalancer
I0530 13:24:40.222650 10025 proxy.go:126] address type Hostname, only InternalIP supported
I0530 13:24:40.222660 10025 proxy.go:126] address type Hostname, only InternalIP supported
I0530 13:24:40.222664 10025 proxy.go:126] address type Hostname, only InternalIP supported
I0530 13:24:40.222667 10025 proxy.go:140] haproxy config info: &{HealthCheckPort:10256 ServicePorts:map[IPv4_15021:{BindAddress:*:15021 Backends:map[workshop-control-plane:172.18.0.3:31025]} IPv4_443:{BindAddress:*:443 Backends:map[workshop-control-plane:172.18.0.3:31055]} IPv4_80:{BindAddress:*:80 Backends:map[workshop-control-plane:172.18.0.3:31209]}]}
I0530 13:24:40.222744 10025 proxy.go:155] updating loadbalancer with config
global
  log /dev/log local0
  log /dev/log local1 notice
  daemon

resolvers docker
  nameserver dns 127.0.0.11:53

defaults
  log global
  mode tcp
  option dontlognull
  # TODO: tune these
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  # allow to boot despite dns don't resolve backends
  default-server init-addr none

frontend IPv4_15021-frontend
  bind *:15021
  default_backend IPv4_15021-backend
  # reject connections if all backends are down
  tcp-request connection reject if { nbsrv(IPv4_15021-backend) lt 1 }

backend IPv4_15021-backend
  option httpchk GET /healthz
  server workshop-control-plane 172.18.0.3:31025 check port 10256 inter 5s fall 3 rise 1

frontend IPv4_443-frontend
  bind *:443
  default_backend IPv4_443-backend
  # reject connections if all backends are down
  tcp-request connection reject if { nbsrv(IPv4_443-backend) lt 1 }

backend IPv4_443-backend
  option httpchk GET /healthz
  server workshop-control-plane 172.18.0.3:31055 check port 10256 inter 5s fall 3 rise 1

frontend IPv4_80-frontend
  bind *:80
  default_backend IPv4_80-backend
  # reject connections if all backends are down
  tcp-request connection reject if { nbsrv(IPv4_80-backend) lt 1 }

backend IPv4_80-backend
  option httpchk GET /healthz
  server workshop-control-plane 172.18.0.3:31209 check port 10256 inter 5s fall 3 rise 1

I0530 13:24:40.261097 10025 proxy.go:163] restarting loadbalancer
I0530 13:24:40.287411 10025 server.go:116] get loadbalancer status
I0530 13:24:40.296570 10025 controller.go:995] Patching status for service istio-system/istio-ingressgateway
I0530 13:24:40.296641 10025 event.go:376] "Event occurred" object="istio-system/istio-ingressgateway" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
```
It would be good if you can run top or htop in parallel to see if some process has a problem and consumes all the CPU, causing the freeze. For more advanced diagnostics you can follow these guidelines: https://en.wikibooks.org/wiki/Linux_Guide/Freezes
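A way to capture that evidence before the machine locks up entirely (a sketch; the file name and interval are arbitrary):

```shell
# Record the top CPU consumers to a file that can be inspected after a reboot.
# ps in batch form is scriptable where interactive top/htop are not:
ps aux --sort=-%cpu | head -n 10 > cpu-snapshot.txt
# Looping version for a real session:
#   while true; do { date; ps aux --sort=-%cpu | head -n 10; } >> cpu-snapshot.txt; sleep 5; done
```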
@aojea I tried 4 times. The haproxy process appears and takes all the CPU.
yes, we are still referring to old MetalLB docs which were changed from underneath us, and I'll update the links next week
yeah, sorry about that; more of an aside that we should follow up on, as I just noticed this impact.
The haproxy process appears and takes all the CPU.
... interesting, can you try with the latest code:

~~go install sigs.k8s.io/cloud-provider-kind@main (not @latest which is the most recent tagged release)~~

EDIT: there's now a release https://github.com/kubernetes-sigs/cloud-provider-kind/issues/78#issuecomment-2140441178 (so go install sigs.k8s.io/cloud-provider-kind@latest or any of the other documented methods for currently installing)
since ~~the last release~~ v0.1.0 @aojea switched from haproxy to envoy, amongst other changes ...
We have to cut a new release with envoy and UDP support
@iblancasa can you try with the new release https://github.com/kubernetes-sigs/cloud-provider-kind/releases/tag/v0.2.0?
It works fine! Thanks a lot :)
It works fine! Thanks a lot :)
Thanks for the feedback
The Istio documentation for kind points to Cloud Provider KIND to set up MetalLB.

I'm using kind 0.23.0 and cloud-provider-kind 0.1.0.

When I start a kind cluster with this configuration:

Everything works properly when I start cloud-provider-kind. When I start istio version 1.22.0 with istioctl install -y, the system freezes. After that, I have to reboot.

I'm running Fedora release 39 (Thirty Nine).