Open magicite opened 2 years ago
@Unix4ever any ideas?
A few updates in case it's helpful.
I've tried this with a management cluster created by talosctl cluster create using docker, and on a baremetal talos cluster. I've only seen this issue with the docker cluster.
Here's how I'm creating the docker management cluster:
talosctl cluster create \
  --name bootstrap \
  --kubernetes-version 1.24.3 \
  -p 69:69/udp,8081:8081/tcp,51821:51821/udp \
  --memory 4096 \
  --workers 0 \
  --nameservers 16.110.135.51,16.110.135.52 \
  --registry-mirror docker.io=http://dill04.us.cray.com:2022 \
  --registry-mirror k8s.gcr.io=http://dill04.us.cray.com:2023 \
  --registry-mirror quay.io=http://dill04.us.cray.com:2024 \
  --registry-mirror gcr.io=http://dill04.us.cray.com:2025 \
  --registry-mirror ghcr.io=http://dill04.us.cray.com:2026 \
  --registry-mirror registry.k8s.io=http://dill04.us.cray.com:2027 \
  --with-cluster-discovery=false \
  --config-patch @env.yaml \
  --config-patch-control-plane @env.yaml \
  --config-patch-worker @env.yaml \
  --endpoint $HOST_IP
with the patch file being
- op: add
  path: /machine/env
  value:
    http_proxy: xxx
    https_proxy: xxx
    no_proxy: xxx
- op: add
  path: /cluster/allowSchedulingOnMasters
  value: true
- op: add
  path: /machine/time
  value:
    servers:
      - 16.110.135.123
      - 16.229.168.10
Are you using the latest versions of the providers? We've had some fixes since then.
Yes - I am using the latest released versions of the providers.
[root@dill04 demo-1.2]# clusterctl --kubeconfig-context admin@bootstrap upgrade plan
Checking cert-manager version...
Cert-Manager is already up to date
Checking new release availability...
Latest release available for the v1beta1 API Version of Cluster API (contract):
NAME                    NAMESPACE       TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-talos         cabpt-system    BootstrapProvider        v0.5.5            Already up to date
control-plane-talos     cacppt-system   ControlPlaneProvider     v0.4.10           Already up to date
cluster-api             capi-system     CoreProvider             v1.2.4            Already up to date
infrastructure-sidero   sidero-system   InfrastructureProvider   v0.5.5            Already up to date
You are already up to date!
Might be the same as #109.
I'm creating a cluster on some old Dell R620s with sidero and am noticing that once the first control plane node gets to the point where it needs to receive the bootstrap request, it takes a variable amount of time to receive it. I've seen between 6 and 15 minutes. If I kill the cacppt-controller-manager pod, that seems to kickstart things.
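For anyone who wants to try the same workaround, here is a rough sketch of restarting the controller. It uses kubectl rollout restart as a stand-in for deleting the pod by hand, which should have the same effect of replacing the running pod. The namespace is taken from the clusterctl output in this thread; the context name and the deployment name (guessed from the pod name) are assumptions, so adjust them for your management cluster.

```shell
# Assumed defaults; override via environment variables if yours differ.
CONTEXT="${CONTEXT:-admin@bootstrap}"                   # assumption: management cluster context
NAMESPACE="${NAMESPACE:-cacppt-system}"                 # namespace shown by clusterctl upgrade plan
DEPLOYMENT="${DEPLOYMENT:-cacppt-controller-manager}"   # assumption: inferred from the pod name

# Restart the control plane provider controller and wait for the new pod.
restart_cacppt() {
  kubectl --context "$CONTEXT" -n "$NAMESPACE" \
    rollout restart "deployment/$DEPLOYMENT" &&
  kubectl --context "$CONTEXT" -n "$NAMESPACE" \
    rollout status "deployment/$DEPLOYMENT" --timeout=120s
}

# Usage: restart_cacppt
```

This only works around the delay; it does not explain why the bootstrap request is late in the first place.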
I'm running the latest of everything.
Attached is the cacppt-controller-manager log. This log has two cluster provisions, with the first one not having the "takes a long time" issue, and the second one experiencing the issue. I think the interesting bits start at 1.658505243132774e+09 (last failed message).
Attachments: cacppt-controller-manager-delayed-bootstrap.txt, cpn_console.txt
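In case it helps when lining the controller log up against the console log: the timestamp quoted above looks like seconds since the Unix epoch written as a float, which is an assumption based on how these controller logs are usually formatted. A small helper along these lines converts it to a readable UTC time. The function name epoch_to_utc is made up for illustration, and the date -d @SECONDS form assumes GNU date.

```shell
# Convert a float epoch timestamp (plain or scientific notation) to UTC.
# Assumes GNU date; fractional seconds are truncated, not rounded.
epoch_to_utc() {
  # awk parses the value as a number and prints the whole seconds.
  secs="$(awk -v t="$1" 'BEGIN { printf "%d\n", t }')"
  date -u -d "@${secs}" +%Y-%m-%dT%H:%M:%SZ
}

# Usage: epoch_to_utc 1.658505243132774e+09
```

For the value above this gives 2022-07-22T15:54:03Z, which is the point in the controller log to start reading from.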