siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0
5.98k stars, 484 forks

Creating a cluster via the CLI (yc) on Yandex. #8893

Open remotejob opened 1 month ago

remotejob commented 1 month ago

Feature Request

Creating a cluster via the CLI (yc) on Yandex Cloud.

Description

I tried to create a cluster on Yandex Cloud using https://www.talos.dev/v1.7/talos-guides/install/cloud-platforms/hetzner/ as a base, but without success with either nocloud-amd64.raw or hcloud-amd64.raw.
In all cases I get the error:
transport: Error while dialing: dial tcp 51.250.67.177:50000: connect: connection refused

smira commented 1 month ago

We don't know much about Yandex Cloud and what it takes to run Talos there.

The Talos metal image should run anywhere, but if YC requires special setup and handling, that would require platform support in Talos Linux.

Either way, you should start by looking at the server logs to see whether the machine boots and, if not, why it fails. These are usually called "serial console logs".
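As a rough illustration of the advice above, the following sketch fetches the serial console output of a Yandex Cloud VM and keeps only the lines Talos itself printed. It assumes the `yc` CLI is installed and configured and that `yc compute instance get-serial-port-output` is available in your version. The instance name `talos-cp-1`, the log file name, and the helper `talos_lines` are hypothetical and not from the thread.

```shell
# Hypothetical instance name and log file; replace with your own values.
INSTANCE_NAME="talos-cp-1"
LOG_FILE="serial-console.log"

# Keep only the lines that Talos itself printed (they carry a [talos] tag).
talos_lines() {
  grep -F '[talos]'
}

if command -v yc >/dev/null 2>&1; then
  # Dump the serial console output and keep a copy for later inspection.
  yc compute instance get-serial-port-output --name "$INSTANCE_NAME" > "$LOG_FILE"
  # Show the most recent Talos messages.
  talos_lines < "$LOG_FILE" | tail -n 20
else
  echo "yc CLI not found; cannot fetch serial console output"
fi
```

If the filtered output is empty, the unfiltered log is worth reading in full, since a machine that never reaches the Talos init stage will not print any `[talos]` lines.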

remotejob commented 1 month ago

I tried to use https://kevinholditch.co.uk/2023/10/21/creating-a-kubernetes-cluster-using-talos-linux-on-xen-orchestra as a base.

unset TALOSCONFIG
export CONTROL_PLANE_IP=158.160.117.159
talosctl gen config talos-k8s-yandex https://$CONTROL_PLANE_IP --with-docs=false --with-examples=false --output-dir _out
export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP

Up to this point everything looks OK, but talosctl bootstrap does not pass:

failed to verify certificate: x509: certificate is valid for 10.128.0.27, 127.0.0.1, ::1, not 158.160.117.159"

talosctl disks --insecure --nodes $CONTROL_PLANE_IP

50000/tcp open ibm-db2

/dev/vda - fhmjk542a09ltf3d228q HDD - - virtio:d00000002v00001AF4 - 11 GB /pci0000:80/0000:80:00.0/0000:81:00.0/virtio2/ /sys/class/block

smira commented 1 month ago

Please use proper Markdown formatting to make your comments more readable.

In case there's an LB/IP Talos has no idea about, add that public IP to .machine.certSANs in the machine config.
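A minimal sketch of what this could look like in practice, assuming the public IP from the earlier comment and the usual Kubernetes API port 6443. The patch file name `certsans-patch.yaml` is hypothetical. The patch adds the public IP to `.machine.certSANs`, and the config is regenerated with `--config-patch`.

```shell
# Values taken from the thread; replace with your own.
PUBLIC_IP="158.160.117.159"
# Hypothetical file name for the machine config patch.
PATCH_FILE="certsans-patch.yaml"

# Write a machine config patch that adds the public IP to .machine.certSANs.
cat > "$PATCH_FILE" <<EOF
machine:
  certSANs:
    - ${PUBLIC_IP}
EOF

if command -v talosctl >/dev/null 2>&1; then
  # Regenerate the configs with the patch applied.
  talosctl gen config talos-k8s-yandex "https://${PUBLIC_IP}:6443" \
    --config-patch "@${PATCH_FILE}" --output-dir _out
else
  echo "talosctl not found; patch written to ${PATCH_FILE}"
fi
```

With the public IP listed in the certificate SANs, the x509 error about the certificate being valid only for the private address should no longer appear when talosctl connects through that IP.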

remotejob commented 1 month ago

I am stuck on: 158.160.168.148:443: i/o timeout" ?? Port 443?? My load balancer is at 158.160.168.148:6443

[ 946.684292] [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://158.160.168.148/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 158.160.168.148:443: i/o timeout"}
[ 950.583089] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 953.669929] [talos] task startAllServices (1/1): service "etcd" to be "up"
[ 956.076398] [talos] etcd is waiting to join the cluster, if this node is the first node in the cluster, please run `talosctl bootstrap` against one of the following IPs:
[ 956.078744] [talos] [10.128.0.14]
[ 966.101260] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 968.668718] [talos] task startAllServices (1/1): service "etcd" to be "up"
[ 978.438453] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
[ 981.777055] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 983.669451] [talos] task startAllServices (1/1): service "etcd" to b

remotejob commented 1 month ago

OK "add that public IP to .machine.certSANs" resolved the issue!!

Great job!!

smira commented 1 month ago

158.160.168.148:443: i/o timeout" ?? port 443??

You specify the endpoint yourself as an argument to talosctl gen config, so Talos uses whatever you provide. An https:// URL without an explicit port defaults to 443.
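To make the point concrete, here is a small sketch that builds the cluster endpoint with an explicit port before passing it to `talosctl gen config`. The load balancer address and port 6443 come from the thread. The variable names are illustrative only.

```shell
# Load balancer address and port from the thread.
LB_IP="158.160.168.148"
API_PORT="6443"

# Spell out the port: a bare https:// URL would be dialed on 443.
CLUSTER_ENDPOINT="https://${LB_IP}:${API_PORT}"
echo "cluster endpoint: ${CLUSTER_ENDPOINT}"

if command -v talosctl >/dev/null 2>&1; then
  # The second positional argument is the Kubernetes endpoint Talos will use.
  talosctl gen config talos-k8s-yandex "$CLUSTER_ENDPOINT" --output-dir _out
else
  echo "talosctl not found; skipping config generation"
fi
```

Keeping the port in a separate variable makes it harder to drop it by accident when the address changes.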

remotejob commented 1 month ago

Yes, it was my typo. Now everything appears to be working. A very interesting approach and, in general, a very interesting project. Thanks. P.S. I am planning to use it in production.