Open ricottatosta opened 10 months ago
@ricottatosta
As far as I can see, you're trying to use Talos as the control-plane and bootstrap provider. It's recommended that you submit this issue to talos-control-plane-provider.
Thanks for changing my submission to type 'support'. The issue involves the cidata.iso that capmox creates to pass configuration via cloud-init, and the Talos VM boot image that reads it at boot time. Or am I supposed to believe that the Talos bootstrap and control-plane providers are involved in creating cidata.iso? Maybe the issue should be submitted to "talos os".
Hmm, CAPMOX currently only creates cidata in the format supported by the Debian and Ubuntu distributions. If this is about supporting Talos cloud-init, you're right. I don't know how Talos loads the cloud-init ISO. If you know more, I can help add support for Talos.
As far as I know, Talos expects the cloud-init datasource to be in "nocloud" format. In this format, the network-config file can't start with "network:". Because of the presence of "network:", Talos doesn't find the top-level key "version: 2" and assumes "version: 0". There is nothing I can do on my side: that's how you build cidata.iso, and it follows how cloud-init expects data to be arranged in distributions like Debian and Ubuntu. You should find a way either to arrange the data so that it works in all situations or to let the user choose which datasource format to produce in cidata.iso. I know this means more effort, but I think Talos and CAPMOX are great together.
A couple of useful links: Talos NoCloud, Cloud-Init NoCloud datasource
Will check this out
This is the console output of Talos NoCloud booting image regarding the issue:
[talos] found config disk (cidata) at /dev/sr0
ISO 9660 Extensions: IEEE_P1282
[talos] fetching meta config from: cidata/meta-data
[talos] fetching network config from: cidata/network-config
[talos] fetching machine config from: cidata/user-data
[talos] restarting platform network config {"component": "controller-runtime", "controller": "network.PlatformConfigController", ..., "error": "network-config metadata version=0 is not supported"}
I will check whether this works with other Netplan-based distros, and then I will implement it.
I'm really looking forward to hearing good news from you.
I will test it on Ubuntu, but can you test it on Talos?
What am I supposed to test? Release 0.1.1? Does it contain any changes in the code involved in the issue?
Currently, we support only Netplan-based distros. We will try to add support for other network-config formats.
If someone wants to put in the effort and add this, that would be great.
OK, that was expected. Unfortunately, I'm not good at programming in Go, and the solution I found is not suitable for integration into your code. As my solution is quite simple (just deleting a handful of characters from a template string), I'll patch your code every time you release a new version. Thank you for your support.
We will try to support this soon.
I would like to share my experience about using CAPMOX with TALOS.
This is the template I use in pkg/cloudinit/network.go:
version: 2
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: false
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - {{ $element.IPV6Address }}
    {{- end }}
    {{- if eq $index 0 }}
    {{- if $element.Gateway }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if $element.Gateway6 }}
    gateway6: {{ $element.Gateway6 }}
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
    {{- end -}}
{{- end -}}
TALOS dislikes defining static routes in place of gateways and perhaps even defining nameservers for each device. For the rest, it works like a charm.
Thanks for sharing. I guess the best solution is to support another version of network-config that's actually different from the netplan config.
We have released a new version.
I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue. If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it. Is there a way to define only address and netmask for a network device?
Oh no, the gateway is required.
I will create a new issue so we can treat this as a feature: supporting multiple cloud-init network-config formats.
However, DHCP will be included in the next release; I don't know if that helps you.
I'm trying to make nocloud template string work like the netplan one as much as possible. But there is an issue. If I omit to define a gateway, the injector refuses to build the cloudinit image complaining it wants it. Is there a way to define only address and netmask for a network device?
One workaround I've found is to add an illegal gateway (169.254.255.254 for example). Cluster-api-provider-ipam-in-cluster will accept this, and netplan will ignore it when applying with a warning. I have not tested this with cloud-init network config v1.
I have been trying to get the Talos bootstrap provider and the Proxmox infrastructure provider working together the whole morning, and kept running into the "network-config metadata version=0 is not supported" issue... So glad to finally find this GitHub issue about it, meaning that I probably did nothing wrong in my configuration :)
Has there been any progress on the topic since January? Could possibly some sort of variable be added that makes it generate the cidata in the nocloud format such that it is compatible with talos?
Unfortunately not. We don't use Talos and as such we can't commit to adding support for it. Patches welcome, of course, and we are more than happy to accept additional maintainers :-)
CAPMOX works well with TALOS, but it needs some patches to CAPMOX's code. Furthermore, it is possible to make your cluster "elastic", even self-managed (without an external cluster that creates and manages it). All that is needed is to modify the network.go file located at pkg/cloudinit/ in the source code and rebuild the Docker image. If something ready to use is needed, there is a Docker image at https://hub.docker.com/r/ricottatosta/cluster-api-provider-proxmox already patched for TALOS. After deploying CAPMOX, patch the CAPMOX deployment manifest so that its container image points at ricottatosta/cluster-api-provider-proxmox:[tag]. The latest version (0.3.0) has the following patch applied:
...
const (
	/* network-config template. */
	networkConfigTPl = `version: 2
renderer: networkd
ethernets:
{{- range $index, $element := .NetworkConfigData }}
  eth{{ $index }}:
    match:
      macaddress: {{ $element.MacAddress }}
    dhcp4: {{ if $element.DHCP4 }}true{{ else }}false{{ end }}
    dhcp6: {{ if $element.DHCP6 }}true{{ else }}false{{ end }}
    {{- if or (and (not $element.DHCP4) $element.IPAddress) (and (not $element.DHCP6) $element.IPV6Address) }}
    addresses:
    {{- if $element.IPAddress }}
      - {{ $element.IPAddress }}
    {{- end }}
    {{- if $element.IPV6Address }}
      - '{{ $element.IPV6Address }}'
    {{- end }}
    {{- if eq $index 0 }}
    {{- if and $element.Gateway (not $element.DHCP4) }}
    gateway4: {{ $element.Gateway }}
    {{- end }}
    {{- if and $element.Gateway6 (not $element.DHCP6) }}
    gateway6: '{{ $element.Gateway6 }}'
    {{- end }}
    {{- if $element.DNSServers }}
    nameservers:
      addresses:
      {{- range $element.DNSServers }}
        - {{ . }}
      {{- end -}}
    {{- end -}}
    {{- end -}}
    {{- end -}}
{{- end -}}
{{- $vrf := 0 -}}
{{- range $index, $element := .NetworkConfigData }}
{{- if eq $element.Type "vrf" }}
{{- if eq $vrf 0 }}
vrfs:
{{- $vrf := 1 }}
{{- end }}
  {{$element.Name}}:
    table: {{ $element.Table }}
    {{- if $element.Routes }}{{ template "routes" $element }}{{- end -}}
    {{- if $element.FIBRules }}{{ template "rules" $element }}{{- end -}}
    {{- if $element.Interfaces }}
    interfaces:
    {{- range $element.Interfaces }}
      - {{ . }}
    {{- end -}}
    {{- end -}}
{{- end -}}
{{- end -}}
{{- define "rules" }}
    routing-policy:
    {{- range $index, $rule := .FIBRules }}
      - {
      {{- if $rule.To }} "to": "{{$rule.To}}", {{ end -}}
      {{- if $rule.From }} "from": "{{$rule.From}}", {{ end -}}
      {{- if $rule.Priority }} "priority": {{$rule.Priority}}, {{ end -}}
      {{- if $rule.Table }} "table": {{$rule.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
{{- define "routes" }}
    routes:
    {{- range $index, $route := .Routes }}
      - {
      {{- if $route.To }} "to": "{{$route.To}}", {{ end -}}
      {{- if $route.Via }} "via": "{{$route.Via}}", {{ end -}}
      {{- if $route.Metric }} "metric": {{$route.Metric}}, {{ end -}}
      {{- if $route.Table }} "table": {{$route.Table}}, {{ end -}} }
    {{- end }}
{{- end -}}
`
)
...
It's not tested against vrf. My use case is two ethernets, public and private. As mentioned earlier, it works like a charm.
Thanks! This patch appears to be working. I am now getting past the point where it was complaining that: "network-config metadata version=0 is not supported" :)
On a related point, how are you setting up the initial network for the control plane using Talos and Proxmox? I am attempting to use the VIP solution built into Talos but it seems to not be working... If you don't mind it would be very nice to see an example of your TalosControlPlane and ProxmoxCluster objects
For networking I use Cilium without kube-proxy. For VIP I use kube-vip in BGP mode as daemonset. Following is what you asked for:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: ProxmoxCluster
metadata:
  name: k8s-test
  namespace: k8s-test
spec:
  allowedNodes:
    - pve1
    - pve2
    - pve3
  controlPlaneEndpoint:
    host: 10.100.150.150  # my VIP
    port: 6443
  dnsServers:
    - 10.100.150.1
  ipv4Config:
    addresses:
      - 10.100.150.151-10.100.150.159
    gateway: 10.100.150.254
    prefix: 24
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: k8s-test-control-plane
  namespace: k8s-test
spec:
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.6.1
      configPatches:
        - op: add
          path: /machine/network/extraHostEntries
          value:
            - ip: 127.0.0.1
              aliases:
                - kubernetes
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: ProxmoxMachineTemplate
    name: k8s-test-control-plane
  replicas: 1
  version: 1.28.3
@ricottatosta Thank you for this - can you submit it as a PR, please?
What steps did you take and what happened: Sorry if I submitted this as a bug when maybe it isn't one. When deploying a cluster with Talos as the bootstrap and control-plane provider, Talos' init process finds a cloud-init drive but then complains about the network-config file. Talos' error says "network-config metadata version=0 is not supported", maybe because the file starts with "network:". Is that supported? Reading the manual, it shouldn't be.
cloud-init manual
What did you expect to happen: Maybe cluster deployment should generate a network-config file that does not start with a top-level "network:" key.
Environment:
Kubernetes version (use kubectl version): 1.28.3
OS (e.g. from /etc/os-release): Talos 1.5.5