siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Publish IBM Cloud VPC Gen2 Image #6637

Closed: FischerLGLN closed this issue 2 months ago

FischerLGLN commented 1 year ago

Feature Request

Publishing an IBM Cloud-compatible image would allow us to use IBM Cloud GPUs via edge computing.

Description

I think the platform-dependent code needs to be added here: https://github.com/siderolabs/talos/tree/main/internal/app/machined/pkg/runtime/v1alpha1/platform

I can provide documentation on how to create load balancer and server instances via the CLI. Since I have an IBM Cloud account, how can I build an image for this platform? Thanks!

FischerLGLN commented 1 year ago

Similar Issue: https://github.com/siderolabs/talos/issues/6638

smira commented 1 year ago

In general, a Talos platform is required if the cloud provides additional metadata that Talos could use, e.g. user data, network configuration, etc.

The generic metal image should still boot, but it won't have any IBM Cloud-specific support.

Developing and supporting a new platform takes a significant amount of work and testing.

FischerLGLN commented 1 year ago

Thanks! Using the metal image in a VM environment leads to a DHCP error: renew failed ... dhcp4 error ... no matching response packet received ... link: "eth0"

The network interface in the IBM VM is named "primary" instead of "eth0".

@smira Can I change that in worker.yaml and set DHCP to false, since the VM already gets an IP assigned?

machine:
    network:
        # Configures KubeSpan feature.
        kubespan:
            enabled: true # Enable the KubeSpan feature.

        # `interfaces` is used to define the network interface configuration.
        interfaces:
            - interface: primary
              dhcp: false

FischerLGLN commented 1 year ago

I get the same error as before when using this command:

ibmcloud is instance-create $INSTANCE $VPC $ZONE_NAME $PROFILE_NAME $SUBNET \
--image $IMAGE --resource-group-name $RESOURCE_GROUP_NAME --keys $KEYS --sgs $SGS --user-data @worker.yaml

Maybe it isn't picking up my worker.yaml correctly, since it isn't using the specified time server.

smira commented 1 year ago

I don't know the details of IBM Cloud, but if there is no DHCP, the machine needs a static network configuration: IP addresses, routes, DNS, NTP, etc.
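
A minimal sketch of such a static configuration, expressed as a Talos machine-config patch. It assumes the Linux interface is eth0 and uses placeholder values (address 10.240.0.5/24, gateway 10.240.0.1, resolver 10.240.0.2, time server time.example.net); the real IBM Cloud subnet, gateway, DNS and NTP values are not given in this thread. The helper name write_static_network_patch is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Talos machine-config patch that turns off DHCP
# and sets a static address, default route, resolver and time server.
# Every value is supplied by the caller; none are IBM Cloud defaults.
#
# Usage: write_static_network_patch IFACE CIDR GATEWAY DNS NTP
write_static_network_patch() {
  local iface=$1 cidr=$2 gateway=$3 dns=$4 ntp=$5
  cat <<EOF
machine:
  network:
    nameservers:
      - ${dns}
    interfaces:
      - interface: ${iface}
        dhcp: false
        addresses:
          - ${cidr}
        routes:
          - network: 0.0.0.0/0
            gateway: ${gateway}
  time:
    servers:
      - ${ntp}
EOF
}
```

For example, write_static_network_patch eth0 10.240.0.5/24 10.240.0.1 10.240.0.2 time.example.net > static-net.yaml produces a patch that recent talosctl releases can merge with talosctl machineconfig patch worker.yaml --patch @static-net.yaml -o worker-static.yaml (check talosctl machineconfig patch --help for the version in use).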

Talos won't load --user-data, as it has no IBM Cloud support. If IBM Cloud implements some other metadata API, e.g. the OpenStack or AWS one, the corresponding image might work.
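
Because the generic image ignores --user-data, the node boots into maintenance mode and waits for a config, which can then be pushed over the Talos API with talosctl apply-config --insecure. A minimal sketch, assuming the node already has a reachable IP (which is exactly what fails while DHCP is broken) and that the security group allows the Talos API port; apply_config and the DRY_RUN switch are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: send a machine config to a node waiting in maintenance
# mode. --insecure is used because the node has no cluster certificates yet.
#
# Usage: apply_config NODE_IP CONFIG_FILE
apply_config() {
  local node_ip=$1 config_file=$2
  run talosctl apply-config --insecure --nodes "$node_ip" --file "$config_file"
}
```

For example, apply_config 10.240.0.5 worker.yaml; setting DRY_RUN=1 only prints the talosctl command, which is handy for checking the arguments first.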

FischerLGLN commented 1 year ago

@smira Ah, okay, so that's the reason. NTP is working, but DHCP isn't, because eth0 is missing. I can attach a network interface at creation time with another private IP and the device name eth0; I'll try that now. But without cluster.controlPlane.endpoint set, how do I tell the node to connect to the remote control plane?
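
For reference, cluster.controlPlane.endpoint normally comes from the endpoint argument of talosctl gen config, which writes it into both controlplane.yaml and worker.yaml; the worker reaches the control plane through that URL once its config is on the node, however it gets there. A sketch, where the cluster name and address are placeholders and 6443 is the default Kubernetes API port; a load balancer or floating IP address could stand in for the node IP. gen_cluster_config is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: generate controlplane.yaml, worker.yaml and talosconfig
# in the current directory with the control plane endpoint baked in.
#
# Usage: gen_cluster_config CLUSTER_NAME ENDPOINT_HOST
gen_cluster_config() {
  local cluster_name=$1 endpoint_host=$2
  run talosctl gen config "$cluster_name" "https://${endpoint_host}:6443"
}
```

After running it, the cluster.controlPlane.endpoint field in the generated worker.yaml should contain that URL.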

smira commented 1 year ago

The device name in Linux and the one shown in the cloud portal might be completely different.
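
One way to avoid depending on the interface name is a device selector, which matches the link by hardware address instead of by name. A minimal sketch, assuming a Talos version whose machine config supports machine.network.interfaces[].deviceSelector.hardwareAddr and that the MAC address of the VM's network interface is known (how to look it up on the IBM side isn't covered in this thread); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a machine-config patch that selects the NIC by
# MAC address rather than by its Linux name (eth0, ens3, ...), then enables
# DHCP on it. Swap "dhcp: true" for a static block if DHCP is unavailable.
#
# Usage: interface_patch_by_mac MAC
interface_patch_by_mac() {
  local mac=$1
  cat <<EOF
machine:
  network:
    interfaces:
      - deviceSelector:
          hardwareAddr: "${mac}"
        dhcp: true
EOF
}
```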

sergelogvinov commented 1 year ago

If I'm not mistaken, IBM Cloud uses OpenStack. Try the OpenStack platform image.

FischerLGLN commented 1 year ago

@sergelogvinov I already tried the OpenStack image. The OS couldn't find the OpenStack public IP configuration endpoint (DHCP error), since that endpoint is missing in IBM Cloud.

This is the missing endpoint:

https://github.com/siderolabs/talos/blob/06fea244140e82fd30a4ac4c5e4433253bd930ab/internal/app/machined/pkg/runtime/v1alpha1/platform/openstack/metadata.go#L34
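
To check from inside a VM which metadata paths IBM Cloud actually serves, a quick probe could help. Candidate URLs are the standard OpenStack metadata-service paths http://169.254.169.254/openstack/latest/meta_data.json and .../network_data.json, plus the EC2-compatible http://169.254.169.254/latest/meta-data/public-ipv4; whether these match the exact constant on the linked line of metadata.go is an assumption to verify against that file. check_metadata_endpoint is a hypothetical helper and needs curl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether URL answers with HTTP 200 within 5s.
# Any other status, a timeout, or a missing curl binary yields "missing".
#
# Usage: check_metadata_endpoint URL
check_metadata_endpoint() {
  local url=$1 code
  code=$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$url" 2>/dev/null) || true
  if [ "$code" = 200 ]; then
    echo present
  else
    echo missing
  fi
}
```

For example, check_metadata_endpoint http://169.254.169.254/openstack/latest/meta_data.json run on an IBM Cloud VM prints present or missing.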

sergelogvinov commented 1 year ago

@FischerLGLN Can you send me your infrastructure setup (email/Slack/here)? I'll look into it in the near future.

FischerLGLN commented 1 year ago

@sergelogvinov Short answer for now; more details tomorrow if you like. You may not need the whole setup, because it already hangs at boot with the control plane image.

Convert the image to the supported qcow2 format:

qemu-img convert -f raw -O qcow2 disk.raw talos.qcow2

Upload the image to a bucket with the IAM permissions needed to create a VPC custom image, and create the custom image from it.
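
The custom-image step could look roughly like this. It assumes the ibmcloud VPC plugin's image-create command taking --file with a cos://REGION/BUCKET/OBJECT location and an --os-name; the region, bucket, object and OS name are placeholders, and which --os-name value IBM Cloud accepts for an unlisted OS such as Talos is not something this thread establishes. create_custom_image is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: register the uploaded qcow2 object as a VPC custom image.
#
# Usage: create_custom_image IMAGE_NAME REGION BUCKET OBJECT OS_NAME
create_custom_image() {
  local name=$1 region=$2 bucket=$3 object=$4 os_name=$5
  run ibmcloud is image-create "$name" \
    --file "cos://${region}/${bucket}/${object}" \
    --os-name "$os_name"
}
```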

The security group ($SGS) is fully open: all ports, ingress and egress.

Create the security group:

ibmcloud is security-group-create k3s-test myvpc --resource-group-name myorg
ibmcloud is security-group-rule-add k3s-test inbound all --remote k3s-test --vpc myvpc

ibmcloud is security-group-rule-add k3s-test outbound all --vpc myvpc

Create two servers in the IBM VPC, one with @control_plane.yaml and one with @worker.yaml. You don't need the SSH keys for Talos.

ibmcloud is instance-create $INSTANCE $VPC $ZONE_NAME $PROFILE_NAME $SUBNET --image $IMAGE --resource-group-name $RESOURCE_GROUP_NAME --keys $KEYS --sgs $SGS --user-data @worker.yaml

$IMAGE is the ID of the Talos custom image, which loads the actual qcow2 image from the IBM COS bucket.
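
The two instance-create calls differ only in the instance name and the user-data file, so they can be wrapped in a loop. A sketch reusing the variables from the command above; the -controlplane/-worker name suffixes and the helper name are hypothetical, and since the metal image does not read --user-data (as noted earlier in the thread), the configs may still need to be applied another way.

```shell
#!/usr/bin/env bash
# Print the command instead of executing it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: create one control plane and one worker instance.
# Expects INSTANCE, VPC, ZONE_NAME, PROFILE_NAME, SUBNET, IMAGE,
# RESOURCE_GROUP_NAME, KEYS and SGS to be set, as in the command above.
create_talos_nodes() {
  local role config
  for role in controlplane worker; do
    if [ "$role" = controlplane ]; then
      config=control_plane.yaml
    else
      config=worker.yaml
    fi
    run ibmcloud is instance-create "${INSTANCE}-${role}" "$VPC" "$ZONE_NAME" \
      "$PROFILE_NAME" "$SUBNET" --image "$IMAGE" \
      --resource-group-name "$RESOURCE_GROUP_NAME" \
      --keys "$KEYS" --sgs "$SGS" --user-data "@${config}"
  done
}
```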

Attach an existing floating IP:

ibmcloud is floating-ip-update $FLOATING_IP_ID --nic primary --in $INSTANCE

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been stalled for 7 days with no activity.