siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Stable and/or mainline release #9735

Open uhthomas opened 4 hours ago

uhthomas commented 4 hours ago

Feature Request

It would be great if Talos could provide a version based on the stable or mainline kernels instead of just LTS.

Description

New hardware is sometimes impossible to use with Talos because support for it is not yet available in the LTS kernels. The LTS kernels are sometimes a year behind, which impedes compatibility with new hardware.

Most recently, the GPUs on new 14th gen Intel CPUs (i915 device ID 7d55) don't work because support was only added in kernel 6.8, while LTS is currently 6.6.

I recognise it may be tricky to support multiple kernels at once, and may make it harder to focus, but it's something I've really wanted to see from Talos for a long time and it has often come up as a point of friction.

onedr0p commented 1 hour ago

I am running into this exact issue on a 14th Gen Intel CPU...

k8s-0: kern:    info: [2024-11-16T04:09:27.992549659Z]: i915 0000:00:02.0: Your graphics device 7d55 is not properly supported by i915 in this
kernel version. To force driver probe anyway, use i915.force_probe=7d55
module parameter or CONFIG_DRM_I915_FORCE_PROBE=7d55 configuration option,
or (recommended) check for kernel updates.

As per the workaround, I've tried

machine:
  kernel:
    modules:
      - name: i915
        parameters:
          - force_probe=7d55 # also tried force_probe=!7d55

and

machine:
  install:
    extraKernelArgs:
      - i915.force_probe=7d55 # also tried i915.force_probe=!7d55

without any success; I get the same error in dmesg. It looks like I am stuck either building my own Talos image with a 6.8.x kernel or waiting for the next kernel update.

frezbo commented 1 hour ago

You probably need to add a udev blacklist for the driver and then load it explicitly in the machine config with the module parameter; otherwise, if udevd has already loaded the module, the module parameter in the machine config is a no-op.
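
One way to check whether the module was already loaded before the machine config took effect is to look at the node's /proc/modules. The following is a minimal Python sketch, not part of Talos: it assumes the file's text has been fetched from the node separately (for example with `talosctl -n k8s-0 read /proc/modules`), and the helper name `loaded_modules` is made up for illustration.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the names of loaded modules from /proc/modules content.

    Each non-empty line starts with the module name, followed by its size,
    reference count, dependencies, state and load address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names
```

If `"i915" in loaded_modules(text)` is true on a node where the force_probe message still appears, the module was most likely loaded without the parameter, which would match the no-op behaviour described above.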

onedr0p commented 1 hour ago

@frezbo I am not familiar with blacklisting via udev. Perhaps I could do it via extraKernelArgs, e.g. module_blacklist=i915, or does it really need to be done via udev rules? Apologies if that is what you meant.

frezbo commented 1 hour ago

I think the kernel arg would work, it just needs an upgrade to take effect. Alternatively, add a file like this: https://github.com/siderolabs/extensions/blob/main/nvidia-gpu/nvidia-modules/lts/pkg.yaml#L18 with content like this: https://github.com/siderolabs/extensions/blob/main/nvidia-gpu/nvidia-modules/lts/files/nvidia.conf

onedr0p commented 54 minutes ago

I just tried that but it doesn't look like it worked.

❯ talosctl -n k8s-0 dmesg | grep i915
k8s-0: kern:    info: [2024-11-16T05:31:02.463705401Z]: Command line: talos.platform=metal talos.config=none console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 apparmor=0 mitigations=off module_blacklist=i915,igc security=none
k8s-0: kern:  notice: [2024-11-16T05:31:02.602122401Z]: Kernel command line: talos.platform=metal talos.config=none console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 apparmor=0 mitigations=off module_blacklist=i915,igc security=none
k8s-0: user: warning: [2024-11-16T05:31:03.575922401Z]: [talos] [initramfs] enabling system extension i915-ucode 20241110
k8s-0: kern:     err: [2024-11-16T05:31:05.630471401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:06.792113401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:07.368046401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:07.615059401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:08.356218401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:09.122959401Z]: Module i915 is blacklisted
k8s-0: kern:     err: [2024-11-16T05:31:11.656259401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:11.661716401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:31:14.230883401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:14.234336401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:31:18.202190401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:18.208650401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:31:21.433889401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:21.437965401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:31:31.691439401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:31.697305401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:31:49.785020401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:31:49.790001401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:32:10.378622401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:32:10.381130401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
k8s-0: kern:     err: [2024-11-16T05:32:51.594939401Z]: Module i915 is blacklisted
k8s-0: user: warning: [2024-11-16T05:32:51.597362401Z]: [talos] controller failed {"component": "controller-runtime", "controller": "runtime.KernelModuleSpecController", "error": "error loading module \"i915\": load i915 failed: operation not permitted"}
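
The output above suggests that module_blacklist=i915 on the kernel command line rejects every attempt to load the module, including the explicit load requested through the machine config (the "operation not permitted" errors from KernelModuleSpecController). As a rough way to confirm that pattern in a longer log, here is a minimal Python sketch that tallies the two message types. The function name `tally_module_errors` is hypothetical, and the matching strings are assumptions based only on the lines shown in this thread.

```python
import re

# Patterns taken from the dmesg lines in this thread; other Talos or kernel
# versions may word these messages differently.
BLACKLISTED = re.compile(r"Module (\S+) is blacklisted")
LOAD_FAILED = re.compile(r'error loading module \\"([^"\\]+)\\"')


def tally_module_errors(dmesg_text: str) -> dict:
    """Count kernel blacklist rejections and Talos load failures per module."""
    counts = {}
    for line in dmesg_text.splitlines():
        for key, pattern in (("blacklisted", BLACKLISTED), ("load_failed", LOAD_FAILED)):
            match = pattern.search(line)
            if match:
                # Create the per-module counters on first sight of a module.
                per_module = counts.setdefault(
                    match.group(1), {"blacklisted": 0, "load_failed": 0}
                )
                per_module[key] += 1
    return counts
```

Feeding it the text of `talosctl -n k8s-0 dmesg` returns a dictionary keyed by module name. A non-zero `load_failed` count alongside `blacklisted` entries for the same module would be consistent with the kernel argument blocking the machine config load as well as the automatic one.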