Closed uhthomas closed 1 month ago
@smira this should be moved to the official extensions repo, as it requires an extension, which I think is a reasonable fit as an official extension.
Yes, sure, but the issue itself is not actionable
In what sense is it not-actionable?
I, for example, have no idea what needs to be done. If it were a PR or had detailed steps, it would be easier.
Agreed... The linked issues talk about enabling kernel flags...
However... I've checked the Talos kernel flags and this isn't disabled; afaik it should be enabled when unset.
I'll continue doing some more research and see if I can get a more specific pointer to work from. My arc card just arrived to test!
> afaik it should be enabled when unset
I do not believe this is the case.
The second link appears to have the exact kernel flags that need to be set; I'm working on a build with those flags this morning (this is my first experience building a custom kernel for Talos so I'm feeling my way through the process here); I will report back.
Confirmed on the kernel config.
Additionally, I was able to see during kernel configuration that the flag CONFIG_INTEL_MEI can also be set as a module, so unless I'm mistaken, we should indeed be able to achieve this with an extension similar to the NVIDIA GPU extension by packaging the modules.
This is likely beyond my level of expertise, but I'm at least going to try and get this kernel built and boot assets created so I can test this on my own as a working solution.
Thank you for investigating this. I've spent the whole day debugging my jellyfin installation and talos cluster after trying to use my new arc gpu, which ultimately led me here.
Happy to help.
Glad my foundational work was helpful, thanks for verifying it.
I wonder what the right thing to do here is? An Intel ME extension? Or should the i915 extension be updated to enable the Intel management engine?
I think a separate ME extension makes more sense:
Of course, this is just one person's take.
OK, potential slight snag: CONFIG_DRM_I915_PXP is a yes/no, so it does need to be enabled in the kernel proper. May need more changes than just an extension.
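For anyone who wants to check a kernel config before building, here is a minimal sketch that reports whether each option is built-in, a module, or unset. It only looks at the two options named in this thread; the helper names and the sample text are made up for illustration, and whether these two options are sufficient for Arc is an assumption.

```python
import re

# Options named in this thread. Pass a longer list if your build needs more;
# whether this set is sufficient for Arc is an assumption.
OPTIONS = ["CONFIG_INTEL_MEI", "CONFIG_DRM_I915_PXP"]


def parse_kconfig(text):
    """Map option name -> raw value for every 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' are comments and are skipped,
    so unset options are simply absent from the result.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def classify(text, options=OPTIONS):
    """Label each option as 'built-in', 'module', 'unset', or its raw value."""
    values = parse_kconfig(text)
    labels = {"y": "built-in", "m": "module"}
    result = {}
    for option in options:
        if option not in values:
            result[option] = "unset"
        else:
            result[option] = labels.get(values[option], values[option])
    return result


# Hypothetical config excerpt, not taken from the real Talos kernel config.
SAMPLE = """\
CONFIG_INTEL_MEI=m
# CONFIG_INTEL_MEI_ME is not set
CONFIG_DRM_I915_PXP=y
"""

if __name__ == "__main__":
    for option, state in classify(SAMPLE).items():
        print(f"{option}: {state}")
```

An option reported as "module" can be shipped through an extension, while a yes/no option such as CONFIG_DRM_I915_PXP has to show up as "built-in" in the kernel itself.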
OK. This seems too simple to be true (kudos to the devs here if it really is this simple).
If I've got this correct, we should just need to:
- update the kernel config in the pkgs repo to add the ME modules
- create an extension (e.g. intel-me) containing the ME modules in the extensions repo
Then load the intel-me and i915-ucode extensions in the machine config, once the above are released.
Does that sound right?
Afaik config in the pkgs repo is always loaded, so if you set CONFIG_DRM_I915_PXP there, for example, it's always loaded.
I'm not sure that plays nice with extensions, or does it?
The kernel build from pkgs doesn't fully go into Talos (kernel modules); only some modules are shipped by default, others via extensions.
You can use this for inspiration - https://github.com/siderolabs/extensions/tree/main/drivers/usb-modem
All right, I was able to munge through the docs to get the custom installer built with the custom kernel and modules and the i915 microcode. Unfortunately, I think something is still missing/awry, as Plex looked on the dashboard like it was going to start using the hardware encoder, but then the transcoder segfaulted:
go: kern: info: [2024-07-18T04:17:00.920239299Z]: Plex Transcoder[15773]: segfault at 0 ip 00007fee03bbac07 sp 00007ffe5ac663b0 error 4 in libigdrcl.so[7fee0392d000+3c4000] likely on CPU 8 (core 16, socket 0)
go: kern: info: [2024-07-18T04:17:01.101769299Z]: Code: 44 8b ab bc 07 00 00 41 c6 84 24 08 01 00 00 00 49 8d 44 24 08 49 89 04 24 f6 43 2c 01 74 50 48 8b 45 c8 48 8b b8 a0 00 00 00 <48> 8b 07 4c 8b 58 38 e8 dd 31 da ff 84 c0 75 35 83 bb e0 06 00 00
I need to dig a bit and make sure we aren't missing any necessary modules/drivers. I have pushed what I built last night to a public registry and it's at ghcr.io/e3b0c442/talos-installer:v1.7.5-mei. This has the custom kernel/modules and the i915-ucode extension, if anyone else wants to kick the tires/try troubleshooting. The kernel config I used in my machineconfig is:
kernel:
  modules:
    - name: mei_hdcp
    - name: mei-gsc
    - name: mei-me
    - name: mei-txe
    - name: mei
    - name: mei_pxp
    - name: mei_wdt
I'm still waiting for the green light from my employer before I can submit code changes. Hopefully that will come before I figure out what's still failing on the Plex side. Alternatively, if anyone wants to pick up the baton in that regard I would have no complaints. :)
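To confirm that the modules listed in the machineconfig snippet above actually loaded, here is a rough sketch that compares the list against the contents of /proc/modules. It assumes you can get that file's text off the node somehow (for example with `talosctl read /proc/modules`, if that works in your setup); the function names and the sample text are hypothetical. Note that the kernel reports module names with underscores, so `mei-gsc` appears as `mei_gsc`.

```python
# Module names as written in the machineconfig snippet above.
WANTED = ["mei_hdcp", "mei-gsc", "mei-me", "mei-txe", "mei", "mei_pxp", "mei_wdt"]


def normalize(name):
    """The kernel treats '-' and '_' in module names as equivalent."""
    return name.replace("-", "_")


def loaded_modules(proc_modules_text):
    """Return the set of module names (first field of each non-empty line)."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(wanted, proc_modules_text):
    """List the wanted modules that do not appear in /proc/modules."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if normalize(name) not in loaded]


# Hypothetical /proc/modules excerpt for illustration only.
SAMPLE = """\
mei_gsc 16384 0 - Live 0x0000000000000000
mei_me 45056 1 mei_gsc, Live 0x0000000000000000
mei 151552 3 mei_gsc,mei_me, Live 0x0000000000000000
"""

if __name__ == "__main__":
    print("missing:", missing_modules(WANTED, SAMPLE))
```

An empty list means every requested module is present; anything left over is worth checking against the kernel config and the extension contents.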
The segfault is a kernel bug; Plex is working on a workaround: https://github.com/tteck/Proxmox/discussions/3162
After making the suggested change to disable tone mapping, hardware encoding is working as expected. As soon as I get clearance to submit PRs I will do so; otherwise, it does appear that we just need to update the kernel config as in https://github.com/uhthomas/pkgs/commit/6a83361b8e4facfe551657e49bc71fa114d8be0f and then create an MEI extension, if somebody else wants to run with this.
//edit: another reference, the issue is a kernel bug in >=6.6.26 https://github.com/jellyfin/jellyfin/issues/11380
//edit2: looks to be resolved in kernel 6.6.31. I'm not sure if another v1.7 patch release is planned (but also hoping we can get these changes in before v1.8)
//edit3: hmmm... the build I put out has kernel 6.6.33 so it must not actually be resolved. Will need to dig more.
I've updated my machine using ghcr.io/e3b0c442/talos-installer:v1.7.5-mei, patched the Talos config using the above-mentioned modules, and got it running. Jellyfin is also happily decoding using Intel QSV on an Intel Arc A380 now. Will keep testing tonight for stability.
Thanks so much!
//edit: stable transcoding for ~10hrs with vpp tone mapping and intel low power enabled
I have just gotten the necessary approval from my employer to submit contributions. I need to wait for the final paperwork to come through -- hopefully by the end of week -- then I'll be able to open PRs with these changes against the repos and hopefully 🤞 get this in for the 1.8 release, maintainers willing. :)
It looks like the Intel Management Engine Interface (MEI) is required to use Intel Arc. The i915 firmware does not work as the HuC firmware will fail to load.
See:
https://github.com/jellyfin/jellyfin/issues/9588
https://github.com/uhthomas/pkgs/commit/6a83361b8e4facfe551657e49bc71fa114d8be0f
https://gitlab.freedesktop.org/drm/intel/-/issues/7732