Open antheas opened 1 month ago
@m2Giles we've discussed merging `kernel-cache` workflows into `akmods` workflows or vice versa.
The resulting images would retain the same names, but akmods would be built immediately after kernel-cache succeeds.
This pre-existing idea addresses only part of this suggestion, however. Currently, I don't believe we can trigger a kernel-cache or akmods build for a specific kernel/Fedora version, and we would still need a webhook to allow triggering from COPR builds, etc.
We just deployed the following kernel on bazzite unstable: https://github.com/hhd-dev/kernel-bazzite/actions/runs/11428122526
It builds on GitHub and supports webhooks. I do not believe building kernels in COPR for our own consumption is the future for us. It takes way too long (5-7 hours) and it is unpredictable.
On GitHub, by contrast, it builds in 2 hours, and we also have the option of signing it.
You will also notice that the above is an action. So after it finishes, I will have to manually set the release to latest (as doing so automatically would be risky if users have to consume the kernel) and ping one of the ublue members, and then they will have to trigger kcache and akmods accordingly.
With a webhook, by contrast, it would autobuild the kmods for it based on tag name, and the gate on bazzite could then be lifted when appropriate, with no manual intervention afterwards.
> We just deployed the following kernel on bazzite unstable: https://github.com/hhd-dev/kernel-bazzite/actions/runs/11428122526
> It builds on GitHub and supports webhooks. I do not believe building kernels in COPR for our own consumption is the future for us. It takes way too long (5-7 hours) and it is unpredictable.
> On GitHub, by contrast, it builds in 2 hours, and we also have the option of signing it.
I mentioned COPR in my comment as an example, but I agree with your point. Any appropriate caller could trigger a webhook after completing a kernel build.
> You will also notice that the above is an action. So after it finishes, I will have to manually set the release to latest (as doing so automatically would be risky if users have to consume the kernel) and ping one of the ublue members, and then they will have to trigger kcache and akmods accordingly.
> With a webhook, by contrast, it would autobuild the kmods for it, and the gate on bazzite could then be lifted when appropriate, with no manual intervention afterwards.
Yep, I see the value and want to proceed with a webhook implementation.
The one "must have" to enable a webhook the way this issue is worded:
My question to @m2Giles is looking for agreement on merging kernel/akmods workflows before we implement webhook and specific kernel/fedora version builds, since if we don't do it, a caller (like the bazzite-kernel build) would need to call 2 webhooks and only call the second (akmods) if the first was successful. And we'd have to do workflow restructuring in both repos.
Ideally, after it succeeds, kernel-cache would call a webhook on akmods for each kernel that built successfully.
Kernel cache has to be somewhat separate as we want to rebuild akmods for a specific kernel multiple times in case one of them updates.
Would definitely like to merge kernel-cache into akmods.
The two-button-press dance makes little sense when they are inherently linked.
I also think we need to do something to prevent that intermediate time where kernel-cache is updated but akmods is not
@antheas
> Kernel cache has to be somewhat separate as we want to rebuild akmods for a specific kernel multiple times in case one of them updates.
I believe your point here is to avoid unnecessary rebuilding of the kernel-cache if we only need to update the akmods.
The kernel-cache workflow is relatively fast so I think we could handle this a couple ways:
Like @m2Giles, I agree we want them more tightly coupled.
> I also think we need to do something to prevent that intermediate time where kernel-cache is updated but akmods is not
For this concern, I think we could restructure workflows like:
> Kernel cache has to be somewhat separate as we want to rebuild akmods for a specific kernel multiple times in case one of them updates.
> I believe your point here is to avoid unnecessary rebuilding of the kernel-cache if we only need to update the akmods.
I mean, having a dry run is one thing. The point here was being able to use an older kernel version that has expired, for reverts.
> The kernel-cache workflow is relatively fast so I think we could handle this a couple ways:
> - maybe it's fine to always rebuild kernel-cache if we want to rebuild akmods as it is "fast enough"?
> - maybe we skip rebuilding kernel-cache if we see that for the kernel-version inputs we already have published a good, signed image into the registry?
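The second option could be sketched roughly as below. This is only an illustration of the decision, not how either repo works today: the tag scheme in `image_tag`, the function names, and the idea of a `published` map (tag to "signature verified") are all assumptions made for the example. How that map would be filled from the registry is left out.

```python
from typing import Dict, Iterable, List, Tuple


def image_tag(flavor: str, kernel_version: str) -> str:
    # Hypothetical tag scheme; the real kernel-cache tags may differ.
    return f"{flavor}-{kernel_version}"


def plan_kernel_cache(
    requested: Iterable[Tuple[str, str]],
    published: Dict[str, bool],
) -> Tuple[List[str], List[str]]:
    """Split requested (flavor, version) pairs into tags to rebuild and tags to reuse.

    `published` maps an image tag to True only when a good, signed image
    for that tag is already in the registry (assumed to be checked elsewhere).
    """
    rebuild: List[str] = []
    reuse: List[str] = []
    for flavor, version in requested:
        tag = image_tag(flavor, version)
        if published.get(tag, False):
            reuse.append(tag)    # known good image exists: pull it
        else:
            rebuild.append(tag)  # missing or unsigned: build it again
    return rebuild, reuse
```

An image that exists but failed signature verification is treated the same as a missing one, so it gets rebuilt rather than reused.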
> Like @m2Giles, I agree we want them more tightly coupled.
> I also think we need to do something to prevent that intermediate time where kernel-cache is updated but akmods is not
There is an inherent race condition here in which akmods might have partially updated. This is compounded by the fact that there are multiple akmod images.
> For this concern, I think we could restructure workflows like:
> - build kernel-cache locally (or pull known good version if already exists) but do NOT push to ghcr
> - build all akmods dependencies against the host-builder's local copy of kernel-cache image, but don't push images to GHCR
> - when all builds are complete: push all images to GHCR
> - perhaps we add an artifact to track which local images need to be pushed in a "final step" of the workflow?
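A minimal sketch of that "build everything, then push everything" ordering, assuming the build and push steps can be wrapped as plain callables. The `build` and `push` parameters and the function name are hypothetical stand-ins for whatever the workflow actually runs; the returned list plays the role of the artifact that tracks which local images go into the final push step.

```python
from typing import Callable, Iterable, List


def build_then_push(
    images: Iterable[str],
    build: Callable[[str], bool],
    push: Callable[[str], None],
) -> List[str]:
    """Build every image locally first; push only if every build succeeded.

    Returns the list of pushed image names, or an empty list when any
    build failed (in which case nothing is pushed at all).
    """
    built: List[str] = []
    for name in images:
        if not build(name):
            return []  # one failure means the registry is left untouched
        built.append(name)
    for name in built:
        push(name)  # the push phase starts only after all builds passed
    return built
```

Note that this narrows the inconsistent window rather than closing it: the pushes still happen one after another, so a consumer pulling during the push phase could see a mix of old and new images.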
I think a big part of the solution will be whether we want to be able to access older kernel builds. We have been bitten by COPR on bazzite multiple times in the past, and that is what created the kernel cache.
However, having a GitHub repo that has a full kernel history and being able to build a fresh set of akmods from any previous kernel version is very powerful, in addition to being able to build multiple fresh kernels at the same time in just 2 hours.
If it is powerful enough to negate the need for a kernel cache, then we could just merge all akmods and the kernel that built them into a single output image and solve all concurrency concerns. Do we get a perf benefit from splitting them apart anyway? Seems most time is spent doing the same thing in all images.
If we proceed this way, I think the best way forward is a new repo that merges the previous two. Anything else would break builds for a week or so at least, as any change of this type would induce breaking changes.
Currently, the kernel (e.g., `fsync` and `bazzite`) has to be built, then afterwards `kernel-cache` has to be triggered manually, and only after that builds, can `akmods` be triggered to build. This increases kernel iteration time 2-3x (assuming someone is monitoring the builds).
Add webhook support to the akmod repo, so that it can be called automatically for specific kernels and fedora versions if and only if the requesting kernel was updated.
The webhook will include the kernel version, mitigating the possibility of drift.
This should also reduce builder pressure by not scheduling dry runs for already built kernels.
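One way such a webhook call could look, as a sketch only: GitHub's REST API has a "create a repository dispatch event" endpoint (`POST /repos/{owner}/{repo}/dispatches`) that accepts an `event_type` and a free-form `client_payload`. The event name `kernel-built` and the payload keys below are assumptions made for illustration, and nothing here is an agreed interface. The function only builds the request so it can be inspected; sending it would be a separate `urllib.request.urlopen(req)` call with a token that is allowed to dispatch to the target repo.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(
    owner: str,
    repo: str,
    token: str,
    kernel_flavor: str,
    kernel_version: str,
    fedora_version: str,
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for one kernel."""
    payload = {
        "event_type": "kernel-built",  # hypothetical event name
        "client_payload": {            # hypothetical payload keys
            "kernel_flavor": kernel_flavor,
            "kernel_version": kernel_version,
            "fedora_version": fedora_version,
        },
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/dispatches",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Because the kernel version travels inside the payload, the receiving workflow would build against exactly the kernel that triggered it instead of resolving "latest" on its own, which is the drift this issue wants to avoid.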