Closed jmpolom closed 2 years ago
It looks like the module is actually included; this is an LTS kernel that has most general drivers included. We can look at compiling the module directly into the kernel. 5.10.x has just been superseded by 5.15.x as the LTS branch.
It looks like the module is actually included; this is an LTS kernel that has most general drivers included.
I did notice that it was selected as a module in the current 5.10 config. Are all modules that are built as modules in the kernel config included in the initrd for hook? Or is it like building an initrd for Fedora or Debian, where modules that need to be included in the initrd must be specified explicitly?
I haven't decompressed the initrd yet to have a look; however, I'm now thinking it would be worth doing.
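As a first step before unpacking the initrd, the state of the driver in the kernel config can be checked mechanically. The following is a minimal sketch, assuming the config is available as plain text and that the igc driver is controlled by the CONFIG_IGC symbol; the sample config lines are invented for illustration and are not taken from hook's actual config. It only tells you whether the symbol is built in, built as a module, or disabled. It does not answer whether hook's initrd ships every module.

```python
import re


def config_state(config_text: str, symbol: str) -> str:
    """Return 'built-in', 'module', or 'disabled' for a tristate Kconfig symbol."""
    pattern = re.compile(rf"^{re.escape(symbol)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    if match is None:
        # Covers both "# CONFIG_X is not set" and a symbol that is absent.
        return "disabled"
    value = match.group(1).strip()
    return {"y": "built-in", "m": "module"}.get(value, "disabled")


# Invented sample, NOT hook's real config.
SAMPLE_CONFIG = """\
CONFIG_E1000E=y
CONFIG_IGC=m
# CONFIG_IXGBE is not set
"""

print(config_state(SAMPLE_CONFIG, "CONFIG_IGC"))
```

A result of "module" means the driver only works if the .ko file is actually present in the image that gets booted, which is the open question above.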
My gut feeling is that there were upstream changes after 5.10.57, or changes that had not yet been backported to .57, which are causing the issue here. I read in a number of places that certain variants of the I225-LM had different firmware that presented issues for drivers detecting the devices. I think updating kernel versions is the prudent thing to do here, especially in light of the fact that 5.15 is the new LTS.
Are there plans to jump onto 5.15 for hook? Is there any particular complexity to bumping kernel versions in hook or is it mostly a matter of creating a config compatible with the new version?
AFAIK, no one has made an explicit plan to upgrade to 5.15, but in general we want to keep up to date with things. @thebsdbox - any thoughts on it?
I do note that 5.15 has an EOL of Oct 2023 - versus 5.10 having an EOL of Dec 2026.
EOL of Oct 2023 seems like it should be sufficient? I feel like kernel versions should be upgraded every 1-2 years or so to ensure hook is compatible with the latest hardware.
If there isn't a plan to update to 5.15, were there any plans to bump point releases on the 5.10 kernel line? Right now 5.10.79 is the latest in that series which is a fair bit more up to date than where things are now.
@tstromberg is there anything we can do to move this along? I don't think we've tested bumping the kernel version to the latest 5.10.83 kernel but if that would be helpful please let me know.
@thebsdbox @tstromberg Is there anything that can be done here to proceed?
I can confirm that updating the kernel to 5.10.85 enables operation of the Intel I225-LM NIC and tinkerbell workflows can succeed using hook. PR sent.
In the interest of getting a fix out quickly, the hook kernel should be updated to the latest 5.10.x kernel to close this issue. Long term, we need to figure out a path forward to move to 5.15 which is the latest LTS kernel and also discuss how to keep up with upstream kernel releases.
This issue is not yet fixed. I still get this issue with 5.10.85.
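To narrow down a report like this, it can help to confirm whether the NIC is visible on the PCI bus but has no driver bound. The following is a rough sketch, assuming a Linux system with sysfs mounted at /sys and a Python interpreter available in the environment being debugged (which may not be true inside hook itself). The function name and the sysfs_root parameter are hypothetical and exist only for this example.

```python
from pathlib import Path


def unbound_ethernet_devices(sysfs_root="/sys"):
    """List PCI Ethernet devices that have no kernel driver bound."""
    unbound = []
    pci_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not pci_dir.is_dir():
        return unbound
    for dev in sorted(pci_dir.iterdir()):
        class_file = dev / "class"
        if not class_file.is_file():
            continue
        pci_class = class_file.read_text().strip()
        # Class codes starting with 0x0200 are Ethernet controllers.
        if not pci_class.startswith("0x0200"):
            continue
        # The "driver" entry exists only while a driver is bound.
        if not (dev / "driver").exists():
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
            unbound.append((dev.name, vendor, device))
    return unbound


if __name__ == "__main__":
    for address, vendor, device in unbound_ethernet_devices():
        print(f"{address}: vendor={vendor} device={device} has no driver bound")
```

If the I225-LM shows up in this list, the device is being enumerated and the problem is on the driver side (module missing, not loaded, or not claiming the device).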
The tinkerbell project needs to do a better job keeping the kernels in hook up to date. The kernel I submitted a PR for was recent 2 years ago but is now quite outdated even within the 5.10 LTS kernel series. We are no longer looking to use the tinkerbell tooling in my organization so I don't have a need to work on this any longer.
There's enough info in the associated PR to figure out how to update the kernel to the latest point release for 5.10. You might consider repeating the exercise I undertook here as a quick fix. Long term hook needs to move on to a more recent LTS kernel. There have since been 2 new LTS kernel lines.
Expected Behaviour
Hook detects and loads the igc module on systems with Intel I225-LM NICs present.
Current Behaviour
Hook boots but seemingly fails to detect Intel I225-LM devices and the igc module does not get loaded. This results in no network connectivity if a system is connected to the provisioning network via an interface with this chipset.
Possible Solution
Update the kernel. Hook uses 5.10.57 which is many releases behind upstream 5.10 LTS (currently at 5.10.78) and likely the cause here. Ideally, hook should move to using mainline releases for better compatibility with new hardware as it makes it into new kernel releases.
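To put a number on "many releases behind", the gap between two point releases in the same stable series can be computed directly. This is a small sketch using only the two version strings quoted above; the helper names are made up for this example.

```python
def parse_release(release):
    """Split a release like '5.10.57' into a tuple of three integers."""
    parts = tuple(int(p) for p in release.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {release!r}")
    return parts


def point_releases_behind(current, latest):
    """Return how many point releases `current` trails `latest` by."""
    cur = parse_release(current)
    new = parse_release(latest)
    if cur[:2] != new[:2]:
        raise ValueError("releases are from different stable series")
    return new[2] - cur[2]


print(point_releases_behind("5.10.57", "5.10.78"))  # prints 21
```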
Steps to Reproduce (for bugs)
Context
We intend to use tinkerbell to deploy many client devices that unfortunately have only a single I225-LM NIC. We can sidestep this issue by not using hook and doing automated OS installs; however, that is going to be slower than using hook to deploy disk images. Ideally, hook should function on modern hardware.
To be sure, both Fedora 34 and Debian 11 were also tested on the system with an I225-LM NIC, and both detected the NIC and loaded the igc module. Whatever issue prevents hook from supporting this device seems to have been fixed in later LTS kernels and mainline.
Questions
Is there a specific reason hook has been held back to such a dated LTS kernel release? This definitely is going to hamper support of new hardware.
I noticed some patches so I could see those requiring some work to validate against or port to a newer kernel. I could see time being a constraint here. Maintaining a kernel build is certainly not a zero time commitment.
Your Environment
Operating System and version (e.g. Linux, Windows, MacOS): Fedora Silverblue and CoreOS
How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details: podman containers
cc: @jkl92 @storrgie