Open hershpa opened 1 year ago
@mregmi
@qbarrand, any comments/feedback on this appreciated. Thanks!
To add some more info: we are trying to detect whether there are any kABI changes between OCP z-stream releases. If a change is detected, we rebuild the driver image; if not, we reuse the previous driver image. For this purpose we are planning to use kabi-dw (https://github.com/skozina/kabi-dw), which can detect kABI changes between two kernel versions.
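As a rough illustration of the rebuild-or-reuse decision, here is a minimal sketch. It does not use kabi-dw. It swaps in a simpler check that compares the per-symbol CRCs found in `Module.symvers`-style text for two kernels, restricted to the symbols the driver imports. The function names, the symbol names, and the CRC values are all hypothetical and only serve the example.

```python
from typing import Dict, Iterable, Set


def parse_symvers(text: str) -> Dict[str, str]:
    """Map each exported symbol name to its CRC from Module.symvers-style text."""
    crcs: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Each line starts with "<crc> <symbol> ..."; skip anything shorter.
        if len(fields) >= 2:
            crcs[fields[1]] = fields[0]
    return crcs


def changed_symbols(old: Dict[str, str], new: Dict[str, str],
                    used: Iterable[str]) -> Set[str]:
    """Return the driver's symbols whose CRC differs or is missing in either kernel."""
    return {s for s in used if old.get(s) is None or old.get(s) != new.get(s)}


def needs_rebuild(old_text: str, new_text: str, used: Iterable[str]) -> bool:
    """True if any symbol the driver uses changed between the two kernels."""
    return bool(changed_symbols(parse_symvers(old_text), parse_symvers(new_text), used))


# Made-up symvers excerpts for two kernels (not real symbols or CRCs).
OLD = ("0x11111111\talloc_demo\tvmlinux\tEXPORT_SYMBOL\n"
       "0x22222222\tdma_demo\tvmlinux\tEXPORT_SYMBOL_GPL\n")
NEW = ("0x11111111\talloc_demo\tvmlinux\tEXPORT_SYMBOL\n"
       "0x33333333\tdma_demo\tvmlinux\tEXPORT_SYMBOL_GPL\n")

print(needs_rebuild(OLD, NEW, ["alloc_demo"]))              # False: reuse the image
print(needs_rebuild(OLD, NEW, ["alloc_demo", "dma_demo"]))  # True: rebuild
```

A CRC comparison like this only covers what symbol versioning captures; kabi-dw compares DWARF type information, so its results may differ from this sketch.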
Summary:
The Kernel Application Binary Interface (kABI) is a set of in-kernel symbols used by drivers and other kernel modules. Currently, the general idea is to rebuild and test the Intel GPU driver container image whenever the kernel version associated with a particular OCP z stream changes. This is the safest approach. Unfortunately, it requires continuous rebuild and test efforts, which automation can ease but which still carry a non-zero cost. It may be possible to reduce the rebuild effort, based on the theory that no rebuild is required if the kernel ABI does not change across the z streams of a particular OCP minor version X.Y.
Potential Idea:
Assuming that the driver is using the list of stable symbols for which Red Hat guarantees ABI compatibility, consider the following.
Based on RHEL KB,
Based on this other KB, an OCP minor version always uses a certain minor RHEL version.

RHCOS/OCP Versions | RHEL Versions
-- | --
4.11 | RHEL 8.6
4.12 | RHEL 8.6
4.13 | RHEL 9.2

Tentative Conclusion:
It would be reasonable to conclude that for OCP 4.12, based on RHEL 8.6, only one driver container image is required to support all OCP 4.12.z versions as long as the kernel ABI stays the same. Similarly, all z streams for OCP 4.13, based on RHEL 9.2, would require a single driver container image under the same condition.
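To make this conclusion concrete, here is a small sketch of how z-stream releases could be mapped to driver images. It assumes each release has already been reduced to some kABI fingerprint (for example, a hash over the CRCs of the symbols the driver uses). The `plan_images` helper, the `driver:` tag scheme, and the fingerprint values are hypothetical placeholders, not real release data.

```python
from typing import Dict, List, Tuple


def plan_images(releases: List[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each release the image built for the first release with the same kABI fingerprint."""
    image_for_fingerprint: Dict[str, str] = {}
    plan: Dict[str, str] = {}
    for version, fingerprint in releases:
        # A new image tag is recorded only the first time a fingerprint is seen;
        # later releases with the same fingerprint reuse that image.
        plan[version] = image_for_fingerprint.setdefault(fingerprint, f"driver:{version}")
    return plan


# Placeholder z-stream releases with made-up fingerprints.
releases = [("4.12.0", "kabi-a"), ("4.12.1", "kabi-a"), ("4.12.2", "kabi-b")]
plan = plan_images(releases)
print(plan)
print(len(set(plan.values())))  # number of distinct images that must be built
```

If the kABI really stays the same across an entire minor version, every release maps to the same image and the count is one. In this made-up input the fingerprint changes at the third release, so two images are needed.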
Goal:
The goal is to understand the pros, cons, and potential risks of this approach. Theoretically, it is possible to use the same driver container with different kernel versions as long as the kernel ABI remains stable. It is important to note that
In general, even if rebuilds are avoided, it is reasonable to use automation to retest the existing driver container whenever the kernel version changes, to ensure compatibility and functionality.