Closed yevgeny-shnaidman closed 1 year ago
I think it is worth adding a MachineConfig example for a better understanding of what is actually being applied to the cluster/MCO-static-pod.
Would it be possible to leverage existing KMM functionality for driver container management directly in Day1? Can KMM become a cluster operator on OCP? The general idea is to let MCO embrace its strengths (machine configuration etc) and KMM to embrace its strengths and avoid overlap between operators.
We currently use the MCO to configure the node so that KMM can load the modules. For example, we have to use the MCO to prevent some in-tree drivers from being loaded and to add kernel boot parameters that are required for KMM to load the module.
To avoid rebooting the system, we prefer to apply this configuration on Day1 rather than Day2, so that KMM can stay as simple as possible.
For driver container image management and module management, however, we still think KMM should handle it. The MCO could handle it too, but that might introduce a lot of complication if KMMO and the MCO handle the same thing from different operators on Day1 or Day2.
And we all know that KMM already handles driver container images and modules very well.
@uMartinXu This issue covers use cases where Day2 KMM is not applicable. It does not replace Day2 KMM, but expands the general KMM options to handle kernel modules that need to be loaded very soon after boot, well before the KMM Operator starts running. Customers can choose which option to use, depending on what best fits their use case.
@hershpa Currently there are no plans to make KMM a core operator. In addition, even if KMM becomes a core operator, we will still need Day1 functionality. Even as a core operator, KMM starts running only after the full boot process of the node and OS has completed. So, if we need kernel modules to be loaded before that, we need the functionality described in this issue.
/assign @yevgeny-shnaidman
@yevgeny-shnaidman I believe we can close this?
yes, closing
Issue Summary
Currently KMM supports only Day2 operations: loading, replacing, or upgrading kernel modules after the OCP cluster has been fully installed. Adding some form of Day1 support (loading kernel modules before cluster installation completes) will increase the usability of KMM.
Proposed solution
The solution is implemented at the root filesystem level, not at the initramfs level, using the MCO, Ignition configuration, and the same driver containers that KMM uses.
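As a rough sketch of what this means in practice, the two boot-time steps (pull the driver container image and reboot, then swap the module before networking comes up) can be written as the shell functions below. The commands, image name, and tag follow the scripts embedded as base64 in the example MachineConfig; the function names and the split into functions are illustrative only and are not part of the proposal.

```shell
#!/bin/bash
# Sketch of the two Day1 steps. IMAGE and TAG are the values used in the
# example MachineConfig; the function names are hypothetical.
IMAGE="quay.io/yshnaidm/citi-drivers"
TAG="4.10.25"

# Succeeds if the driver container image is already in local podman storage.
image_present() {
    podman images | grep "$IMAGE" | grep -q "$TAG"
}

# Step 1 (runs after network-online.target): pull the image once, then reboot
# so that step 2 can run early on the next boot.
pull_driver_image() {
    if image_present; then
        echo "Image $IMAGE:$TAG found locally, nothing to do"
    else
        echo "Image $IMAGE:$TAG not found locally, pulling"
        podman pull "$IMAGE:$TAG"
        echo "Image $IMAGE:$TAG pulled, rebooting"
        reboot
    fi
}

# Step 2 (runs before network-pre.target): remove the in-tree sfc module and
# load the out-of-tree build shipped under /opt in the driver container.
replace_driver() {
    if image_present; then
        modprobe -r sfc
        podman run --privileged --entrypoint modprobe "$IMAGE:$TAG" -d /opt sfc
        echo "OOT sfc is inserted"
    else
        echo "Image $IMAGE:$TAG is not present locally, will try after reboot"
    fi
}
```

On the first boot only the pull step does real work; after the reboot the image is already local, so the replace step can run without network access.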
Example MachineConfig
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: sfc
  name: replace-sfc
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Replace in-tree sfc driver with oot sfc driver
            Before=network-pre.target
            Wants=network-pre.target
            DefaultDependencies=no
            [Service]
            User=root
            Type=oneshot
            TimeoutSec=10
            ExecStartPre=ls /usr/local/bin
            ExecStart=/usr/local/bin/replace-sfc-driver.sh
            PrivateTmp=yes
            RemainAfterExit=no
            TimeoutSec=60
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: "replace-sfc.service"
        - contents: |
            [Unit]
            Description=Pull oot sfc driver container
            After=network-online.target
            Wants=network-online.target
            DefaultDependencies=no
            [Service]
            User=root
            Type=oneshot
            ExecStart=/usr/local/bin/pull-sfc-driver.sh
            PrivateTmp=yes
            RemainAfterExit=no
            TimeoutSec=900
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: "pull-sfc-image.service"
        - enabled: false
          mask: true
          name: crio-wipe.service
    storage:
      files:
        - path: "/usr/local/bin/replace-sfc-driver.sh"
          mode: 511
          overwrite: true
          user:
            name: "root"
          contents:
            source: "data:text/plain;base64,IyEvYmluL2Jhc2gKSU1BR0U9InF1YXkuaW8veXNobmFpZG0vY2l0aS1kcml2ZXJzIgpUQUc9IjQuMTAuMjUiCmVjaG8gImJlZm9yZSBjaGVja2luZyBwb2RtYW4gaW1hZ2VzIgppZiBwb2RtYW4gaW1hZ2VzIHwgZ3JlcCAkSU1BR0UgfCBncmVwIC1xICRUQUc7IHRoZW4KICAgIGVjaG8gIkltYWdlICRJTUFHRTokVEFHIGZvdW5kIGluIHRoZSBsb2NhbCByZWdpc3RyeSwgcmVtb3ZpbmcgaW4tdHJlZSBzZmMiCiAgICBtb2Rwcm9iZSAtciBzZmMKICAgIGVjaG8gIlJ1bm5pbmcgY29udGFpbmVyIGltYWdlIHRvIGluc2VydCB0aGUgb290IHNmYyIKICAgIHBvZG1hbiBydW4gLS1wcml2aWxlZ2VkIC0tZW50cnlwb2ludCBtb2Rwcm9iZSAkSU1BR0U6JFRBRyAtZCAvb3B0IHNmYwogICAgZWNobyAiT09UIHNmYyBpcyBpbnNlcnRlZCIKZWxzZQogICBlY2hvICJJbWFnZSAkSU1BR0U6JFRBRyBpcyBub3QgcHJlc2VudCBpbiBsb2NhbCByZWdpc3RyeSwgd2lsbCB0cnkgYWZ0ZXIgcmVib290IgpmaQo="
        - path: "/usr/local/bin/pull-sfc-driver.sh"
          mode: 493
          overwrite: true
          user:
            name: "root"
          contents:
            source: "data:text/plain;base64,IyEvYmluL2Jhc2gKSU1BR0U9InF1YXkuaW8veXNobmFpZG0vY2l0aS1kcml2ZXJzIgpUQUc9IjQuMTAuMjUiCmlmIHBvZG1hbiBpbWFnZSBsaXN0IHwgZ3JlcCAkSU1BR0UgfCBncmVwIC1xICRUQUc7IHRoZW4KICAgIGVjaG8gIkltYWdlICRJTUFHRSBmb3VuZCBpbiB0aGUgbG9jYWwgcmVnaXN0cnkuTm90aGluZyB0byBkbyIKZWxzZQogICAgZWNobyAiSW1hZ2UgJElNQUdFIG5vdCBmb3VuZCBpbiB0aGUgbG9jYWwgcmVnaXN0cnksIHB1bGxpbmciCiAgICBwb2RtYW4gcHVsbCAkSU1BR0U6JFRBRwogICAgZWNobyAiSW1hZ2UgJElNQUdFOiRUQUcgaGFzIGJlZW4gc3VjY2Vzc2Z1bGx5IHB1bGxlZCwgcmVib290aW5nLi4iCiAgICByZWJvb3QKZmkK"
```

Replacing in-tree kernel module
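The replacement logic itself lives in replace-sfc-driver.sh, which the MachineConfig above carries as a base64 `data:` URL and is therefore not readable at a glance. A small helper along the following lines can decode such a URL for review before the manifest is applied. The function name is an assumption made for illustration; it relies only on `base64 --decode`.

```shell
#!/bin/bash
# Print the decoded payload of a "data:<mediatype>;base64,<payload>" URL.
# decode_data_url is a hypothetical helper name, not part of KMM or the MCO.
decode_data_url() {
    local url="$1"
    # Drop everything up to and including the first "base64," marker,
    # then decode what remains.
    printf '%s' "${url#*base64,}" | base64 --decode
}
```

For example, `decode_data_url "data:text/plain;base64,IyEvYmluL2Jhc2gK"` prints `#!/bin/bash`, which is the first line of both embedded scripts.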
The install service will always run `modprobe -r` before installing the kernel driver. This will either remove the in-tree kernel module or do nothing (`modprobe -r` does not return an error if the kernel module is not present). To be safe, the command will be run from the entry point of the DriverContainer image.

Integration with Day2 KMM
Once the cluster is installed, KMMO can be deployed with a Module CR that targets the same kernel module (with the same DriverContainer image, or a different one). This won't unload the kernel module installed on Day1, and it will allow the customer to support kernel module upgrades (without a node restart if possible) and cluster upgrade predictions via the Preflight CRD.
MCO/Day1 support models
We can provide 2 support models: off-cluster and in-cluster
off-cluster
In addition to the operator image, KMM will provide an executable utility that can be run on any x86_64/arm server. It will receive the DriverContainer image and the kernel module location as input and will produce the MCO YAML, which can be applied as manifests during cluster installation.
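The issue does not specify this utility beyond its inputs and output, so the following is only a sketch of the idea, written as a shell function. The function name, argument order, default module location (`/opt`), and default role (`worker`) are all assumptions; for brevity it generates only the module replacement unit, not the image pull unit shown in the example MachineConfig.

```shell
#!/bin/bash
# Hypothetical generator: prints a MachineConfig for a driver image and module.
# Usage: gen_day1_machineconfig <driver-image> <module> [module-dir] [role]
gen_day1_machineconfig() {
    local image="$1" module="$2" moddir="${3:-/opt}" role="${4:-worker}"
    local script payload

    # Script the node runs at boot: remove the in-tree module, then load the
    # out-of-tree one from the driver container.
    script=$(printf '#!/bin/bash\nmodprobe -r %s\npodman run --privileged --entrypoint modprobe %s -d %s %s' \
        "$module" "$image" "$moddir" "$module")

    # Ignition takes file contents as a data URL; join base64 output into one line.
    payload=$(printf '%s\n' "$script" | base64 | tr -d '\n')

    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${role}
  name: replace-${module}
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: "replace-${module}.service"
          enabled: true
          contents: |
            [Unit]
            Description=Replace in-tree ${module} driver with oot ${module} driver
            Before=network-pre.target
            Wants=network-pre.target
            DefaultDependencies=no
            [Service]
            Type=oneshot
            ExecStart=/usr/local/bin/replace-${module}-driver.sh
            [Install]
            WantedBy=multi-user.target
    storage:
      files:
        - path: "/usr/local/bin/replace-${module}-driver.sh"
          mode: 493
          overwrite: true
          contents:
            source: "data:text/plain;base64,${payload}"
EOF
}
```

A call such as `gen_day1_machineconfig quay.io/example/driver:1.0 sfc /opt sfc > replace-sfc.yaml` (with a placeholder image name) would write a manifest that could be added to the installation manifests.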
in-cluster
KMMO will support an additional CRD that will receive the inputs defined above and will produce the same MCOs. KMMO might even apply them itself, although this option is less viable. The executable from the "off-cluster" solution will be re-used in the "in-cluster" solution.
Pros/Cons
Pros
Cons