Closed TheMysteriousX closed 4 months ago
@TheMysteriousX can you give me an example of specific use case for this?
@sergio-correia does this make sense to you?
@richm: I will let @TheMysteriousX elaborate on the use cases, but basically unlocking may happen at two different moments. The first is early boot, in which case we need to amend the initrd to, e.g., set up networking when required. The second is late boot, after the system has switched from the initrd to the actual root filesystem (the switch-root phase). At this point, the system uses its "regular" configuration.
For some devices, such as the root device (/) or swap, unlocking needs to happen in early boot so that the system can continue to boot; for other devices, e.g. an encrypted /opt, unlocking happens in late boot. If one only has devices that unlock in late boot, there is no need to change the initrd, and in this case the initrd changes seem to be causing issues with the regular system configuration after the switch-root phase.
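To make the early-boot versus late-boot distinction concrete, here is a minimal Python sketch of the decision described above. It is an illustration only, not code from the role: the names `EncryptedDevice`, `EARLY_BOOT_TARGETS`, `unlock_phase`, and `needs_initrd_changes` are hypothetical, and treating only `/` and swap as early-boot targets is a simplifying assumption taken from the examples in this comment.

```python
from dataclasses import dataclass

# Assumption for illustration: only the root filesystem and swap must be
# unlocked before switch-root. Real systems may have more such targets.
EARLY_BOOT_TARGETS = {"/", "swap"}


@dataclass(frozen=True)
class EncryptedDevice:
    """Hypothetical description of an encrypted volume."""
    name: str
    target: str  # mount point, or the literal string "swap"


def unlock_phase(device: EncryptedDevice) -> str:
    """Return "early" if the device must be unlocked in the initrd, else "late"."""
    return "early" if device.target in EARLY_BOOT_TARGETS else "late"


def needs_initrd_changes(devices: list[EncryptedDevice]) -> bool:
    """The initrd only needs amending if at least one device unlocks early."""
    return any(unlock_phase(d) == "early" for d in devices)
```

With only an encrypted /opt, `needs_initrd_changes` returns `False`, which corresponds to the case where the initrd can be left alone.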
@sergio-correia does this make sense to you?
Yep, it does.
If you rebase on top of the latest main branch, that will fix the ansible-lint issue.
Thanks for the review, all the proposed changes look good - I'll implement them and rebase as suggested.
can you give me an example of specific use case for this?
Sure - we keep the base OS volume unencrypted and attach additional encrypted data volumes separately to VMs because:
So to keep things simple, we modified the nbde role to not configure the initrd and flush service: the encrypted volume gets unlocked and mounted after NetworkManager has started instead of before. systemd supports some additional directives for making sure processes don't get started before a required volume is online, which we set with linux-system-roles/storage.
This is also advantageous because the role does not support static addressing, as additional parameters are required for that. Any host that contacts the tang hosts via e.g. WiFi, cellular, bonded interfaces, dynamic routing, IPsec, WireGuard, or PPPoE is similarly not supported. I'm not aware of any simple way to support these, other than not decrypting the volumes in the initrd.
Ideally when using the initrd to decrypt volumes, you also need to run an sshd so that the host can be manually decrypted remotely if the tang hosts were to fail - but I don't think this is supported on RHEL at all.
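As an illustration of the systemd ordering mentioned above, the following Python sketch renders a unit drop-in that uses `RequiresMountsFor=`, a systemd directive that makes a unit require, and start after, the mounts for the given paths. Which directives were actually set through linux-system-roles/storage is not stated in this thread, so this is one possible approach, and the helper name `render_mount_dropin` is hypothetical.

```python
def render_mount_dropin(mount_points: list[str]) -> str:
    """Render a systemd drop-in that delays a unit until the mounts are up.

    RequiresMountsFor= takes a space-separated list of absolute paths.
    """
    if not mount_points:
        raise ValueError("at least one mount point is required")
    for path in mount_points:
        if not path.startswith("/"):
            raise ValueError(f"mount point must be an absolute path: {path!r}")
    return "[Unit]\nRequiresMountsFor=" + " ".join(mount_points) + "\n"
```

The returned text would be written to a drop-in file for the service that depends on the encrypted volume; the file location and service name depend on the deployment and are left out here.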
[citest]
@sergio-correia how can we have automated tests for this new feature? is it possible?
Yeah, it is possible, but not very simple. Basically we can have a VM provisioned with an encrypted /data device, then use the role to set up clevis, and then reboot the machine to verify that the device was unlocked successfully.
Long ago, when there was travis-ci (and it supported nested virt), we had a test on the clevis upstream repository that would do something along those lines for each PR: provision a VM with a kickstart, set up clevis, reboot and verify the expected outcome. When we moved to github-actions, I remember it did not support nested virt and we ended up removing that test.
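The final "verify" step of such a test could look roughly like the Python sketch below, run on the VM after the reboot. It assumes the check is done by inspecting `lsblk --json --output NAME,TYPE,MOUNTPOINT` and looking for the mount point on or below a device of type `crypt`. The function names are hypothetical, and this is not the test that existed in the clevis repository.

```python
import json
import subprocess


def mounted_on_crypt(nodes: list[dict], mount_point: str, under_crypt: bool = False) -> bool:
    """Return True if mount_point is mounted on or below a 'crypt' device.

    nodes is the "blockdevices" list from lsblk's JSON output.
    """
    for node in nodes:
        is_crypt = under_crypt or node.get("type") == "crypt"
        if is_crypt and node.get("mountpoint") == mount_point:
            return True
        if mounted_on_crypt(node.get("children", []), mount_point, is_crypt):
            return True
    return False


def device_unlocked(mount_point: str) -> bool:
    """Query lsblk on the running system and check the given mount point."""
    result = subprocess.run(
        ["lsblk", "--json", "--output", "NAME,TYPE,MOUNTPOINT"],
        check=True,
        capture_output=True,
        text=True,
    )
    return mounted_on_crypt(json.loads(result.stdout)["blockdevices"], mount_point)
```

A test would call `device_unlocked("/data")` after the reboot and fail if it returns `False`. The provisioning and reboot steps themselves still need a VM, which is the hard part described above.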
Enhancement: Allow the initrd and network manager/dracut flush module mechanisms to be disabled.
Reason: We have volumes that are unlocked by clevis-luks-askpass late in the boot process, after NetworkManager has put the system on the network, so no changes to the initrd are needed. The affected systems have complicated network setups (bonds, MACsec, static addressing, IPv6), so the role actually breaks the boot process for them, as it does not account for anything except single NIC + DHCP + IPv4.
Result: Users can disable initrd configuration if required, which allows advanced network configurations to be used or decryption to occur late in the boot process.
Issue Tracker Tickets (Jira or BZ if any):