Open aheath1992 opened 9 months ago
What version of the role are you using? What version of ansible are you using? What is the platform/version of your control node? What is the platform/version of your managed node? @sergio-correia what other debugging information do we need?
What version of the role are you using - 1.71.1
What version of ansible are you using - 2.16
What is the platform/version of your control node - fedora 39
What is the platform/version of your managed node - RHEL 8.8
Hello. Here is some more info that may be helpful to debug this:
- `/dev/vda2` or do we have others?
- `lsinitrd | grep clevis` can help here
- `etc/cmdline.d/01-default.conf`. Perhaps something like this could help to verify this: `lsinitrd /boot/initramfs-$(uname -r).img etc/cmdline.d/01-default.conf`
- `clevis-luks-askpass.path` unit is enabled: `systemctl status clevis-luks-askpass.path`; it will be used if we are going to decrypt a disk in late boot phase
- `clevis luks list -d /dev/vda2`
@richm: I wonder if it makes sense to have some "action"/"state" to collect some of this information from the managed hosts, to help troubleshoot such issues?
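The checks listed above could be gathered in one pass with a small script. This is only a sketch: `run_check` and `collect_nbde_diagnostics` are hypothetical names, not part of the role, and the commands are the ones already mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a header, then run the command if it is installed.
run_check() {
    label=$1
    shift
    printf '=== %s ===\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        # Report a non-zero exit status instead of aborting the whole collection.
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(%s not found)\n' "$1"
    fi
}

# Hypothetical wrapper around the checks from this thread.
# $1: the encrypted device, e.g. /dev/vda2
collect_nbde_diagnostics() {
    dev=$1
    run_check "clevis files in initramfs" sh -c 'lsinitrd | grep clevis'
    run_check "initramfs cmdline" lsinitrd "/boot/initramfs-$(uname -r).img" etc/cmdline.d/01-default.conf
    run_check "askpass path unit" systemctl --no-pager status clevis-luks-askpass.path
    run_check "clevis bindings on $dev" clevis luks list -d "$dev"
}
```

Running `collect_nbde_diagnostics /dev/vda2` as root on the managed node prints one labelled section per check, which can be pasted into the issue after redacting hostnames or addresses.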
lsinitrd | grep clevis
clevis
clevis-pin-null
clevis-pin-sss
clevis-pin-tang
clevis-pin-tpm2
lrwxrwxrwx 1 root root 48 Jan 20 2023 etc/systemd/system/cryptsetup.target.wants/clevis-luks-askpass.path -> /usr/lib/systemd/system/clevis-luks-askpass.path
-rwxr-xr-x 1 root root 1679 Jan 20 2023 usr/bin/clevis
-rwxr-xr-x 1 root root 1654 Oct 28 2020 usr/bin/clevis-decrypt
-rwxr-xr-x 1 root root 1148 Jan 20 2023 usr/bin/clevis-decrypt-null
-rwxr-xr-x 1 root root 25296 Jan 20 2023 usr/bin/clevis-decrypt-sss
-rwxr-xr-x 1 root root 3560 Jan 20 2023 usr/bin/clevis-decrypt-tang
-rwxr-xr-x 1 root root 5121 Oct 28 2020 usr/bin/clevis-decrypt-tpm2
-rw-r--r-- 1 root root 32885 Jan 20 2023 usr/bin/clevis-luks-common-functions
-rwxr-xr-x 1 root root 2115 Oct 28 2020 usr/bin/clevis-luks-list
-rwxr-xr-x 1 root root 2466 Jan 20 2023 usr/libexec/clevis-luks-askpass
-rw-r--r-- 1 root root 302 Oct 28 2020 usr/lib/systemd/system/clevis-luks-askpass.path
-rw-r--r-- 1 root root 190 Jan 20 2023 usr/lib/systemd/system/clevis-luks-askpass.service
lsinitrd /boot/initramfs-$(uname -r).img etc/cmdline.d/01-default.conf
rd.neednet=1
systemctl status clevis-luks-askpass.path
● clevis-luks-askpass.path - Forward Password Requests to Clevis Directory Watch
Loaded: loaded (/usr/lib/systemd/system/clevis-luks-askpass.path; enabled; vendor preset: enabled)
Active: active (waiting) since Fri 2024-02-02 13:54:24 UTC; 24s ago
Docs: man:clevis-luks-unlockers(7)
clevis luks list -d /dev/vda2
1: sss '{"t":1,"pins":{"tang":[{"url":"http://tang1"},{"url":"http://tang2"}]}}'
At a first glance, it looks OK -- could you also check `journalctl`, to see if any useful information shows up, please? (I forgot to mention beforehand, but feel free to redact any IP addresses, if required)
journalctl -xf -u clevis-luks-askpass.service
Feb 02 15:21:20 clevis-test.ansi-001.prod.iad2.dc.redhat.com clevis-luks-askpass[11941]: Error communicating with the server http://tang1
Feb 02 15:21:20 clevis-test.ansi-001.prod.iad2.dc.redhat.com clevis-luks-askpass[11942]: Error communicating with the server http://tang2
telnet tang1 80
Trying tang1...
Connected to tang1.
Escape character is '^]'.
telnet tang2 80
Trying tang2...
Connected to tang2.
Escape character is '^]'.
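A successful telnet connection only shows that the TCP port is reachable at the time of the test. A slightly closer check is to fetch the Tang advertisement, which Tang serves at `/adv`, so that the HTTP service itself is exercised. A minimal sketch, assuming `curl` is installed on the managed node; `tang_adv_url` and `check_tang_servers` are hypothetical helper names.

```shell
#!/bin/sh
# Build the advertisement URL for a Tang base URL (a trailing slash is dropped).
tang_adv_url() {
    printf '%s/adv\n' "${1%/}"
}

# Hypothetical helper: try each Tang server and report OK/FAIL per URL.
# Returns non-zero if any server could not be reached.
check_tang_servers() {
    rc=0
    for url in "$@"; do
        adv=$(tang_adv_url "$url")
        if curl -sf --max-time 5 -o /dev/null "$adv"; then
            printf 'OK   %s\n' "$adv"
        else
            printf 'FAIL %s\n' "$adv"
            rc=1
        fi
    done
    return "$rc"
}
```

For the binding shown earlier this would be called as `check_tang_servers http://tang1 http://tang2`. If it succeeds on a running system while the journal still shows communication errors, that points at the network not being up yet at the moment clevis tries to unlock.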
@sergio-correia any idea?
I've been seeing this as well. I have found that adding the `_netdev` option to the relevant fstab entry allows the unlocking to proceed (tested on Rocky 8 and 9 clients, both early and late boot, and Debian 11 and 12 clients, late boot only). I have added an awk script task into my playbook after the role runs to add this option.
- name: Update fstab options
  ansible.builtin.shell: |
    name="$(awk '$2 == "{{ item.device }}" { print $1 }' /etc/crypttab | head -n 1)"
    awk -v mapper_path="/dev/mapper/$name" '{
      if ($1 == mapper_path && index($4, "_netdev") == 0) {
        $4 = $4 ",_netdev"
      }
      print
    }' /etc/fstab > /tmp/fstab
    diff -q /tmp/fstab /etc/fstab || echo changed
    mv /tmp/fstab /etc/fstab
  loop: '{{ nbde_client_bindings }}'
  register: fstab
  changed_when: '"changed" in fstab.stdout'
I believe this behavior is tied to systemd's ordering of mount units: it orders fstab entries with `_netdev` after `network-online.target`, which is necessary for clevis to work. (ref)
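To preview this change without writing to /etc/fstab, the same edit can be expressed as a filter that reads fstab content on stdin. This is a sketch, not the task above: `add_netdev_to_fstab` is a hypothetical name, and unlike the `index()` test it matches `_netdev` only as a whole comma-separated option.

```shell
#!/bin/sh
# Hypothetical filter: append _netdev to the options (4th field) of one fstab entry.
# $1: device exactly as written in the first fstab field, e.g. /dev/mapper/<name>
# Reads fstab content on stdin and writes the result to stdout.
add_netdev_to_fstab() {
    awk -v dev="$1" '
        # Only touch the matching entry, and only if _netdev is not already an option.
        $1 == dev && NF >= 4 && ("," $4 ",") !~ /,_netdev,/ { $4 = $4 ",_netdev" }
        { print }
    '
}
```

For example, `add_netdev_to_fstab /dev/mapper/luks-root < /etc/fstab | diff /etc/fstab -` shows what would change, where `luks-root` stands in for the real mapping name. As with the original task, awk rewrites the modified line with single spaces between fields; other lines pass through untouched.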
Yeah, this is likely in the right direction.
We may need to have `_netdev` in crypttab to mark the device as requiring network. To prevent a dependency loop, we also need to add `_netdev` to fstab, if the device is specified there for a mount point. Additionally, we may also have to enable the `remote-cryptsetup.target` unit.
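The crypttab side of this could look roughly like the filter below. It is a sketch under assumptions, not something the role does today: `add_netdev_to_crypttab` is a hypothetical name, and the entry is assumed to use the usual `name device keyfile options` layout, with `none` filled in when the key file column is missing.

```shell
#!/bin/sh
# Hypothetical filter: add _netdev to the options (4th field) of one crypttab entry.
# $1: mapping name (first crypttab field). Reads crypttab on stdin, writes to stdout.
add_netdev_to_crypttab() {
    awk -v name="$1" '
        $1 == name && NF >= 2 {
            if (NF < 3) $3 = "none"                 # no key file column yet
            if (NF < 4) $4 = "_netdev"              # no options column yet
            else if (("," $4 ",") !~ /,_netdev,/) $4 = $4 ",_netdev"
        }
        { print }
    '
}
```

`add_netdev_to_crypttab luks-root < /etc/crypttab | diff /etc/crypttab -` previews the edit, with `luks-root` as a placeholder mapping name. If the unit mentioned above also turns out to be required, it would be enabled separately with `systemctl enable remote-cryptsetup.target`.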
From testing this role, I have some more information that should probably be considered here. I did have to add the `_netdev` option in both /etc/fstab and /etc/crypttab for automatic unlock. This works fine; however, on systemd versions < 245, the crypttab generator creates a weird ordering issue with the dev-mapper-{name}.device unit that will hang shutdown indefinitely. This can be fixed by also adding the `x-systemd.requires=systemd-cryptsetup@{name}.service` option to the appropriate device in /etc/fstab. I have an Ansible-native solution in the playbook I used to deploy this, which I could turn into a PR, but it requires several new options per device in `nbde_client_bindings` so that it can create the appropriate crypttab and fstab entries.
EDIT: the systemd issue mentioned above: https://github.com/systemd/systemd/issues/8472
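The version-dependent part can be sketched as a small helper that decides which extra fstab options an entry needs. Both function names are hypothetical, and the mapping name is assumed not to need unit-name escaping (names containing characters such as `-` would have to go through `systemd-escape` first).

```shell
#!/bin/sh
# Hypothetical helper: print the extra fstab options for an NBDE-unlocked mapping.
# $1: mapping name from crypttab, $2: numeric systemd version
nbde_fstab_options() {
    name=$1
    version=$2
    opts="_netdev"
    # Below 245, also require the cryptsetup unit explicitly to avoid the shutdown hang.
    if [ "$version" -lt 245 ]; then
        opts="$opts,x-systemd.requires=systemd-cryptsetup@$name.service"
    fi
    printf '%s\n' "$opts"
}

# Hypothetical helper: numeric version from the first line of `systemctl --version`,
# which looks like "systemd 239 (...)".
systemd_version() {
    systemctl --version | awk 'NR == 1 { print $2 }'
}
```

On a managed node this would be used as `nbde_fstab_options data "$(systemd_version)"`, with `data` standing in for the real mapping name; the result is the string to append to the entry's existing options.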
A new system is unable to unlock after running the nbde_client role. After running the role I get an all good from Ansible, but upon reboot the system stops at the LUKS encryption screen.