openwall-com-au / BootUnlock

A helper script that unlocks macOS'es encrypted APFS volumes before login
GNU General Public License v3.0
49 stars 7 forks source link

Failure to unlock volumes on attached Apple_RAID #12

Open imcnish opened 3 years ago

imcnish commented 3 years ago

Doesn't seem to work until after a user is logged in. Would love any suggestions/thoughts/help.

No problems:

Process to recreate the issue (after install and configuring BootUnlock):

Basic info:

BootUnlock.log (cleared just before reboot and the Process to recreate the issue described above):

ian834@machine.local:/var/log [48] % cat BootUnlock.log
=== Wed Oct 28 11:27:30 PDT 2020 ===
=== Wed Oct 28 11:28:16 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
=== Wed Oct 28 11:28:24 PDT 2020 ===
=== Wed Oct 28 11:30:13 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
=== Wed Oct 28 11:30:25 PDT 2020 ===
=== Wed Oct 28 11:30:36 PDT 2020 ===
=== Wed Oct 28 11:41:35 PDT 2020 ===

Messages from syslog:

ian834@machine.local:/var/log [49] % grep BootUnlock /var/log/system.log
/var/log/system.log:Oct 28 11:27:31 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[5653]): Service exited with abnormal code: 1
/var/log/system.log:Oct 28 11:28:20 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock): Service only ran for 6 seconds. Pushing respawn out by 4 seconds.
/var/log/system.log:Oct 28 11:28:24 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[370]): Service exited with abnormal code: 1
/var/log/system.log:Oct 28 11:30:27 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[1607]): Service exited with abnormal code: 1
/var/log/system.log:Oct 28 11:30:35 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock): Service only ran for 9 seconds. Pushing respawn out by 1 seconds.
/var/log/system.log:Oct 28 11:30:36 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[1815]): Service exited with abnormal code: 1
/var/log/system.log:Oct 28 11:41:36 machine com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[2349]): Service exited with abnormal code: 1

Manual execution of 'helper.sh' works fine:

ian834@machine.local:/var/log [50] % sudo /Library/PrivilegedHelperTools/au.com.openwall.BootUnlock/helper.sh
=== Wed Oct 28 11:53:59 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
galaxy4public commented 3 years ago

Interesting, I am also currently running 10.15.7 with no issues. I will try to come up with some steps to troubleshoot why the script is failing when is launching through launchd. It seems that you have quite a layout of drives and partitions. Thanks for the report.

imcnish commented 3 years ago

It probably shouldn't matter, but (just FYI) I installed BootUnlock on a fresh macOS install. I.e. this wasn't an upgrade from another version of the OS, and I didn't "migrate" any user data to this system. Not sure if some new setting could be affecting me that would have been grandfathered-in had I upgraded the OS or migrated user data from an older version of the OS.

The machine is really just two disks: (1) the internal factory standard boot disk and (2) the external SSD. My goal is simply to move the user's home dir off the boot dir and on to the external SSD for performance and space, however I don't want to sacrifice full disk encryption -- same situation as all BootUnlock users I suspect.

The external SSD is a software RAID 1 (a la Apple RAID Manager), and that RAID has two volumes, so the layout is MacOS (with it's five default volumes) and my external disk with two volumes (the encrypted volume I'm working on here ("Users") and another unencrypted volume ("Pictures")).

If I run 'diskutil apfs list -plist | xsltproc --novalid "/Library/PrivilegedHelperTools/au.com.openwall.BootUnlock/diskutil.xsl" -' (taken from 'helper.sh') all the volumes show up correctly, and the one encrypted volume with my user home directories ("Users") shows up matching the regex ':true:true$' (when unmounted and locked).

ian834@machine.local:~ [49] % diskutil apfs list -plist | xsltproc --novalid "/Library/PrivilegedHelperTools/au.com.openwall.BootUnlock/diskutil.xsl" -
Macintosh HD - Data:CDE85451-CB67-496F-A766-731255A5077A:disk6s1:true:
Macintosh HD:B98F79AD-B88C-485D-B806-8B8B6ED5BBE9:disk6s2:true:
Preboot:212838DF-39A8-4FC9-B770-BD8D340CF5EE:disk6s3::
Recovery:4510B63C-CF80-4E24-BBF0-40E2C5844412:disk6s4::
VM:E4994AED-F877-447A-920A-F40CDAB45A66:disk6s5:true:
Pictures:1762329A-2B55-4D4B-A2C8-761975AC048F:disk7s2::
Users:73E6E581-C9A1-4D84-9B99-B831CF1E9EC3:disk7s3:true:true

Just for fun, I tried rebooting my machine ("machine"), providing an admin password during pre-boot, then using ssh to login remotely without logging in locally. This was coming from another machine ("brick"), going to the machine in question ("machine"). There was no problem logging in, however it's clear that even after the machine was fully booted (and BootUnlock should have run) the encrypted external volume ("Users") did not mount. Ssh login works w/o a home dir however local X login does not work without a home dir.

brick.local:ian834[71] ~ % ssh test1@10.1.1.157
Password:
Last login: Wed Oct 28 11:44:09 2020 from 127.0.0.1
Could not chdir to home directory /Volumes/Users/test1: No such file or directory
test1@machine / % pwd
/

I tried uninstalling BootUnlock, reboot, and reinstall BootUnlock to see if there were any changes... but it was still broken. I didn't see an uninstaller so to uninstall I did:

After reboot I rebuilt the BootUnlock package via "make" and installed it again. Again, no errors during pkg install and the System keychain entry was added correctly.

After installing and rebooting to test:

ian834@machine.local:~ [45] % cat /var/log/BootUnlock.log
===[ update.sh: Thu Oct 29 08:07:15 PDT 2020 ]===
=== Thu Oct 29 08:07:15 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
BootUnlock: SecKeychainSearchCopyNext: The specified item could not be found in the keychain.
NOTICE: could not find the secret on the System keychain, skipping the volume.
Verifying the specific cryptographic user 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 on APFS Volume disk7s3
Passphrase valid
Adding password for volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 to the System keychain...
=== Thu Oct 29 08:09:18 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...

diskutil: interrupted
Unlocking any cryptographic user on APFS Volume disk7s3
=== Thu Oct 29 08:10:06 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
=== Thu Oct 29 08:11:28 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
=== Thu Oct 29 08:11:41 PDT 2020 ===
=== Thu Oct 29 08:11:52 PDT 2020 ===
=== Thu Oct 29 08:23:26 PDT 2020 ===

"update.sh" runs. There is no System keychain entry for the volume, so a new entry is validated and added. Interesting error "diskutil: interrupted". BootUnlock successfully mounts the encrypted external volume twice. Not sure why twice, but at least it's working. Unfortunately the unlock occurred after login, so the user whose home dir is located on that volume wasn't able to login locally.

One other interesting thing I noticed -- after rebooting and trying to login as user "test1" (home dir on the encrypted external drive), the login fails, I login as user "test2" (home dir on the encrypted boot drive) I am getting a authentication pop-up (see attached screenshot). This pop-up appears connected to the BootUnlock execution because it's asking for permission to access the System keychain. Screen Shot 2020-10-28 at 4 59 46 PM

System keychain entry Name=Users --> Access Control: Screen Shot 2020-10-29 at 8 51 37 AM

galaxy4public commented 3 years ago

The service is hooked in using the "Mount" event (see https://github.com/openwall-com-au/BootUnlock/blob/master/scripts/postinstall#L35), so it is triggered every time a volume is detected and is attempted to be mounted. I think I will need to rework this section and actually react on the particular volume being requested. Thank you for your extensive research into this - it helps.

Also, on re-installing BootUnlock -- as you may have seen the helper is a really simple script and the problem we are currently facing in your case is that the call to the helper terminates abruptly when it was supposed to decrypt & mount that external volume. I suspect that there is something like a race with the mount event when it comes to an external RAID1 (never tested my script with such a configuration). I will try to compose a similar setup and will test more.

galaxy4public commented 3 years ago

@ian834, one test I would suggest to try is:

  1. disconnect your external drive
  2. boot up to the login screen (but do not login)
  3. attach the external drive (this should trigger the mount event separately to everything else)
  4. try to login as the user whose home directory is on the external drive

If this works it would mean that we narrowed the issue down to the race condition at the bootup procedure.

Also, it would help if you could provide the output of diskutil list (so I could better replicate your specific setup).

imcnish commented 3 years ago

After tinkering a bit it seems like the 'launchd' error captured by syslog com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[1807]): Service exited with abnormal code: 1 is caused by something going wrong in 'diskutil'.

There seems to be a correlation between the Service exited with abnormal code errors and 'helper.sh' errors diskutil: interrupted, which are captured in /var/log/BootUnlock.log.

When whatever goes wrong, the test if ! printf '%s' "$PASSPHRASE" | diskutil apfs unlock "$DEVICE" -stdinpassphrase ; then in 'helper.sh' fails, and then 'bash' generates a non-zero exit because "set -e -o pipefail" is set. This results in the Service exited with abnormal code error to syslog.

If I comment out the "set -eu -o pipefail" line in 'helper.sh' then the Service exited with abnormal code errors disappear.

This doesn't fix anything, just removes a less than helpful error.

When calls to 'diskutil' fail in this way, they appear to partially succeed because 'diskutil' reports Unlocking any cryptographic user on APFS Volume disk7s3 however 'diskutil' never makes it to mounting the volume.

Most of the calls to 'diskutil' succeed and the volume is both unlocked and mounted, so it feels like chasing errors related to whatever causes the occasional diskutil: interrupted condition is a red herring.

It seems like the simple fix for 'diskutil' failing to mount is making a second attempt, perhaps before attempting to decode the passwd as hex, however this approach would also necessitate adjusting 'bash' to not consider this a fatal error (because it's happening in a pipe... i.e. remove or temporarily disable "set -e -o pipefail").

It's not clear why 'diskutil' is failing, but it could be that the unlocker subsystem is, itself, locked while attempting some other unrelated operation. Again, this process does succeed the majority of the time.

Even if this issue with 'diskutil' occasionally failing is resolved, this still doesn't explain why I'm not able to login... again, most of the time the volume is properly unlocked and mounted.

I made some debugging edits to helper.sh to try to figure out what was going wrong. This was not an attempt to fix anything, just debug.

ian834@machine.local:PrivilegedHelperTools/au.com.openwall.BootUnlock [49] % diff -C 3 helper.sh.20201029T090003.bak helper.sh
*** helper.sh.20201029T090003.bak       2020-02-10 15:21:28.000000000 -0800
--- helper.sh   2020-10-29 19:17:30.000000000 -0700
***************
*** 1,30 ****
  #!/bin/bash

! set -eu -o pipefail

  PATH=/sbin:/bin:/usr/sbin:/usr/bin

! echo "=== $(date) ==="
  diskutil apfs list -plist \
        | xsltproc --novalid "${0%/*}/diskutil.xsl" - \
        | grep -E ':true:true$' \
        | cut -f1-3 -d':' \
        | while IFS=: read NAME UUID DEVICE ; do
                printf 'Trying to unlock volume "%s" with UUID %s ...\n' "$NAME" "$UUID"
!               if ! PASSPHRASE=$(${0%/*}/BootUnlock find-generic-password \
                        -D 'Encrypted Volume Password' \
!                       -a "$UUID" -s "$UUID" -w); then
!                       echo 'NOTICE: could not find the secret on the System keychain, skipping the volume.' >&2
!                       continue
!               fi
!               if ! printf '%s' "$PASSPHRASE" | diskutil apfs unlock "$DEVICE" -stdinpassphrase ; then
!                       if [ -z "${PASSPHRASE//[[:digit:][a-fA-F]}" ]; then # This may be a hexadecimal string
!                               echo 'NOTICE: the passphrase looks like a hexdecimal string, re-trying ...' >&2
!                               if printf '%s' "$PASSPHRASE" | xxd -r -p | diskutil apfs unlock "$DEVICE" -stdinpassphrase; then
!                                       continue
!                               fi
!                       fi
!                       echo "ERROR: could not unlock volume '$NAME', skipping the volume." >&2
!                       continue
!               fi
        done
--- 1,29 ----
  #!/bin/bash

! echo "=== $(date) ==="
!
! #set -eu -o pipefail

  PATH=/sbin:/bin:/usr/sbin:/usr/bin

! echo "(!) DIAG: uptime: $(uptime)"
! echo "(!) DIAG: who: $(who)"
! #echo "(!) DIAG: env: $(env)"
! #echo "(!) DIAG: set: $(set)"
!
  diskutil apfs list -plist \
        | xsltproc --novalid "${0%/*}/diskutil.xsl" - \
        | grep -E ':true:true$' \
        | cut -f1-3 -d':' \
        | while IFS=: read NAME UUID DEVICE ; do
                printf 'Trying to unlock volume "%s" with UUID %s ...\n' "$NAME" "$UUID"
!               PASSPHRASE=$(${0%/*}/BootUnlock find-generic-password \
                        -D 'Encrypted Volume Password' \
!                       -a "$UUID" -s "$UUID" -w)
!               errno=$?
!               echo "(!) DIAG: errno: ${errno}"
!
!               printf '%s' "$PASSPHRASE" | diskutil apfs unlock "$DEVICE" -stdinpassphrase
!               errno=$?
!               echo "(!) DIAG: errno: ${errno}"
        done

After these changes my /var/log/BootUnlock.log looks like this:

ian834@machine.local:/var/log [45] % cat /var/log/BootUnlock.log
=== Thu Oct 29 18:51:53 PDT 2020 ===
(!) DIAG: uptime: 18:51  up 15 mins, 0 users, load averages: 1.70 1.65 1.21
(!) DIAG: who:
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
(!) DIAG: errno: 0

diskutil: interrupted
Unlocking any cryptographic user on APFS Volume disk7s3
=== Thu Oct 29 18:54:51 PDT 2020 ===
(!) DIAG: uptime: 18:54  up 6 secs, 0 users, load averages: 0.75 0.16 0.06
(!) DIAG: who:
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
(!) DIAG: errno: 0
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
(!) DIAG: errno: 0
=== Thu Oct 29 18:56:16 PDT 2020 ===
(!) DIAG: uptime: 18:56  up 2 mins, 1 user, load averages: 1.42 0.52 0.20
(!) DIAG: who: ian834  console  Oct 29 18:55
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
(!) DIAG: errno: 0
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
(!) DIAG: errno: 0
=== Thu Oct 29 18:56:30 PDT 2020 ===
(!) DIAG: uptime: 18:56  up 2 mins, 8 users, load averages: 1.71 0.62 0.24
(!) DIAG: who: ian834  console  Oct 29 18:55
ian834  ttys004  Oct 29 18:56
ian834  ttys005  Oct 29 18:56
ian834  ttys006  Oct 29 18:56
ian834  ttys007  Oct 29 18:56
ian834  ttys008  Oct 29 18:56
ian834  ttys009  Oct 29 18:56
ian834  ttys010  Oct 29 18:56

The date markers now show when 'helper.sh' actually begins... nothing happens before this marker anymore.

The first execution of 'helper.sh' appears to occur before I reboot the machine.

The second execution of 'helper.sh' occurs immediately after reboot and succeeded, and occurred before anyone logged in -- the encrypted volume should have been usable at this point however login as user 'test1' (i.e. home dir on the encrypted external disk) still failed.

The third execution of 'helper.sh' also succeeded, but by this time I was already logged in as the user whose home dir is not on the encrypted external disk.

ian834@machine.local:/var/log [46] % diskutil list
/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *4.0 TB     disk0
   1:                        EFI EFI                     314.6 MB   disk0s1
   2:                 Apple_APFS Container disk1         4.0 TB     disk0s2

/dev/disk1 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +4.0 TB     disk1
                                 Physical Store disk0s2
   1:                APFS Volume Macintosh HD - Data     234.0 GB   disk1s1
   2:                APFS Volume Macintosh HD            11.1 GB    disk1s2
   3:                APFS Volume Preboot                 85.0 MB    disk1s3
   4:                APFS Volume Recovery                528.9 MB   disk1s4
   5:                APFS Volume VM                      20.5 KB    disk1s5

/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk2
   1:                        EFI EFI                     209.7 MB   disk2s1
   2:                 Apple_RAID                         2.0 TB     disk2s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk2s3

/dev/disk3 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk3
   1:                        EFI EFI                     209.7 MB   disk3s1
   2:                 Apple_RAID                         2.0 TB     disk3s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk3s3

/dev/disk4 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk4
   1:                        EFI EFI                     209.7 MB   disk4s1
   2:                 Apple_RAID                         2.0 TB     disk4s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk4s3

/dev/disk5 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk5
   1:                        EFI EFI                     209.7 MB   disk5s1
   2:                 Apple_RAID                         2.0 TB     disk5s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk5s3

/dev/disk6 (external, virtual):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                 Apple_APFS                        +8.0 TB     disk6

/dev/disk7 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +8.0 TB     disk7
                                 Physical Store disk6
   1:                APFS Volume Pictures                2.0 TB     disk7s2
   2:                APFS Volume Users                   2.0 TB     disk7s3

My "external" volume is actually a SSD m.2 stick connected to a PCIe riser card -- this isn't hot-pluggable so I can't do step N°3 of the test you propose. I might be able to find another truly external disk (as in USB interconnect, not PCIe) I can tinker with, or I can destroy the sw RAID and just access the SSD as a traditional disk.

galaxy4public commented 3 years ago

One other interesting thing I noticed -- after rebooting and trying to login as user "test1" (home dir on the encrypted external drive), the login fails, I login as user "test2" (home dir on the encrypted boot drive) I am getting a authentication pop-up (see attached screenshot). This pop-up appears connected to the BootUnlock execution because it's asking for permission to access the System keychain.

This is not related to BootUnlock, but to the DiskArbitrationHelper helper, if I recall correctly. To suppress that alert (which, if I understand correctly is trying to memorise the newly manually unlocked volume to be added to the keychain so the DiskArbitration would be able to unlock it automatically in the future [when you are already logged in]), I suppressed the DiskArbitrationHelper from being launched at the login (I unloaded the /System/Library/LaunchAgents/com.apple.DiskArbitrationAgent.plist, if my memory serves me right). In any case, while that prompt is visible, you can launch a terminal and investigate the process tree and identify which process has triggered it.

galaxy4public commented 3 years ago

The date markers now show when 'helper.sh' actually begins... nothing happens before this marker anymore.

This location is a bit unsafe (from the general portability point of view). Both commands before the original marker where purely internal bash commands: set was configuring the behaviour of the shell and PATH was to ensure that the script could find echo if for some reason the system had bash without built-in echo. Anyway, it is irrelevant to our puzzle here, but I wanted to clarify on that statement anyway. :)

I am thinking of removing RunAtLoad (https://github.com/openwall-com-au/BootUnlock/blob/master/scripts/postinstall#L39) from the plist since it seems to be the cause of running the script twice in some cases (all in all, we only need to react on the media being attached).

I also think that duskutil is getting interrupted since DiskArbitration is assembling the RAID (disk5s2 -> disk6), then loads the APFS container (disk6 -> disk7) -- and for each transition it calls the BootUnlock helper (due to the run condition). Iwill check this with an USB attached RAID in the evening (it's the middle of a work day here right now).

galaxy4public commented 3 years ago

I am actually struggling to create an Apple_RAID on the same physical block device (diskutil refuses to do that). ~@ian834, how did you manage to do that? (P.S. I am also interested in the use case: "why are you doing RAID on a single physical block device?")~

UPDATE: ignore my question, I see that in your setup the RAID array spans over 4 physical block devices.

galaxy4public commented 3 years ago

Well, I spent an unruly number of hours on this and so far I don't have a solution. The diskutil call when RAID is being assembled fails with:

Unable to run because unable to use the DiskManagement framework. Common reasons include, but are not limited to, the DiskArbitration framework being unavailable due to being booted in single-user mode.

Moreover, the helper.sh script receives the TERM signal from launchd shortly thereafter. I am still investigating and also researching on the alternative approach (through the use of SecurityAgentPlugins, namely DiskUnlock.bundle)

imcnish commented 3 years ago

Test w/o Apple software RAID:

ian834@machine.local:/var/log [43] % cat BootUnlock.log
=== Fri Oct 30 07:34:29 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...

diskutil: interrupted
Unlocking any cryptographic user on APFS Volume disk7s3
=== Fri Oct 30 07:35:20 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk9s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 07:37:33 PDT 2020 ===
Trying to unlock volume "Users" with UUID 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3 ...
Unlocking any cryptographic user on APFS Volume disk7s3
Unlocked and mounted APFS Volume
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk9s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 07:37:44 PDT 2020 ===
=== Fri Oct 30 07:37:55 PDT 2020 ===
=== Fri Oct 30 07:48:39 PDT 2020 ===
ian834@machine.local:~ [44] % diskutil apfs list B8D0516A-4389-4C34-9E39-5D72FAF46AFB
|
+-- Container disk7 B8D0516A-4389-4C34-9E39-5D72FAF46AFB
    ====================================================
    APFS Container Reference:     disk7
    Size (Capacity Ceiling):      8000219774976 B (8.0 TB)
    Capacity In Use By Volumes:   4935281082368 B (4.9 TB) (61.7% used)
    Capacity Not Allocated:       3064938692608 B (3.1 TB) (38.3% free)
    |
    +-< Physical Store disk5 991B66A0-AF71-421B-9789-52BBEB89F715
    |   ---------------------------------------------------------
    |   APFS Physical Store Disk:   disk5
    |   Size:                       8000219774976 B (8.0 TB)
    |
    +-> Volume disk7s2 1762329A-2B55-4D4B-A2C8-761975AC048F
    |   ---------------------------------------------------
    |   APFS Volume Disk (Role):   disk7s2 (No specific role)
    |   Name:                      Pictures (Case-insensitive)
    |   Mount Point:               /Volumes/Pictures
    |   Capacity Consumed:         2000000000000 B (2.0 TB)
    |   Capacity Reserve:          2000000000000 B (2.0 TB)
    |   Capacity Quota:            None
    |   FileVault:                 No
    |
    +-> Volume disk7s3 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3
        ---------------------------------------------------
        APFS Volume Disk (Role):   disk7s3 (No specific role)
        Name:                      Users (Case-insensitive)
        Mount Point:               /Volumes/Users
        Capacity Consumed:         2000000000000 B (2.0 TB)
        Capacity Reserve:          2000000000000 B (2.0 TB)
        Capacity Quota:            None
        FileVault:                 No

So it appears that merely having an Apple software RAID volume is enough to break BootUnlock, even if the software RAID is not the volume where the user's home dir is located.

imcnish commented 3 years ago

To test whether the encryption on the Apple software RAID is the issue, remove the encryption from the Apple software RAID volume:

ian834@machine.local:~ [45] % diskutil apfs decryptVolume /Volumes/Users
ian834@machine.local:~ [47] % diskutil apfs list -plist  | xsltproc --novalid "/Library/PrivilegedHelperTools/au.com.openwall.BootUnlock/diskutil.xsl" -
Macintosh HD - Data:CDE85451-CB67-496F-A766-731255A5077A:disk6s1:true:
Macintosh HD:B98F79AD-B88C-485D-B806-8B8B6ED5BBE9:disk6s2:true:
Preboot:212838DF-39A8-4FC9-B770-BD8D340CF5EE:disk6s3::
Recovery:4510B63C-CF80-4E24-BBF0-40E2C5844412:disk6s4::
VM:E4994AED-F877-447A-920A-F40CDAB45A66:disk6s5:true:
Pictures:1762329A-2B55-4D4B-A2C8-761975AC048F:disk7s2::
Users:73E6E581-C9A1-4D84-9B99-B831CF1E9EC3:disk7s3::
TEST_VOL:D6AE51D9-F0F7-4486-9C0A-1DDB12280A60:disk9s1:true:
ian834@machine.local:/var/log [43] % cat BootUnlock.log
=== Fri Oct 30 08:41:16 PDT 2020 ===
(!) DIAG: uptime:  8:41  up 15 mins, 9 users, load averages: 1.73 2.12 1.59
(!) DIAG: who: ian834  console  Oct 30 08:27 
ian834  ttys000  Oct 30 08:27 
ian834  ttys001  Oct 30 08:27 
ian834  ttys002  Oct 30 08:27 
ian834  ttys003  Oct 30 08:27 
ian834  ttys004  Oct 30 08:27 
ian834  ttys005  Oct 30 08:27 
ian834  ttys006  Oct 30 08:27 
ian834  ttys009  Oct 30 08:27 
=== Fri Oct 30 08:50:48 PDT 2020 ===
(!) DIAG: uptime:  8:50  up 24 mins, 0 users, load averages: 1.00 1.16 1.34
(!) DIAG: who: 
=== Fri Oct 30 08:51:39 PDT 2020 ===
(!) DIAG: uptime:  8:51  up 5 secs, 0 users, load averages: 1.47 0.31 0.11
(!) DIAG: who: 
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk9s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 08:54:32 PDT 2020 ===
(!) DIAG: uptime:  8:54  up 3 mins, 1 user, load averages: 2.03 0.90 0.38
(!) DIAG: who: ian834  console  Oct 30 08:54 
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk9s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 08:54:42 PDT 2020 ===
(!) DIAG: uptime:  8:54  up 3 mins, 9 users, load averages: 3.16 1.18 0.48
(!) DIAG: who: ian834  console  Oct 30 08:54 
ian834  ttys000  Oct 30 08:54 
ian834  ttys001  Oct 30 08:54 
ian834  ttys002  Oct 30 08:54 
ian834  ttys003  Oct 30 08:54 
ian834  ttys004  Oct 30 08:54 
ian834  ttys005  Oct 30 08:54 
ian834  ttys006  Oct 30 08:54 
ian834  ttys007  Oct 30 08:54 
=== Fri Oct 30 08:54:53 PDT 2020 ===
(!) DIAG: uptime:  8:54  up 3 mins, 9 users, load averages: 3.92 1.41 0.57
(!) DIAG: who: ian834  console  Oct 30 08:54 
ian834  ttys000  Oct 30 08:54 
ian834  ttys001  Oct 30 08:54 
ian834  ttys002  Oct 30 08:54 
ian834  ttys003  Oct 30 08:54 
ian834  ttys004  Oct 30 08:54 
ian834  ttys005  Oct 30 08:54 
ian834  ttys006  Oct 30 08:54 
ian834  ttys007  Oct 30 08:54 
imcnish commented 3 years ago

To test whether the Apple software RAID is the issue, remove the Apple software RAID volume:

ian834@machine.local:~ [49] % sudo diskutil apfs deleteVolume Users                  
Started APFS operation
Deleting APFS Volume from its APFS Container
Unmounting disk7s3
Erasing any xART session referenced by 73E6E581-C9A1-4D84-9B99-B831CF1E9EC3
Deleting Volume
Removing any Preboot and Recovery Directories
Finished APFS operation
ian834@machine.local:~ [50] % sudo diskutil apfs deleteVolume Pictures
Started APFS operation
Deleting APFS Volume from its APFS Container
Unmounting disk7s2
Erasing any xART session referenced by 1762329A-2B55-4D4B-A2C8-761975AC048F
Deleting Volume
Removing any Preboot and Recovery Directories
Finished APFS operation
ian834@machine.local:~ [51] % sudo diskutil apfs deleteContainer B8D0516A-4389-4C34-9E39-5D72FAF46AFB
Started APFS operation on disk7
Deleting APFS Container with all of its APFS Volumes
Unmounting Volumes
Deleting Volumes
Deleting Container
Wiping former APFS disks
Switching content types
Reformatting former APFS disks
Initialized /dev/rdisk5 as a 7 TB case-insensitive HFS Plus volume with a 524288k journal
Mounting disk
1 new disk created or changed due to APFS operation
Disk from APFS operation: disk5
Finished APFS operation on disk7
ian834@machine.local:~ [52] % diskutil appleRAID list
AppleRAID sets (1 found)
===============================================================================
Name:                 Untitled
Unique ID:            991B66A0-AF71-421B-9789-52BBEB89F715
Type:                 Stripe
Status:               Online
Size:                 8.0 TB (8000219774976 Bytes)
Rebuild:              manual
Device Node:          disk5
-------------------------------------------------------------------------------
#  DevNode   UUID                                  Status     Size
-------------------------------------------------------------------------------
0  disk4s2   4F986CEC-04D1-434F-B8C5-81A4DEC801CA  Online     2000054943744
1  disk0s2   26395F1F-DE70-485C-86DF-0F5E11D946A3  Online     2000054943744
2  disk2s2   916BEEAC-59D4-4F0F-8777-630A04061363  Online     2000054943744
3  disk3s2   DDA3C804-5B9B-479F-ABB3-AC180E6E84AF  Online     2000054943744
===============================================================================
ian834@machine.local:~ [53] % sudo diskutil appleRAID delete 991B66A0-AF71-421B-9789-52BBEB89F715
Started RAID operation on disk5 Untitled
Unmounting volume for RAID set 991B66A0-AF71-421B-9789-52BBEB89F715
Destroying the RAID set 991B66A0-AF71-421B-9789-52BBEB89F715
Finished RAID operation on disk5 Untitled
ian834@machine.local:~ [54] % diskutil list
/dev/disk0 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:         Apple_RAID_Offline                         2.0 TB     disk0s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk0s3

/dev/disk1 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *4.0 TB     disk1
   1:                        EFI EFI                     314.6 MB   disk1s1
   2:                 Apple_APFS Container disk6         4.0 TB     disk1s2

/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk2
   1:                        EFI EFI                     209.7 MB   disk2s1
   2:         Apple_RAID_Offline                         2.0 TB     disk2s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk2s3

/dev/disk3 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk3
   1:                        EFI EFI                     209.7 MB   disk3s1
   2:         Apple_RAID_Offline                         2.0 TB     disk3s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk3s3

/dev/disk4 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk4
   1:                        EFI EFI                     209.7 MB   disk4s1
   2:         Apple_RAID_Offline                         2.0 TB     disk4s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk4s3

/dev/disk6 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +4.0 TB     disk6
                                 Physical Store disk1s2
   1:                APFS Volume Macintosh HD - Data     234.3 GB   disk6s1
   2:                APFS Volume Macintosh HD            11.1 GB    disk6s2
   3:                APFS Volume Preboot                 85.0 MB    disk6s3
   4:                APFS Volume Recovery                528.9 MB   disk6s4
   5:                APFS Volume VM                      20.5 KB    disk6s5

/dev/disk8 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *256.6 GB   disk8
   1:                        EFI EFI                     209.7 MB   disk8s1
   2:                 Apple_APFS Container disk9         256.4 GB   disk8s2

/dev/disk9 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +256.4 GB   disk9
                                 Physical Store disk8s2
   1:                APFS Volume TEST_VOL                61.2 MB    disk9s1
ian834@machine.local:/var/log [45] % cat BootUnlock.log
=== Fri Oct 30 08:59:55 PDT 2020 ===
(!) DIAG: uptime:  8:59  up 8 mins, 9 users, load averages: 1.74 1.53 0.87
(!) DIAG: who: ian834  console  Oct 30 08:54 
ian834  ttys000  Oct 30 08:54 
ian834  ttys001  Oct 30 08:54 
ian834  ttys002  Oct 30 08:54 
ian834  ttys003  Oct 30 08:54 
ian834  ttys004  Oct 30 08:54 
ian834  ttys005  Oct 30 08:54 
ian834  ttys006  Oct 30 08:54 
ian834  ttys007  Oct 30 08:54 
=== Fri Oct 30 09:02:43 PDT 2020 ===
(!) DIAG: uptime:  9:02  up 11 mins, 0 users, load averages: 1.34 1.39 0.92
(!) DIAG: who: 
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...

diskutil: interrupted
Unlocking any cryptographic user on APFS Volume disk9s1
=== Fri Oct 30 09:03:32 PDT 2020 ===
(!) DIAG: uptime:  9:03  up 5 secs, 0 users, load averages: 0.75 0.16 0.06
(!) DIAG: who: 
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk7s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 09:08:00 PDT 2020 ===
(!) DIAG: uptime:  9:08  up 5 mins, 9 users, load averages: 1.91 0.80 0.35
(!) DIAG: who: ian834  console  Oct 30 09:07 
ian834  ttys000  Oct 30 09:07 
ian834  ttys001  Oct 30 09:07 
ian834  ttys002  Oct 30 09:07 
ian834  ttys003  Oct 30 09:07 
ian834  ttys004  Oct 30 09:07 
ian834  ttys005  Oct 30 09:07 
ian834  ttys006  Oct 30 09:07 
ian834  ttys007  Oct 30 09:07 
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk7s1
Unlocked and mounted APFS Volume
=== Fri Oct 30 09:08:10 PDT 2020 ===
(!) DIAG: uptime:  9:08  up 5 mins, 9 users, load averages: 2.48 0.96 0.41
(!) DIAG: who: ian834  console  Oct 30 09:07 
ian834  ttys000  Oct 30 09:07 
ian834  ttys001  Oct 30 09:07 
ian834  ttys002  Oct 30 09:07 
ian834  ttys003  Oct 30 09:07 
ian834  ttys004  Oct 30 09:07 
ian834  ttys005  Oct 30 09:07 
ian834  ttys006  Oct 30 09:07 
ian834  ttys007  Oct 30 09:07 
=== Fri Oct 30 09:16:52 PDT 2020 ===
(!) DIAG: uptime:  9:16  up 13 mins, 9 users, load averages: 1.41 1.35 0.87
(!) DIAG: who: ian834  console  Oct 30 09:07 
ian834  ttys000  Oct 30 09:07 
ian834  ttys001  Oct 30 09:07 
ian834  ttys002  Oct 30 09:07 
ian834  ttys003  Oct 30 09:07 
ian834  ttys004  Oct 30 09:07 
ian834  ttys005  Oct 30 09:07 
ian834  ttys006  Oct 30 09:07 
ian834  ttys007  Oct 30 09:07 

Timeline:

So if BootUnlock was able to unlock and mount "TEST_VOL” at 09:24:18, why/how was this unmounted already when “test1” attempted to login at 09:24:31? The unlock and mount messages captured in BootUnlock.log are actually generated by ‘diskutil’ so they should be trustworthy.

imcnish commented 3 years ago

Added two calls to 'df' to 'helper.sh' to verify whether "TEST_VOL" was ever actually mounted. One 'df' call is at the start of 'helper.sh' and one is at the end -- should represent a before and after view within the span of 'helper.sh' execution.

Timeline:

ian834@machine.local:/var/log [45] % cat BootUnlock.log
=== Fri Oct 30 11:16:25 PDT 2020 ===
(!) DIAG: uptime: 11:16  up 4 mins, 0 users, load averages: 2.77 1.49 0.63
(!) DIAG: who:
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286868 39079843092    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...

diskutil: interrupted
Unlocking any cryptographic user on APFS Volume disk7s1
=== Fri Oct 30 11:18:15 PDT 2020 ===
(!) DIAG: uptime: 11:18  up 5 secs, 0 users, load averages: 0.75 0.16 0.06
(!) DIAG: who:
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286435 39079843525    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk7s1
Unlocked and mounted APFS Volume
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286452 39079843508    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
/dev/disk7s1  239Gi   58Mi  239Gi     1%    2664  2504214616    0%   /Volumes/TEST_VOL
=== Fri Oct 30 11:18:23 PDT 2020 ===
(!) DIAG: uptime: 11:18  up 13 secs, 0 users, load averages: 0.85 0.19 0.07
(!) DIAG: who:
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286476 39079843484    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
/dev/disk7s1  239Gi   58Mi  239Gi     1%    2664  2504214616    0%   /Volumes/TEST_VOL
/dev/disk5s4  3.6Ti  504Mi  3.4Ti     1%      50 39081129910    0%   /Volumes/Recovery
=== Fri Oct 30 11:21:01 PDT 2020 ===
(!) DIAG: uptime: 11:21  up 3 mins, 9 users, load averages: 2.75 0.97 0.39
(!) DIAG: who: ian834  console  Oct 30 11:20
ian834  ttys000  Oct 30 11:20
ian834  ttys001  Oct 30 11:20
ian834  ttys002  Oct 30 11:20
ian834  ttys003  Oct 30 11:20
ian834  ttys006  Oct 30 11:20
ian834  ttys009  Oct 30 11:20
ian834  ttys010  Oct 30 11:20
ian834  ttys011  Oct 30 11:20
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286879 39079843081    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
Trying to unlock volume "TEST_VOL" with UUID D6AE51D9-F0F7-4486-9C0A-1DDB12280A60 ...
Unlocking any cryptographic user on APFS Volume disk7s1
Unlocked and mounted APFS Volume
(!) DIAG: df: Filesystem     Size   Used  Avail Capacity iused       ifree %iused  Mounted on
/dev/disk5s2  3.6Ti   10Gi  3.4Ti     1%  488254 39080641706    0%   /
/dev/disk5s1  3.6Ti  218Gi  3.4Ti     6% 1286926 39079843034    0%   /System/Volumes/Data
/dev/disk5s5  3.6Ti   20Ki  3.4Ti     1%       0 39081129960    0%   /private/var/vm
/dev/disk7s1  239Gi   58Mi  239Gi     1%    2666  2504214614    0%   /Volumes/TEST_VOL
ian834@machine.local:/var/log [46] % grep BootUnlock /var/log/system.log | tail -5
/var/log/system.log:Oct 30 11:18:18 inky com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock): Service only ran for 5 seconds. Pushing respawn out by 5 seconds.
/var/log/system.log:Oct 30 11:18:23 inky com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[374]): Service exited with abnormal code: 1
/var/log/system.log:Oct 30 11:21:09 inky com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock): Service only ran for 8 seconds. Pushing respawn out by 2 seconds.
/var/log/system.log:Oct 30 11:21:12 inky com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[1985]): Service exited with abnormal code: 1
/var/log/system.log:Oct 30 11:31:35 inky com.apple.xpc.launchd[1] (au.com.openwall.BootUnlock[2566]): Service exited with abnormal code: 1
ian834@machine.local:/var/log [47] % diskutil list
/dev/disk0 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:         Apple_RAID_Offline                         2.0 TB     disk0s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk0s3

/dev/disk1 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk1
   1:                        EFI EFI                     209.7 MB   disk1s1
   2:         Apple_RAID_Offline                         2.0 TB     disk1s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk1s3

/dev/disk2 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *4.0 TB     disk2
   1:                        EFI EFI                     314.6 MB   disk2s1
   2:                 Apple_APFS Container disk5         4.0 TB     disk2s2

/dev/disk3 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk3
   1:                        EFI EFI                     209.7 MB   disk3s1
   2:         Apple_RAID_Offline                         2.0 TB     disk3s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk3s3

/dev/disk4 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk4
   1:                        EFI EFI                     209.7 MB   disk4s1
   2:         Apple_RAID_Offline                         2.0 TB     disk4s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk4s3

/dev/disk5 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +4.0 TB     disk5
                                 Physical Store disk2s2
   1:                APFS Volume Macintosh HD - Data     234.4 GB   disk5s1
   2:                APFS Volume Macintosh HD            11.1 GB    disk5s2
   3:                APFS Volume Preboot                 85.0 MB    disk5s3
   4:                APFS Volume Recovery                528.9 MB   disk5s4
   5:                APFS Volume VM                      20.5 KB    disk5s5

/dev/disk6 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *256.6 GB   disk6
   1:                        EFI EFI                     209.7 MB   disk6s1
   2:                 Apple_APFS Container disk7         256.4 GB   disk6s2

/dev/disk7 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +256.4 GB   disk7
                                 Physical Store disk6s2
   1:                APFS Volume TEST_VOL                61.3 MB    disk7s1
galaxy4public commented 3 years ago

@ian834 , just to share where I am at, the following is my current, bastardised helper.sh which produces a bit more debugging info:

galaxy:LaunchDaemons root# cat /Library/PrivilegedHelperTools/au.com.openwall.BootUnlock/helper.sh 
#!/bin/bash

set -eu -o pipefail

PATH=/sbin:/bin:/usr/sbin:/usr/bin

report() {
    RC=$?
    echo "RC($1) = $RC (${BASH_LINENO[@]})"
    mount
    exit $RC
}

for signal in EXIT ERR HUP INT QUIT ILL TRAP ABRT EMT FPE KILL BUS SEGV SYS PIPE ALRM TERM URG STOP TSTP CONT CHLD TTIN TTOU IO XCPU XFSZ VTALRM PROF WINCH INFO USR1 USR2
do
    trap "report $signal" $signal
done

echo "=== $(date) ==="
echo "Mounts before work:"
mount

if ! OUTPUT=$(diskutil apfs list -plist); then
    RC=$?
    echo "ERROR: diskutil failed with code $RC and produced the following output:"
    printf '>>>\n%s\n<<<\n' "$OUTPUT"
    trap - EXIT; exit 1
fi

printf '%s' "$OUTPUT" \
    | xsltproc --novalid "${0%/*}/diskutil.xsl" - \
    | sed -n '/:true:true$/{ s/:true:true$//;p; }' \
    | while IFS=: read NAME UUID DEVICE ; do
        printf 'Trying to unlock volume "%s" with UUID %s ...\n' "$NAME" "$UUID"
        if ! PASSPHRASE=$(${0%/*}/BootUnlock find-generic-password \
            -D 'Encrypted Volume Password' \
            -a "$UUID" -s "$UUID" -w); then
            echo 'NOTICE: could not find the secret on the System keychain, skipping the volume.' >&2
            continue
        fi
        if ! printf '%s' "$PASSPHRASE" | diskutil apfs unlock "$DEVICE" -stdinpassphrase ; then
            if [ -z "${PASSPHRASE//[[:digit:][a-fA-F]}" ]; then # This may be a hexadecimal string
                echo 'NOTICE: the passphrase looks like a hexdecimal string, re-trying ...' >&2
                if printf '%s' "$PASSPHRASE" | xxd -r -p | diskutil apfs unlock "$DEVICE" -stdinpassphrase; then
                    continue
                fi
            fi
            echo "ERROR: could not unlock volume '$NAME', skipping the volume." >&2
            continue
        fi
    done
trap - EXIT
echo "Success"
mount

galaxy:LaunchDaemons root# 

This version intercepts signals, prints mounts before and after, and outputs diskutil's messages if latter fails. I will experiment a bit more today and hopefully find a solution. Meanwhile, you should be able to use your system if you login as a user that has its home directory on the System volume, then mount the external volume, logout, and login as the user with the home directory on the external volume -- this is the workflow I was using before I came up with BootUnlock.

galaxy4public commented 3 years ago

OK, my latest investigation lead me to the following: BootUnlock successfully mounts an encrypted APFS on an external (USB attached AppleRAID), but there are still "dragons" lurking in the dark. I am going to commit my latest changes to the helper script, so anybody who has time and is facing the issue could pick it up from there, but I spent all the spare time I had to investigate this problem :(.

I started with creating an AppleRAID on a USB stick. The only way I found to do that was to create an HFS+ partition (it can be any type, but I am documenting what I did) on the USB stick, then run diskutil appleRAID enable mirror /Volumes/part1 where part1 is the name of the HFS+ volume. This produces a degraded RAID1, which is good enough for testing. The next step was to go to the Disk Utility app and format the newly introduced RAID1 volume as APFS container and add an APFS encrypted volume to it (I named that volume to be APFS-on-RAID):

galaxy:~ root# diskutil list
/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *500.3 GB   disk0
   1:                        EFI EFI                     314.6 MB   disk0s1
   2:                 Apple_APFS Container disk1         500.0 GB   disk0s2

/dev/disk1 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +500.0 GB   disk1
                                 Physical Store disk0s2
   1:                APFS Volume Catalina — Data         15.8 GB    disk1s1
   2:                APFS Volume Preboot                 86.2 MB    disk1s2
   3:                APFS Volume Recovery                529.0 MB   disk1s3
   4:                APFS Volume VM                      1.1 GB     disk1s4
   5:                APFS Volume Home                    457.2 GB   disk1s6
   6:                APFS Volume Catalina                11.3 GB    disk1s7

/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *32.0 GB    disk2
   1:                        EFI EFI                     209.7 MB   disk2s1
   2:                 Apple_RAID                         15.8 GB    disk2s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk2s3
   4:                  Apple_HFS part2                   15.7 GB    disk2s4

/dev/disk3 (external, virtual):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                 Apple_APFS                        +15.8 GB    disk3

/dev/disk4 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +15.8 GB    disk4
                                 Physical Store disk3
   1:                APFS Volume APFS-on-RAID            58.7 MB    disk4s1

galaxy:~ root#

I then created a test user (raidtest) and set its home to be on /Volumes/APFS-on-RAID/raidtest.

I also found that by default the volume was being mounted with noowners, so to fix that I ran diskutil enableOwnership disk4s1.

With the aforementioned helper.sh (the only extra I added was uptime to see when the script was called in relation to the boot time) observed the following:

=== Sat Oct 31 15:01:24 AEDT 2020 ===
15:01  up 4 secs, 0 users, load averages: 0.64 0.13 0.05
Mounts before work:
/dev/disk1s7 on / (apfs, local, read-only, journaled)
devfs on /dev (devfs, local, nobrowse)
/dev/disk1s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
/dev/disk1s4 on /private/var/vm (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
Before RAID detect: Sat Oct 31 15:01:24 AEDT 2020
After RAID detect: Sat Oct 31 15:01:29 AEDT 2020
Trying to unlock volume "Home" with UUID C1A7E5BC-3361-4E01-868E-C55D0AD9C90D ...
Unlocking any cryptographic user on APFS Volume disk1s6
Unlocked and mounted APFS Volume
Trying to unlock volume "APFS-on-RAID" with UUID 1A60389F-7C57-49C4-9742-D9649DE6AF94 ...
Unlocking any cryptographic user on APFS Volume disk4s1
Unlocked and mounted APFS Volume
Success
/dev/disk1s7 on / (apfs, local, read-only, journaled)
devfs on /dev (devfs, local, nobrowse)
/dev/disk1s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
/dev/disk1s4 on /private/var/vm (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
/dev/disk1s6 on /Volumes/Home (apfs, local, journaled)
/dev/disk4s1 on /Volumes/APFS-on-RAID (apfs, local, nodev, nosuid, journaled)

I was booting up using the "testraid" account and it failed to successfully login, hence I ssh'ed in and confirmed that between 4 seconds from the bootup and the moment I ssh'ed in something unmounted the "APFS-on-RAID" volume.

I did something else elsewhere, and when I looked into BootUnlock.log 5 minutes later I found the following:

=== Sat Oct 31 15:03:35 AEDT 2020 ===
15:03  up 2 mins, 1 user, load averages: 0.86 0.43 0.18
Mounts before work:
/dev/disk1s7 on / (apfs, local, read-only, journaled)
devfs on /dev (devfs, local, nobrowse)
/dev/disk1s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
/dev/disk1s4 on /private/var/vm (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
/dev/disk1s6 on /Volumes/Home (apfs, local, journaled)
/dev/disk1s3 on /Volumes/Recovery (apfs, local, journaled, nobrowse)
Before RAID detect: Sat Oct 31 15:03:35 AEDT 2020
After RAID detect: Sat Oct 31 15:03:41 AEDT 2020
Trying to unlock volume "APFS-on-RAID" with UUID 1A60389F-7C57-49C4-9742-D9649DE6AF94 ...
Unlocking any cryptographic user on APFS Volume disk4s1
Unlocked and mounted APFS Volume
Success
/dev/disk1s7 on / (apfs, local, read-only, journaled)
devfs on /dev (devfs, local, nobrowse)
/dev/disk1s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)
/dev/disk1s4 on /private/var/vm (apfs, local, journaled, nobrowse)
map auto_home on /System/Volumes/Data/home (autofs, automounted, nobrowse)
/dev/disk1s6 on /Volumes/Home (apfs, local, journaled)
/dev/disk4s1 on /Volumes/APFS-on-RAID (apfs, local, nodev, nosuid, journaled)

I issued mount in my ssh session and sure enough the volume was indeed mounted, so I pressed "Enter" on the login screen for "raidtest" and it got stuck in an attempt to initialised the session.

Since I was still logged in remotely, I looked at the process list and the only process attributed to "testraid" on the console (ps auxww | fgrep console) was the loginwindow. I killed that process with kill, the screen flashed, and I was back at the login prompt.

I entered the login and password for "testraid" and successfully logged in. However, something was still not right with the security framework since when I was logging out it stuck on that loginwindow again (until I killed it).

I am not entirely sure that I got any more spark to look into this since the BootUnlock helper does its job of reacting on the "mount" event for devices and does unlock and mount volumes.

imcnish commented 3 years ago

I'll play with it some more, but going back to a previous comment -- I don't think the issue I'm experiencing is caused by AppleRAID. I was able to reproduce the issue on my system even after I completely removed the AppleRAID. That said, the problem may still be in MacOS, not BootUnlock.

test whether the Apple software RAID is the issue

Note: Also, in the comment before that (test whether the encryption on the Apple software RAID is the issue) the AppleRAID was still present but encryption had been removed on the AppleRAID volumes. In this test the AppleRAID contained the volumes 'Pictures' and 'Users' but both were unencrypted. The volume 'TEST_VOL' was encrypted but was on a standard USB thumb drive (not an AppleRAID).

galaxy4public commented 3 years ago

@ian834, I was reading your comments carefully and I concur that the root cause has nothing to do with AppleRAID itself, but with an external to the primary APFS container, in my opinion.

A bit of offtopic below: My current dive into macOS suggests that loginwindow makes a call to a SecurityAgentPlugin that can unlock and mount an APFS volume on demand, however, this requires the APFS volume to have Crypto users defined and Apple in their wisdom do not provide a tool to add that to the volume. I am currently working on an utility that could do that - and if I am successful we would have a much better user experience with external encrypted volumes.