openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

codesign for booting #735

Open lundman opened 4 years ago

lundman commented 4 years ago

I went to update the ZFS-on-boot instructions (https://openzfsonosx.org/wiki/ZFS_on_Boot) and attempted to boot Catalina, but we hit an issue mounting root:

Loaded module v1.9.2-4-d4889d276e, ZFS pool version 5000, ZFS filesystem version 5
zfs_boot_publish_bootfs: publishing bootfs [rpool/ROOT/Catalina]
zfs_boot_publish_bootfs done
Got boot device = IOService:/IOResources/net_lundman_zfs_zvol/rpool/ZFSDatasetProxy/IOBlockStorageDriver/rpool Media/ZFSDatasetScheme/Catalina@1
BSD root: disk6s1, major 1, minor 23
hfs_ValidateHFSPlusVolumeHeader: unknown Volume Signature : 0
hfs_mount: hfs_mountfs returned error=22 for device unknown-dev
ZFS: zfs_vfs_mountroot
Setting readonly
Not booted from APFS, skipping apfs.util
boot _checkBrokenSignatureWithTeamIDFatal(LazyPath *, struct cs_blob *): no registered daemon port for check_broken_signature_with_teamid_fatal
mac_vnode_check_signature: /Library/Filesystems/zfs.fs/Contents/Resouces/mount_zfs: code signature validation failed fatally: 
 When validating /Library/Filesystems/zfs.fs/Contents/Resources/mount_zfs:
  The code contains a Team ID, but validating its signature failed.
Please check you system log.proc 5: load code signature error 4 for file "mount_zfs"
port is not ready for callouts
mount: /: Killed

lundman commented 4 years ago

If mount_zfs is not codesigned, the same complaint is raised about its libraries instead. So an unsigned, statically linked zfs binary named zfs_mount is the way to go.

It does seem to stall a bit later due to memory concerns though.

lundman commented 4 years ago

OK, it takes about 20 minutes to boot Catalina with 2 GB of RAM. The spindump taken during boot looks like this:

http://www.lundman.net/boot-spindump.txt

It seems that anything involving paging falls apart; I am unsure why yet. If you spot something peculiar, mention it.

lundman commented 4 years ago

With 4 GB of RAM, it does boot to the UI after about 20 minutes. At that point I could disable mds, run launchctl remove com.apple.appstoreagent, and run zfs set sync=disabled rpool/ROOT/Catalina to give it a bit less I/O.

Logging into the GUI to test, it appears that somewhere along the line we managed to fix the font problem:

[Image: O3XBoot]

(Apologies about the hacky 'photoshop' job)

If we can find and fix whatever makes it run like molasses, it could be possible to run the OS this way.

lundman commented 4 years ago

And here is a spindump taken after waiting long enough for it to go idle. It is still sluggish, and kernel_task is pretty busy; the trick will be to find where.

http://www.lundman.net/boot-idle-spindump.txt

lundman commented 4 years ago

And a flamegraph while it is idle:

http://www.lundman.net/zfsboot.svg

arc_reclaim_thread() shows up a bit too much; I would expect it to be mostly idle, even when low on memory. zfs_vnop_pagein() is also taking a lot of time.

lundman commented 4 years ago

arc_reclaim_thread() seems to be looping here:

           } else if (evicted >= SPA_MAXBLOCKSIZE * ARCSTAT(arc_reclaim_waiters_count)) {
               // we evicted plenty of buffers, so let's wake up
               // all the waiters rather than having them stall
               ARCSTAT_BUMP(arc_reclaim_waiters_early_broadcast);
               cv_broadcast(&arc_reclaim_waiters_cv);

kstat.zfs.misc.arcstats.arc_reclaim_waiters_cnt: 0
kstat.zfs.misc.arcstats.arc_reclaim_waiters_cur: 0
kstat.zfs.misc.arcstats.arc_reclaim_waiters_sig: 0
kstat.zfs.misc.arcstats.arc_reclaim_waiters_bcst: 244208
kstat.zfs.misc.arcstats.arc_reclaim_waiters_tout: 0
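
One possible reading of the counters above, offered as a sketch rather than a confirmed diagnosis: if the waiter count read by the quoted condition is zero, the right-hand side of the comparison is zero, so the condition holds for any value of evicted and the broadcast fires on every pass. That would match a broadcast counter of 244208 next to a waiter count of 0. The userland snippet below only models the comparison. SKETCH_SPA_MAXBLOCKSIZE is assumed to be 16 MiB, both function names are hypothetical, the guarded variant is an illustration and not the project's fix, and the branches that precede the quoted `else if` are not shown in this thread and are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SPA_MAXBLOCKSIZE is 16 MiB (1 << 24). */
#define SKETCH_SPA_MAXBLOCKSIZE (1ULL << 24)

/* Models the quoted condition exactly as written in the snippet. */
static bool early_broadcast_as_quoted(uint64_t evicted, uint64_t waiters)
{
    return evicted >= SKETCH_SPA_MAXBLOCKSIZE * waiters;
}

/*
 * Hypothetical variant: only consider an early broadcast when
 * at least one thread is actually waiting.
 */
static bool early_broadcast_guarded(uint64_t evicted, uint64_t waiters)
{
    return waiters > 0 && evicted >= SKETCH_SPA_MAXBLOCKSIZE * waiters;
}
```

With zero waiters and zero bytes evicted, the quoted form reduces to 0 >= 0 and returns true, while the guarded form returns false. Whether the real loop reaches this branch with a zero count depends on code not shown here.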

lundman commented 4 years ago

Just pulling out the zfs calls, and sorting on frequency:

  31 lz4_decompress_zfs
  31 zap_lockdir
  32 dmu_read_uio_dnode
  35 dmu_buf_hold
  35 zfs_vnop_read
  36 zfs_read
  43 -
  48 zio_done
  55 __zio_execute
  61 vdev_mirror_io_start
  65 dbuf_hold
  68 dbuf_hold_impl
  73 __dbuf_hold_impl
  73 vdev_disk_io_start
  80 buf_strategy_iokit
  85 arc_read
 121 dmu_read_impl
 127 dmu_read
 135 dbuf_read
 135 zio_vdev_io_start
 141 zio_nowait
 184 zfs_vnop_pagein
 209 dmu_buf_hold_array_by_dnode
 217 zio_wait
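
A listing like the one above can be produced by tallying function names and sorting by count. The sketch below is a minimal illustration of that tally-and-sort step, assuming the function names have already been extracted from the sampled stacks, one string per occurrence. It is not the command that was actually used here, and every name in it (struct tally, tally_add, tally_sort, the table limits) is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 256   /* hypothetical cap on distinct function names */
#define MAX_NAME 64    /* names longer than this are truncated */

struct sym_count {
    char name[MAX_NAME];
    int count;
};

struct tally {
    struct sym_count syms[MAX_SYMS];
    int used;
};

/* Record one occurrence of name. Returns 0 on success, -1 if the table is full. */
static int tally_add(struct tally *t, const char *name)
{
    for (int i = 0; i < t->used; i++) {
        if (strcmp(t->syms[i].name, name) == 0) {
            t->syms[i].count++;
            return 0;
        }
    }
    if (t->used == MAX_SYMS)
        return -1;
    strncpy(t->syms[t->used].name, name, MAX_NAME - 1);
    t->syms[t->used].name[MAX_NAME - 1] = '\0';
    t->syms[t->used].count = 1;
    t->used++;
    return 0;
}

/* Order by ascending count, breaking ties by name. */
static int by_count_then_name(const void *a, const void *b)
{
    const struct sym_count *x = a;
    const struct sym_count *y = b;
    if (x->count != y->count)
        return (x->count > y->count) - (x->count < y->count);
    return strcmp(x->name, y->name);
}

/* Sort the table so the most frequent names end up last. */
static void tally_sort(struct tally *t)
{
    qsort(t->syms, (size_t)t->used, sizeof(t->syms[0]), by_count_then_name);
}
```

A caller would zero-initialise a struct tally, call tally_add() once per extracted frame, call tally_sort(), and then print count and name for each of the first used entries. The linear lookup is fine for a few hundred distinct symbols; a hash table would be the obvious replacement for larger inputs.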

lundman commented 4 years ago

Note to self: spa_write_cachefile() in module/zfs/spa_config.c will panic during boot, because vn_open() is called before rootvnode has been set.

lundman commented 4 years ago

OK, the boot issues have been fixed, except for the performance one. There is a signed PKG on the wiki if anyone wants to try ZFS on Boot.