openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Kernel Panic (Time Out) - OpenZFS 2.1.0 on Apple Silicon #797

Closed jawbroken closed 1 year ago

jawbroken commented 2 years ago

Previously reported on the forums here and here, but I couldn't find a bug tracking it. I've recently been seeing fairly regular kernel panics with similar backtraces (generally triggered by Plex Media Scanner or Plex Transcoder).

Apologies again for the lack of symbols; I haven't had any luck getting that to work following the varying instructions across the internet (but you can see one with symbols at the first link above).

panic(cpu 1 caller 0xfffffe00186efaa0): timed out waiting for child callback, inchild: 1
Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 21F79
Kernel version: Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000
Fileset Kernelcache UUID: 77B58D4501F17D9FA036CFE982C6B773
Kernel UUID: C44613B0-01A6-3609-A18D-29AC6CE3DAAF
iBoot version: iBoot-7459.121.3
secure boot?: YES
Paniclog version: 13
KernelCache slide: 0x0000000011868000
KernelCache base:  0xfffffe001886c000
Kernel slide:      0x0000000012024000
Kernel text base:  0xfffffe0019028000
Kernel text exec slide: 0x000000001210c000
Kernel text exec base:  0xfffffe0019110000
mach_absolute_time: 0xbe685ed852a
Epoch Time:        sec       usec
  Boot    : 0x62a32dfe 0x0007e4f7
  Sleep   : 0x00000000 0x00000000
  Wake    : 0x00000000 0x00000000
  Calendar: 0x62ab7fa0 0x000ed912

Zone info:
  Zone map: 0xfffffe1020a4c000 - 0xfffffe3020a4c000
  . VM    : 0xfffffe1020a4c000 - 0xfffffe14ed718000
  . RO    : 0xfffffe14ed718000 - 0xfffffe16870b0000
  . GEN0  : 0xfffffe16870b0000 - 0xfffffe1b53d7c000
  . GEN1  : 0xfffffe1b53d7c000 - 0xfffffe2020a48000
  . GEN2  : 0xfffffe2020a48000 - 0xfffffe24ed714000
  . GEN3  : 0xfffffe24ed714000 - 0xfffffe29ba3e0000
  . DATA  : 0xfffffe29ba3e0000 - 0xfffffe3020a4c000
  Metadata: 0xfffffe7031304000 - 0xfffffe7039304000
  Bitmaps : 0xfffffe7039304000 - 0xfffffe7059304000

CORE 0 PVH locks held: None
CORE 1 PVH locks held: None
CORE 2 PVH locks held: None
CORE 3 PVH locks held: None
CORE 4 PVH locks held: None
CORE 5 PVH locks held: None
CORE 6 PVH locks held: None
CORE 7 PVH locks held: None
CORE 8 PVH locks held: None
CORE 9 PVH locks held: None
CORE 10 PVH locks held: None
CORE 11 PVH locks held: None
CORE 12 PVH locks held: None
CORE 13 PVH locks held: None
CORE 14 PVH locks held: None
CORE 15 PVH locks held: None
CORE 16 PVH locks held: None
CORE 17 PVH locks held: None
CORE 18 PVH locks held: None
CORE 19 PVH locks held: None
CORE 0: PC=0xfffffe0019d21cb8, LR=0xfffffe0019d21c98, FP=0xfffffe605ee43ca0
CORE 1 is the one that panicked. Check the full backtrace for details.
CORE 2: PC=0xfffffe001919eb14, LR=0xfffffe001919eb10, FP=0xfffffe60607a3f00
CORE 3: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6060443f00
CORE 4: PC=0xfffffe001919eb14, LR=0xfffffe001919eb10, FP=0xfffffe6062bbbf00
CORE 5: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6060503f00
CORE 6: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6062a2bf00
CORE 7: PC=0xfffffe001919eb14, LR=0xfffffe001919eb10, FP=0xfffffe605fc63f00
CORE 8: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605eb73f00
CORE 9: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6062a4bf00
CORE 10: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605fbc3f00
CORE 11: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605a4c3f00
CORE 12: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6062ddbf00
CORE 13: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605c23bf00
CORE 14: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605aec3f00
CORE 15: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe6062dcbf00
CORE 16: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605c493f00
CORE 17: PC=0xfffffe00192a2be8, LR=0xfffffe00192a2be4, FP=0xfffffe605abe3e90
CORE 18: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605eef3f00
CORE 19: PC=0xfffffe001919eb10, LR=0xfffffe001919eb10, FP=0xfffffe605fb73f00
Compressor Info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 0 swapfiles and OK swap space
Panicked task 0xfffffe1b54a80728: 13953 pages, 7 threads: pid 81342: Plex Media Scann
Panicked thread: 0xfffffe24ef5ef1c0, backtrace: 0xfffffe605c9317b0, tid: 7683453
          lr: 0xfffffe0019169124  fp: 0xfffffe605c931820
          lr: 0xfffffe0019168dec  fp: 0xfffffe605c931890
          lr: 0xfffffe00192adf2c  fp: 0xfffffe605c9318b0
          lr: 0xfffffe001929fd00  fp: 0xfffffe605c931920
          lr: 0xfffffe001929d9ac  fp: 0xfffffe605c9319e0
          lr: 0xfffffe00191177f8  fp: 0xfffffe605c9319f0
          lr: 0xfffffe0019168a70  fp: 0xfffffe605c931d90
          lr: 0xfffffe0019168a70  fp: 0xfffffe605c931e00
          lr: 0xfffffe001998f120  fp: 0xfffffe605c931e20
          lr: 0xfffffe00186efaa0  fp: 0xfffffe605c931ef0
          lr: 0xfffffe00186ef4d8  fp: 0xfffffe605c931f30
          lr: 0xfffffe00186f3c84  fp: 0xfffffe605c932220
          lr: 0xfffffe00186ee1ac  fp: 0xfffffe605c932490
          lr: 0xfffffe00186ef754  fp: 0xfffffe605c9324e0
          lr: 0xfffffe00186ef4f0  fp: 0xfffffe605c932520
          lr: 0xfffffe00186ee1ac  fp: 0xfffffe605c932790
          lr: 0xfffffe00186ef754  fp: 0xfffffe605c9327e0
          lr: 0xfffffe00186ef4f0  fp: 0xfffffe605c932820
          lr: 0xfffffe00186ee1ac  fp: 0xfffffe605c932a90
          lr: 0xfffffe00186ef754  fp: 0xfffffe605c932ae0
          lr: 0xfffffe00186ef4f0  fp: 0xfffffe605c932b20
          lr: 0xfffffe00186ee1ac  fp: 0xfffffe605c932d90
          lr: 0xfffffe00186ef754  fp: 0xfffffe605c932de0
          lr: 0xfffffe00186ef4f0  fp: 0xfffffe605c932e20
          lr: 0xfffffe00186de00c  fp: 0xfffffe605c932ec0
          lr: 0xfffffe00186d6f68  fp: 0xfffffe605c932f00
          lr: 0xfffffe00186d67dc  fp: 0xfffffe605c932fc0
          lr: 0xfffffe00185b1b54  fp: 0xfffffe605c933000
          lr: 0xfffffe0018426ef0  fp: 0xfffffe605c933040
          lr: 0xfffffe001841b160  fp: 0xfffffe605c933100
          lr: 0xfffffe001841d4f4  fp: 0xfffffe605c933610
          lr: 0xfffffe001843baf0  fp: 0xfffffe605c933730
          lr: 0xfffffe001843b0e0  fp: 0xfffffe605c933880
          lr: 0xfffffe0018449ef0  fp: 0xfffffe605c933990
          lr: 0xfffffe001844b3c8  fp: 0xfffffe605c933a10
          lr: 0xfffffe001844b59c  fp: 0xfffffe605c933a60
          lr: 0xfffffe0018589264  fp: 0xfffffe605c933b40
          lr: 0xfffffe0018595c64  fp: 0xfffffe605c933bd0
          lr: 0xfffffe00193fe910  fp: 0xfffffe605c933c60
          lr: 0xfffffe00193fe668  fp: 0xfffffe605c933cc0
          lr: 0xfffffe00196e1838  fp: 0xfffffe605c933d50
          lr: 0xfffffe00196e1528  fp: 0xfffffe605c933db0
          lr: 0xfffffe00197d00a0  fp: 0xfffffe605c933e50
          lr: 0xfffffe001929da80  fp: 0xfffffe605c933f10
          lr: 0xfffffe00191177f8  fp: 0xfffffe605c933f20
      Kernel Extensions in backtrace:
         org.openzfsonosx.zfs(2.1)[BE4DF1D3-FF77-3E58-BC9A-C0B8E175DD97]@0xfffffe0018414000->0xfffffe00186f8a8b
            dependency: com.apple.iokit.IOStorageFamily(2.1)[2912B6A9-2D4A-35E7-8280-2EDE64A64E87]@0xfffffe001b53ae70->0xfffffe001b55ba23
panic(cpu 1 caller 0xfffffe00206efaa0): timed out waiting for child callback, inchild: 1
Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 21F79
Kernel version: Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000
Fileset Kernelcache UUID: 77B58D4501F17D9FA036CFE982C6B773
Kernel UUID: C44613B0-01A6-3609-A18D-29AC6CE3DAAF
iBoot version: iBoot-7459.121.3
secure boot?: YES
Paniclog version: 13
KernelCache slide: 0x0000000019868000
KernelCache base:  0xfffffe002086c000
Kernel slide:      0x000000001a024000
Kernel text base:  0xfffffe0021028000
Kernel text exec slide: 0x000000001a10c000
Kernel text exec base:  0xfffffe0021110000
mach_absolute_time: 0x19791a30241
Epoch Time:        sec       usec
  Boot    : 0x62addeab 0x000453e9
  Sleep   : 0x00000000 0x00000000
  Wake    : 0x00000000 0x00000000
  Calendar: 0x62aefb8e 0x00031526

Zone info:
  Zone map: 0xfffffe10000dc000 - 0xfffffe30000dc000
  . VM    : 0xfffffe10000dc000 - 0xfffffe14ccda8000
  . RO    : 0xfffffe14ccda8000 - 0xfffffe1666740000
  . GEN0  : 0xfffffe1666740000 - 0xfffffe1b3340c000
  . GEN1  : 0xfffffe1b3340c000 - 0xfffffe20000d8000
  . GEN2  : 0xfffffe20000d8000 - 0xfffffe24ccda4000
  . GEN3  : 0xfffffe24ccda4000 - 0xfffffe2999a70000
  . DATA  : 0xfffffe2999a70000 - 0xfffffe30000dc000
  Metadata: 0xfffffe703195c000 - 0xfffffe703995c000
  Bitmaps : 0xfffffe703995c000 - 0xfffffe705995c000

CORE 0 PVH locks held: None
CORE 1 PVH locks held: None
CORE 2 PVH locks held: None
CORE 3 PVH locks held: None
CORE 4 PVH locks held: None
CORE 5 PVH locks held: None
CORE 6 PVH locks held: None
CORE 7 PVH locks held: None
CORE 8 PVH locks held: None
CORE 9 PVH locks held: None
CORE 10 PVH locks held: None
CORE 11 PVH locks held: None
CORE 12 PVH locks held: None
CORE 13 PVH locks held: None
CORE 14 PVH locks held: None
CORE 15 PVH locks held: None
CORE 16 PVH locks held: None
CORE 17 PVH locks held: None
CORE 18 PVH locks held: None
CORE 19 PVH locks held: None
CORE 0: PC=0xfffffe002170d5fc, LR=0xfffffe0021514a00, FP=0xfffffe605d0bb800
CORE 1 is the one that panicked. Check the full backtrace for details.
CORE 2: PC=0xfffffe002119eb14, LR=0xfffffe002119eb10, FP=0xfffffe605e6abf00
CORE 3: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605b19bf00
CORE 4: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605b95bf00
CORE 5: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6059313f00
CORE 6: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe60611e3f00
CORE 7: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6058633f00
CORE 8: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605d423f00
CORE 9: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605d37bf00
CORE 10: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6060993f00
CORE 11: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605d3b3f00
CORE 12: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605e51bf00
CORE 13: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6060ea3f00
CORE 14: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605d3c3f00
CORE 15: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605e22bf00
CORE 16: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe60609b3f00
CORE 17: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6060e53f00
CORE 18: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe6059333f00
CORE 19: PC=0xfffffe002119eb10, LR=0xfffffe002119eb10, FP=0xfffffe605da53f00
Compressor Info: 0% of compressed pages limit (OK) and 0% of segments limit (OK) with 0 swapfiles and OK swap space
Panicked task 0xfffffe166abb2df8: 11473 pages, 23 threads: pid 87221: Plex Transcoder
Panicked thread: 0xfffffe16686f38e0, backtrace: 0xfffffe605a4397b0, tid: 1117848
          lr: 0xfffffe0021169124  fp: 0xfffffe605a439820
          lr: 0xfffffe0021168dec  fp: 0xfffffe605a439890
          lr: 0xfffffe00212adf2c  fp: 0xfffffe605a4398b0
          lr: 0xfffffe002129fd00  fp: 0xfffffe605a439920
          lr: 0xfffffe002129d9ac  fp: 0xfffffe605a4399e0
          lr: 0xfffffe00211177f8  fp: 0xfffffe605a4399f0
          lr: 0xfffffe0021168a70  fp: 0xfffffe605a439d90
          lr: 0xfffffe0021168a70  fp: 0xfffffe605a439e00
          lr: 0xfffffe002198f120  fp: 0xfffffe605a439e20
          lr: 0xfffffe00206efaa0  fp: 0xfffffe605a439ef0
          lr: 0xfffffe00206ef4d8  fp: 0xfffffe605a439f30
          lr: 0xfffffe00206f3c84  fp: 0xfffffe605a43a220
          lr: 0xfffffe00206ee1ac  fp: 0xfffffe605a43a490
          lr: 0xfffffe00206ef754  fp: 0xfffffe605a43a4e0
          lr: 0xfffffe00206ef4f0  fp: 0xfffffe605a43a520
          lr: 0xfffffe00206ee1ac  fp: 0xfffffe605a43a790
          lr: 0xfffffe00206ef754  fp: 0xfffffe605a43a7e0
          lr: 0xfffffe00206ef4f0  fp: 0xfffffe605a43a820
          lr: 0xfffffe00206ee1ac  fp: 0xfffffe605a43aa90
          lr: 0xfffffe00206ef754  fp: 0xfffffe605a43aae0
          lr: 0xfffffe00206ef4f0  fp: 0xfffffe605a43ab20
          lr: 0xfffffe00206ee1ac  fp: 0xfffffe605a43ad90
          lr: 0xfffffe00206ef754  fp: 0xfffffe605a43ade0
          lr: 0xfffffe00206ef4f0  fp: 0xfffffe605a43ae20
          lr: 0xfffffe00206de00c  fp: 0xfffffe605a43aec0
          lr: 0xfffffe00206d6f68  fp: 0xfffffe605a43af00
          lr: 0xfffffe00206d67dc  fp: 0xfffffe605a43afc0
          lr: 0xfffffe00205b1b54  fp: 0xfffffe605a43b000
          lr: 0xfffffe0020426ef0  fp: 0xfffffe605a43b040
          lr: 0xfffffe002041b160  fp: 0xfffffe605a43b100
          lr: 0xfffffe002041d4f4  fp: 0xfffffe605a43b610
          lr: 0xfffffe002043baf0  fp: 0xfffffe605a43b730
          lr: 0xfffffe002043b0e0  fp: 0xfffffe605a43b880
          lr: 0xfffffe0020449ef0  fp: 0xfffffe605a43b990
          lr: 0xfffffe002044b3c8  fp: 0xfffffe605a43ba10
          lr: 0xfffffe002044b59c  fp: 0xfffffe605a43ba60
          lr: 0xfffffe0020589264  fp: 0xfffffe605a43bb40
          lr: 0xfffffe0020595c64  fp: 0xfffffe605a43bbd0
          lr: 0xfffffe00213fe910  fp: 0xfffffe605a43bc60
          lr: 0xfffffe00213fe668  fp: 0xfffffe605a43bcc0
          lr: 0xfffffe00216e1838  fp: 0xfffffe605a43bd50
          lr: 0xfffffe00216e1528  fp: 0xfffffe605a43bdb0
          lr: 0xfffffe00217d00a0  fp: 0xfffffe605a43be50
          lr: 0xfffffe002129da80  fp: 0xfffffe605a43bf10
          lr: 0xfffffe00211177f8  fp: 0xfffffe605a43bf20
      Kernel Extensions in backtrace:
         org.openzfsonosx.zfs(2.1)[BE4DF1D3-FF77-3E58-BC9A-C0B8E175DD97]@0xfffffe0020414000->0xfffffe00206f8a8b
            dependency: com.apple.iokit.IOStorageFamily(2.1)[2912B6A9-2D4A-35E7-8280-2EDE64A64E87]@0xfffffe002353ae70->0xfffffe002355ba23

Happy to help debug in any way, thanks.
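
Even without symbols, the two logs above can be compared by subtracting the kext load address (from "Kernel Extensions in backtrace") from the panic caller address. A minimal sketch in POSIX shell: the helper name `kext_offset` is hypothetical, and it only subtracts the low 32 bits, on the assumption that both addresses are 16 hex digits and share the same upper 32 bits.

```shell
# Offset of a return address inside a kext, using values printed in a panic log.
# Assumption: both addresses are 16 hex digits with identical upper 32 bits,
# so only the low 32 bits are subtracted (keeps the math within plain sh arithmetic).
kext_offset() {
    addr=$1
    base=$2
    if [ "${addr%????????}" != "${base%????????}" ]; then
        echo "upper 32 bits differ: $addr vs $base" >&2
        return 1
    fi
    printf '0x%x\n' $(( 0x${addr#0x????????} - 0x${base#0x????????} ))
}

# panic caller, then load address of org.openzfsonosx.zfs, for each log
kext_offset 0xfffffe00186efaa0 0xfffffe0018414000   # first panic
kext_offset 0xfffffe00206efaa0 0xfffffe0020414000   # second panic
```

Both calls print 0x2dbaa0, which suggests that the two panics were raised from the same location inside the kext despite the different KASLR slides.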

lundman commented 2 years ago

I believe we need to double the timeout in this case, and make it a tunable for the cases where more is needed. I can produce a build for testing in a bit.

jawbroken commented 2 years ago

Thanks, I really appreciate it. A curious data point: this started happening for me fairly regularly when I had to have my ZFS fileserver on Wi-Fi for a while (often when doing large file transfers, I think). I would also sometimes (randomly) end up with empty, zero-byte files when files were moved over SMB, and have to redo the copy. Now that I've put it back on wired Ethernet I haven't seen the same kind of issues yet. It doesn't make much sense to me, but I thought I would note it.

matthewwo commented 2 years ago

I experienced something similar, where Spotlight indexing, anything that scans disks (DaisyDisk), or heavy I/O load (Xcode) seems to freeze the system and eventually trigger a panic.

Please let me know if there's anything I can do to help debug this!

jawbroken commented 2 years ago

Yes, I've disabled Spotlight on the ZFS drives for this reason. I had a previous bug on this but I couldn't work out if OpenZFS was actually at fault there so I closed it. The kernel panics there were different, and the backtraces didn't mention OpenZFS in the "Kernel Extensions in backtrace" section, but I suspected it was involved somehow.

jawbroken commented 2 years ago

Though I should note that my ZFS drives don't stay in the Spotlight Privacy part of System Preferences for some reason, so I disabled it by making a .metadata_never_index file at root level.
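
The marker-file approach described above could look roughly like this. It is a sketch only: the function name `disable_spotlight_marker` and the mount point in the usage comment are hypothetical, and writing to the volume root may require sudo depending on ownership.

```shell
# Create an empty .metadata_never_index file at the root of a volume,
# which Spotlight treats as a request not to index that volume.
disable_spotlight_marker() {
    vol=$1
    if [ ! -d "$vol" ]; then
        echo "not a directory: $vol" >&2
        return 1
    fi
    touch "$vol/.metadata_never_index"
}

# Example with a hypothetical pool mounted at /Volumes/tank:
# disable_spotlight_marker /Volumes/tank
```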

matthewwo commented 2 years ago

@jawbroken thanks for the tip! I've disabled Spotlight indexing entirely by unloading the daemon:

sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.metadata.mds.plist

Adding a file to the directory seems a better idea!

lundman commented 2 years ago

https://github.com/openzfsonosx/openzfs/commit/4bab75549033f5d96fc0445317b422a70a7ca3a1

https://openzfsonosx.org/forum/viewtopic.php?f=20&t=3677&p=11754#p11754

matthewwo commented 2 years ago

Hi @lundman, thanks for the update on this issue! Will the pkg work on M1? Many thanks again!

matthewwo commented 2 years ago

The installer completed successfully, but after rebooting, running zfs version showed that the kext was not loaded. When I load it manually, I get an error that the kext is not built for arm64e.
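
One way to check which CPU architectures a kext binary contains before trying to load it is `lipo -archs`, which prints the architecture names separated by spaces. A small sketch: `has_arch` is a hypothetical helper, and the kext path in the usage comment is an assumption about where the installer places the binary.

```shell
# Succeed if the space-separated architecture list in $1 contains exactly $2.
# Padding with spaces avoids "arm64" matching "arm64e".
has_arch() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on macOS (path is an assumption, adjust to the installed kext):
# archs=$(lipo -archs /Library/Extensions/zfs.kext/Contents/MacOS/zfs)
# has_arch "$archs" arm64e || echo "kext has no arm64e slice: $archs"
```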


lundman commented 2 years ago

I can make you an arm64 build

matthewwo commented 2 years ago

amazing @lundman! Thanks so much will test it right away.

matthewwo commented 2 years ago

@lundman thanks for the arm64 build, but once I installed it the kernel panicked, and I couldn't load it manually either.

dmzimmerman commented 2 years ago

Not that it's surprising given the common element of "scanning drives or folders", but I've found that Backblaze's backup scanning may also trigger panics like these on my new M1 Ultra; I was having panics approximately every hour (whether I was sitting at the machine or not) until disabling Backblaze for now, and the machine has happily stayed running overnight.

jawbroken commented 2 years ago

After a long period of stability following the move back to Ethernet and disabling Spotlight indexing, I'm now getting this kernel panic repeatedly while trying to move a large amount of data from one pool to another.

jawbroken commented 2 years ago

Now that I can reproduce this fairly easily (though it takes a random amount of time) by just copying a large directory from one pool to another in the Finder, I was able to catch one happening live and have some observations. I came back to the computer and the transfer had stopped making any progress, and I could see in Activity Monitor and zpool iostat that there was no activity on either pool. This persisted for about 5 minutes, at which point I tried opening a folder on one of the pools and it finally locked up and restarted. So I'm not sure that this is just a matter of increasing a timeout somewhere, because it seemed genuinely deadlocked (and perhaps the child task timeout was more of a symptom?). I'm going to try some of the tuning for stack pages mentioned in the other thread next, I guess.

lundman commented 2 years ago

Just an update. I've used VirtualBuddy to set up a Ventura VM, and VB lets me disable SIP and enable 3rd party KEXTs. I believe that last good installer still works.

However, new compiles of the KEXT just don't load: it starts to boot (after Allow), then restarts the progress bar, boots, and points out that something went wrong and the KEXT needs approving again. Alas, there are no log entries or outputs to hint at what could be wrong. There is no -v boot option that I can find. There is no indication that there is anything wrong with the kext before approval: codesign ok, kextlibs ok, etc. It just needs Approve. This is rather frustrating, especially since this is the 3rd major OS release with exceptionally poor support in this area. Possibly the next step would be to start from an empty sample kext, and add things to it one by one.

jawbroken commented 2 years ago

Sorry for the frustration, I appreciate the help. Let me know if there's anything I can do to help test or narrow down issues. I'm trying a copy with cp -pvR instead of the Finder now and it seems okay so far, with similar read/write bandwidth from the two pools.
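
Given the zero-byte files seen earlier in this thread, a copy like the one above can be followed by a recursive comparison. A sketch under stated assumptions: `copy_and_verify` is a hypothetical helper, and it requires that the destination does not exist yet so that `cp -R` creates it as a direct copy of the source.

```shell
# Copy a directory tree with cp -pvR, then compare source and destination.
copy_and_verify() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "destination already exists: $dst" >&2
        return 1
    fi
    cp -pvR "$src" "$dst" || return 1
    # diff -r exits non-zero if any file is missing or its contents differ
    diff -r "$src" "$dst"
}

# Example with hypothetical pool paths:
# copy_and_verify /Volumes/tank/media /Volumes/vault/media
```

Comparing afterwards reads both trees again in full, so on multi-terabyte copies it roughly doubles the read load on the source pool.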

jawbroken commented 2 years ago

I can definitely get further with cp versus copying in the Finder, but eventually it stops making progress (and then will kernel panic if I try to do something like open up the directory in Finder). When cp stops making progress, I can sample the process and see it is stuck inside a write call in libsystem_kernel.dylib. The cp process is still using 8% CPU and making a few thousand context switches a second. From the troubleshooting steps in the other thread, kstat.spl.misc.spl_misc.active_threads is stable at 514, its usual value. I just tried to launch Instruments to see if I could work anything else out and the computer kernel panicked.
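
To confirm that a counter such as kstat.spl.misc.spl_misc.active_threads really is holding steady during a stall, it can be sampled a few times in a row. A rough sketch: `sample_stable` is a hypothetical helper that runs whatever command it is given, and the `sysctl -n` invocation in the usage comment assumes the kext is loaded so that the name from the comment above exists.

```shell
# Run a command COUNT times, DELAY seconds apart, and report whether
# its output stayed the same across all samples.
sample_stable() {
    count=$1
    delay=$2
    shift 2
    first=$("$@") || return 2
    i=1
    while [ "$i" -lt "$count" ]; do
        sleep "$delay"
        cur=$("$@") || return 2
        if [ "$cur" != "$first" ]; then
            echo "changed: $first -> $cur"
            return 1
        fi
        i=$((i + 1))
    done
    echo "stable: $first"
}

# Usage on a machine with OpenZFS loaded: five samples, two seconds apart.
# sample_stable 5 2 sysctl -n kstat.spl.misc.spl_misc.active_threads
```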

I haven't tried increasing the stack page size yet because it's more difficult than I thought (I need to boot into the recovery partition, which you can't do remotely, and I don't have a monitor so I need to move the headless server and plug it into a TV or something).

jawbroken commented 2 years ago

Managed to complete the copy of 32 TB with rsync, which must have a different pattern of system calls.

matthewwo commented 2 years ago

lundman just released a new beta version for Monterey on arm64; perhaps you can give it a shot? https://github.com/openzfsonosx/zfs/issues/798