lgarron / first-world

Issues that are not necessarily easy to report/fix, and are (generally) not a matter of life-or-death.
https://github.com/lgarron/first-world/issues

PROMISE Pegasus32 R8 causes ridiculous system-wide freezes on macOS #100

Open lgarron opened 2 years ago

lgarron commented 2 years ago

(Using the PROMISE Pegasus2 RAID array on macOS 12 requires a kernel extension. Those are deprecated, and for good reason.) EDIT: The PROMISE Pegasus32 RAID array uses a dext. This is better, but I still consider it borderline unacceptable to need a third-party driver for a drive that advertises itself as Thunderbolt 3 and USB 3.2 compatible. I don't want a custom high-touch racecar, I want a reliable station wagon full of tapes.

When I have my RAID array plugged in, some file system operation will freeze for minutes at a time and seemingly halt every program that is accessing any disk on the file system, even if that program is only accessing files on the internal SSD. So basically everything except the cursor (and some occasional UI elements) freezes up, for minutes at a time. If I'm lucky, it unfreezes after 1 or maybe 10 minutes. Otherwise, I have to pull the cable manually (and sometimes hard reboot the RAID array itself). I also have to guess whether the cause was purely the RAID array or possibly something else, like our corporate spyware or other bad macOS behaviour when connected to slow/SMB drives.
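A minimal sketch for timing these freezes, assuming Python 3 on the affected Mac: it repeatedly writes and fsyncs a small 4 KiB file in a directory on the internal SSD and prints a timestamped line whenever one probe takes longer than a threshold. The function names, the 2-second threshold, and the 1-second interval are hypothetical choices, not anything from PROMISE or Apple. Because the probe itself blocks during a system-wide freeze, the "slow probe" line only appears once things resume, and it carries the approximate length of the freeze.

```python
import os
import tempfile
import time
from typing import Optional


def probe_write_latency(directory: str) -> float:
    """Create, write, fsync, and delete a 4 KiB temp file; return elapsed seconds."""
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory, prefix="freeze-probe-")
    try:
        os.write(fd, b"x" * 4096)
        os.fsync(fd)  # push the write through the file system layer
    finally:
        os.close(fd)
        os.remove(path)
    return time.monotonic() - start


def watch(directory: str, interval: float = 1.0, threshold: float = 2.0,
          count: Optional[int] = None) -> None:
    """Probe `directory` every `interval` seconds (forever if `count` is None),
    printing a timestamped line for any probe slower than `threshold` seconds."""
    probes = 0
    while count is None or probes < count:
        elapsed = probe_write_latency(directory)
        probes += 1
        if elapsed >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} slow probe: {elapsed:.1f} s", flush=True)
        time.sleep(interval)
```

Leaving `watch(tempfile.gettempdir())` running in a spare Terminal window (the temp directory normally lives on the internal disk) would give a rough record of when freezes start and how long they last, which could then be lined up against plugging and unplugging the array.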

Combined with #70, that doesn't inspire a lot of confidence that I'll reliably be able to get my data back in the future. (I keep multiple backups, but I still expect a PROMISE RAID array not to be this broken, given how much Apple seems to position them as the premium "just works" option for Apple computers.)

lgarron commented 1 year ago

Re-titling this to the Pegasus32 R8, which still exhibits severe system-wide macOS freezing symptoms. I think I spent 10 minutes today waiting on an operation that should have been nearly instant, among many other freezes that were only a few minutes in length.

At the very least, it looks like my Pegasus32 R8 usually unfreezes after a (long) while now, as opposed to sometimes getting stuck indefinitely.

lgarron commented 1 year ago

This is still a severe issue. I just plugged in my Pegasus32 R8, and the computer spent a full two minutes freezing all access to all disks on the entire system. At that point, the Pegasus32 R8 disks were still not mounted (see https://github.com/lgarron/first-world/issues/147). I unplugged it, and the system immediately unfroze.

While disk access was frozen, I could tell that the computer was still working (the mouse could move and it was possible to activate Quicksilver), but VSCode couldn't accept any keyboard input and various other apps were frozen.

I still don't know if this is the "fault" of the dext, but it sure is a garbage experience.

lgarron commented 1 year ago


Seeing extreme freezes again. Most of the OS was more functional this time, but it took more than 5 minutes for the Pegasus32 R8 disks to show in Finder or Disk Utility (https://github.com/lgarron/first-world/issues/140).

I have a suspicion that this is due to the PROMISE Utility Pro trying to update the firmware in the background, which... I don't want it to. I want reliability from my RAID array, not flaky magic. But okay, I'm going to try to see if I can fix https://github.com/lgarron/first-world/issues/159 to the point that it will at least update somehow.

EDIT: Based on other behaviour today, I suspect this was merely an instance of https://github.com/lgarron/first-world/issues/147, not an instance of the dext freezing the system.

lgarron commented 1 year ago

This is happening again. The computer is running a routine backup operation (which I'm doing to save data from potential catastrophic loss due to https://github.com/lgarron/first-world/issues/161), and file operations across the whole system have been frozen for about 15 minutes. I probably need to unplug my Pegasus32 R8, but that just risks more data loss. I really wish I had an alternative I could switch to, because this is kind of unacceptable.

Edit: I hard rebooted the computer using the power button. The Pegasus32 R8 shows up in PROMISE Utility Pro.app, but the disks haven't connected since the reboot. 😕

fabriziorizzo commented 10 months ago

Almost the same here... Mac Pro (Late 2013), Pegasus2 R8 Rev B3 with Promise Utility v4.06.0000.04 (Mar 28, 2022) on macOS 12.7.1. My RAID6 volume mounts, but all file access is effectively stalled. I can see the file system in Finder and navigate directories, but the moment I try to open or create a file, it jams up.

Running sudo fs_usage -f filesys, I see my Adobe Lightroom Classic app trying to open my library and doing nothing but the following for hours:

    22:13:29.948708 RdMeta[S] D=0x0001dcc8 B=0x1000 /dev/disk3 /dev/disk3 0.008224 W Adobe Lightroom .49590
    22:13:29.949049 RdMeta[S] D=0x000dd859 B=0x1000 /dev/disk3 /dev/disk3 0.000306 W Adobe Lightroom .49590
    22:13:29.952965 RdMeta[S] D=0x000e4e30 B=0x1000 /dev/disk3 /dev/disk3 0.003901 W Adobe Lightroom .49590
    22:13:29.953272 RdMeta[S] D=0x000dd85b B=0x1000 /dev/disk3 /dev/disk3 0.000274 W Adobe Lightroom .49590
    22:13:29.953567 RdMeta[S] D=0x000dd85c B=0x1000 /dev/disk3 /dev/disk3 0.000266 W Adobe Lightroom .49590
    22:13:29.953857 RdMeta[S] D=0x000dfebb B=0x1000 /dev/disk3 /dev/disk3 0.000275 W Adobe Lightroom .49590
    22:13:29.954149 RdMeta[S] D=0x000dd85e B=0x1000 /dev/disk3 /dev/disk3 0.000276 W Adobe Lightroom .49590
    22:13:29.954675 RdMeta[S] D=0x000e3fd2 B=0x1000 /dev/disk3 /dev/disk3 0.000512 W Adobe Lightroom .49590
    22:13:29.954966 RdMeta[S] D=0x000dd860 B=0x1000 /dev/disk3 /dev/disk3 0.000275 W Adobe Lightroom .49590
    22:13:29.955236 RdMeta[S] D=0x000dfebd B=0x1000 /dev/disk3 /dev/disk3 0.000255 W Adobe Lightroom .49590
    22:13:29.955528 RdMeta[S] D=0x000dd862 B=0x1000 /dev/disk3 /dev/disk3 0.000278 W Adobe Lightroom .49590
    22:13:29.956094 RdMeta[S] D=0x000e6a65 B=0x1000 /dev/disk3 /dev/disk3 0.000551 W Adobe Lightroom .49590
    22:13:29.956382 RdMeta[S] D=0x000dd864 B=0x1000 /dev/disk3 /dev/disk3 0.000270 W Adobe Lightroom .49590
    22:13:29.956651 RdMeta[S] D=0x000dfebf B=0x1000 /dev/disk3 /dev/disk3 0.000255 W Adobe Lightroom .49590
    22:13:29.956822 RdMeta[S] D=0x000dd866 B=0x1000 /dev/disk3 /dev/disk3 0.000156 W Adobe Lightroom .49590
    22:13:29.957331 RdMeta[S] D=0x000e3fd4 B=0x1000 /dev/disk3 /dev/disk3 0.000495 W Adobe Lightroom .49590
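Since iotop isn't available, a rough per-process summary can be pulled out of fs_usage output instead. This is a minimal Python sketch that assumes the column layout seen in the lines above (timestamp, call, optional D= and B= fields, path, elapsed seconds, an optional W, then the process field); lines in any other layout are simply skipped. The regex, `summarize()`, and `ProcStats` are hypothetical names, not part of fs_usage.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Assumed layout, based only on the RdMeta lines above:
# time, call, [D=0x...], [B=0x...], path(s), elapsed seconds, [W], process field.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<call>\S+)"
    r"(?:\s+D=0x(?P<block>[0-9a-fA-F]+))?"
    r"(?:\s+B=0x(?P<size>[0-9a-fA-F]+))?"
    r".*?\s(?P<elapsed>\d+\.\d+)(?P<waited>\s+W)?\s+(?P<proc>\S.*?)\s*$"
)


@dataclass
class ProcStats:
    ops: int = 0          # matching lines seen
    nbytes: int = 0       # sum of B= sizes
    seconds: float = 0.0  # sum of elapsed times
    waits: int = 0        # lines flagged with W


def summarize(lines: Iterable[str]) -> Dict[str, ProcStats]:
    """Aggregate fs_usage lines per process name."""
    stats: Dict[str, ProcStats] = defaultdict(ProcStats)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # header or a line type this sketch doesn't handle
        # Group by name, dropping a trailing ".<digits>" suffix if present.
        name = re.sub(r"\s*\.\d+$", "", m["proc"])
        s = stats[name]
        s.ops += 1
        s.nbytes += int(m["size"], 16) if m["size"] else 0
        s.seconds += float(m["elapsed"])
        s.waits += 1 if m["waited"] else 0
    return dict(stats)


SAMPLE = """\
22:13:29.948708 RdMeta[S] D=0x0001dcc8 B=0x1000 /dev/disk3 /dev/disk3 0.008224 W Adobe Lightroom .49590
22:13:29.949049 RdMeta[S] D=0x000dd859 B=0x1000 /dev/disk3 /dev/disk3 0.000306 W Adobe Lightroom .49590
22:13:29.952965 RdMeta[S] D=0x000e4e30 B=0x1000 /dev/disk3 /dev/disk3 0.003901 W Adobe Lightroom .49590
"""

for name, s in summarize(SAMPLE.splitlines()).items():
    print(f"{name}: {s.ops} ops, {s.nbytes} bytes, {s.seconds:.6f} s total, {s.waits} waited")
```

For a longer capture, the output of sudo fs_usage -f filesys could be redirected into a file and the open file passed straight to `summarize()`, since iterating a file yields lines.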

Running Blackmagic Disk Speed Test against that volume hangs, then eventually ends with a write speed of 0.2 MB/s and doesn't attempt the read test.
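When Disk Speed Test hangs, a smaller hand-rolled write test can at least show whether data is reaching the volume at all. This is a rough Python sketch, not a replacement for Blackmagic's tool: it writes zeros through the ordinary buffered file API into a temp file, fsyncs, deletes the file, and reports MB/s. The function name and the 256 MiB / 4 MiB defaults are assumptions for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_mib: int = 256, chunk_mib: int = 4) -> float:
    """Write about `total_mib` MiB of zeros into a temp file in `directory`,
    fsync it, delete it, and return the rate in MB/s (10**6 bytes per second)."""
    chunk = b"\0" * (chunk_mib * 1024 * 1024)
    target = total_mib * 1024 * 1024
    fd, path = tempfile.mkstemp(dir=directory, prefix="speedtest-")
    written = 0
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            while written < target:
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())  # include the flush to the device in the timing
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)
    return written / elapsed / 1e6
```

Running it once against a folder on the boot SSD and once against the array's volume (its /Volumes/... path) would give a baseline and a comparison. At the 0.2 MB/s reported above, the default 256 MiB (about 268 MB) would take over 20 minutes, so a much smaller `total_mib` makes sense on the array, and the call can be expected to block while the volume is stalled.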

Without the array connected, I can get 900-1200 MB/s off the boot SSD (not bad for a 10-year-old system...), though Finder hangs while trying to enumerate the available volumes (presumably because of problems reading the array).

Promise Utility works, slowly... and I mean much more slowly than the usual "slow"... The dashboard is all green, with no errors in any of the event info or disk info. The system Service Report generates properly, and no errors are evident. The 8x 8TB drives show between 23,000 and 36,000 power-on hours, and all the SMART metrics are of type old_age or pre_fail. None indicate faults of any sort.
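For scale, converting those power-on hours into years of continuous operation (8,760 hours in a 365-day year):

$$\frac{23\,000\ \text{h}}{8\,760\ \text{h/yr}} \approx 2.6\ \text{yr}, \qquad \frac{36\,000\ \text{h}}{8\,760\ \text{h/yr}} \approx 4.1\ \text{yr}$$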

Unfortunately, iotop won't work with SIP enabled, and disabling SIP and then re-enabling it with DTrace excluded isn't enough for iotop to work properly either... so I'm stuck with just the log of fs_usage output.

How can I trace whether the issue is the kernel extension? I'm tempted to try a new Thunderbolt 2 cable and a different Thunderbolt controller/port.
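One possible first step, sketched under assumptions: confirm which PROMISE driver is actually loaded. `kmutil showloaded` (macOS 11 and later; `kextstat` on older systems) lists loaded kernel extensions, and `systemextensionsctl list` lists system extensions, which should include DriverKit dexts. The sketch below runs both and keeps lines containing "promise" case-insensitively; that filter assumes PROMISE's bundle identifiers or names contain the word, and the function names are hypothetical. It doesn't prove the driver is at fault, only which one is in play.

```python
import subprocess
from typing import Dict, List


def matching_lines(text: str, needle: str = "promise") -> List[str]:
    """Return the lines of `text` that contain `needle`, case-insensitively."""
    return [line for line in text.splitlines() if needle.lower() in line.lower()]


def find_promise_drivers() -> Dict[str, List[str]]:
    """Run macOS listing tools and keep the lines that mention PROMISE."""
    commands = {
        "kexts": ["kmutil", "showloaded"],  # loaded kernel extensions (macOS 11+)
        "system extensions": ["systemextensionsctl", "list"],  # should include dexts
    }
    results: Dict[str, List[str]] = {}
    for label, cmd in commands.items():
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
        except FileNotFoundError:
            out = ""  # tool missing (older macOS, or not running on macOS at all)
        results[label] = matching_lines(out)
    return results
```

If a PROMISE kext shows up on a system that is supposed to be using the dext (or vice versa), that would be worth knowing before swapping cables and ports.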

lgarron commented 1 month ago

Still happening. The entire system slows to a crawl. Some UI still works, but many apps are frozen or nearly frozen because of disk access. "Restart" and sudo reboot now almost work, but it takes minutes and minutes before they offer to force quit the frozen apps. After enough of this nonsense, unplugging the Pegasus32's Thunderbolt cable immediately causes macOS to resume normal function.

I blame PROMISE's dext, but who knows.