osy / HaC-Mini

Intel NUC Hades Canyon Hackintosh support
MIT License
1.22k stars 159 forks source link

USB ports reset during sleep/wakeup; USB drives ejected #8

Open osy opened 5 years ago

osy commented 5 years ago

When a device is plugged in and the computer goes to sleep and wakes up, the device gets reset and the port is re-enumerated. If the device is a USB mass storage drive, then the user sees a "unsafe removal" message.

The behaviour is different for the PCH XHCI ports versus the PEG XHCI ports. For PEG, this happens only during the first sleep. Then, everything is normal. However, for PCH ports (and TB3 USB ports) the reset always happens.

The culprit appears to be a hardware or BIOS issue. The PCI PM_Enable bit should be retained across D3 sleeps, however by the time XNU is woken up, the PM_Enable state is lost. This could be a hardware bug or it could be the resume path in the UEFI firmware messing with the USB device. Evidence for the latter is that a USB keyboard that is plugged in has the LED briefly show up during wake, then goes off, and later comes back on again.

In addition to the PM_Enable state lost, it appears that all other state is lost as well, which is why the ports are reset on resume. A proper fix would require a patched BIOS. A workaround would be to silently re-enumerate the device without user notification (Linux does this). The way the IOUSBMassStorage driver works makes this solution difficult to implement because the USB device is a child of the port that is reset.

Marking as won't fix since BIOS patching is beyond the scope of this project.

Workaround: for devices that require preserving state across sleep/resumes, use only the PEG slots (the blue port on the front or the USB-C port on the front). After reset from the first sleep, the device should sleep/wake without issue.

osy commented 5 years ago

Another symptom: with the Apple wifi card installed, WoL on wifi works the first time you sleep. However, it stops working after a darkwake and must be fully woken up and slept again before it works. I think this is because the Wifi card is disconnected on darkwake and isn't properly re-initialized.

osy commented 5 years ago

Here is a resume log from ubuntu. Notice how the USB wakeup errors and the thunderbolt disconnect errors are seen in ubuntu as well. However, in ubuntu, workarounds exist to make the error handling transparent to the user.

[   37.985055] PM: suspend entry (deep)
[   37.985057] PM: Syncing filesystems ... done.
[   37.986712] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   37.987931] OOM killer disabled.
[   37.987931] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   37.989094] printk: Suspending console(s) (use no_console_suspend to debug)
[   38.111940] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   38.114771] sd 0:0:0:0: [sda] Stopping disk
[   38.123334] e1000e: EEE TX LPI TIMER: 00000011
[   40.640915] ACPI: EC: interrupt blocked
[   40.688455] ACPI: Preparing to enter system sleep state S3
[   40.689393] ACPI: EC: event blocked
[   40.689393] ACPI: EC: EC stopped
[   40.689394] PM: Saving platform NVS memory
[   40.689400] Disabling non-boot CPUs ...
[   40.691507] smpboot: CPU 1 is now offline
[   40.692246] IRQ 139: no longer affine to CPU2
[   40.693281] smpboot: CPU 2 is now offline
[   40.694946] smpboot: CPU 3 is now offline
[   40.697638] smpboot: CPU 4 is now offline
[   40.698288] IRQ 122: no longer affine to CPU5
[   40.698291] IRQ 124: no longer affine to CPU5
[   40.698294] IRQ 125: no longer affine to CPU5
[   40.699321] smpboot: CPU 5 is now offline
[   40.699869] NOHZ: local_softirq_pending 202
[   40.699876] NOHZ: local_softirq_pending 202
[   40.699910] IRQ 126: no longer affine to CPU6
[   40.699913] IRQ 127: no longer affine to CPU6
[   40.699920] IRQ 144: no longer affine to CPU6
[   40.700931] smpboot: CPU 6 is now offline
[   40.701536] IRQ 123: no longer affine to CPU7
[   40.701544] IRQ 130: no longer affine to CPU7
[   40.701547] IRQ 132: no longer affine to CPU7
[   40.702581] smpboot: CPU 7 is now offline
[   40.705525] ACPI: Low-level resume complete
[   40.705630] ACPI: EC: EC started
[   40.705630] PM: Restoring platform NVS memory
[   40.707634] Enabling non-boot CPUs ...
[   40.707666] x86: Booting SMP configuration:
[   40.707666] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   40.709244] unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff86473f98 (native_write_msr+0x8/0x30)
[   40.709246] Call Trace:
[   40.709250]  intel_pmu_cpu_starting+0x87/0x260
[   40.709252]  ? mce_intel_feature_init+0x42/0x160
[   40.709253]  ? x86_pmu_dead_cpu+0x30/0x30
[   40.709255]  x86_pmu_starting_cpu+0x1a/0x30
[   40.709257]  cpuhp_invoke_callback+0x8a/0x580
[   40.709258]  ? cpumask_next+0x1b/0x20
[   40.709260]  ? topology_phys_to_logical_pkg+0x29/0x70
[   40.709261]  notify_cpu_starting+0x5d/0x80
[   40.709262]  start_secondary+0xed/0x200
[   40.709264]  secondary_startup_64+0xa4/0xb0
[   40.709382]  cache: parent cpu1 should not be sleeping
[   40.709411] intel_pstate: Disabling energy efficiency optimization
[   40.709466] microcode: sig=0x906e9, pf=0x8, revision=0x8e
[   40.710521] microcode: updated to revision 0xb4, date = 2019-04-01
[   40.710583] CPU1 is up
[   40.710602] smpboot: Booting Node 0 Processor 2 APIC 0x4
[   40.711014]  cache: parent cpu2 should not be sleeping
[   40.712218] CPU2 is up
[   40.712236] smpboot: Booting Node 0 Processor 3 APIC 0x6
[   40.712647]  cache: parent cpu3 should not be sleeping
[   40.713845] CPU3 is up
[   40.713861] smpboot: Booting Node 0 Processor 4 APIC 0x1
[   40.714355]  cache: parent cpu4 should not be sleeping
[   40.714416] microcode: sig=0x906e9, pf=0x8, revision=0xb4
[   40.714609] CPU4 is up
[   40.714627] smpboot: Booting Node 0 Processor 5 APIC 0x3
[   40.715037]  cache: parent cpu5 should not be sleeping
[   40.715170] CPU5 is up
[   40.715270] smpboot: Booting Node 0 Processor 6 APIC 0x5
[   40.715679]  cache: parent cpu6 should not be sleeping
[   40.715830] CPU6 is up
[   40.715849] smpboot: Booting Node 0 Processor 7 APIC 0x7
[   40.716263]  cache: parent cpu7 should not be sleeping
[   40.716411] CPU7 is up
[   40.722192] ACPI: Waking up from system sleep state S3
[   40.726445] ACPI: EC: interrupt unblocked
[   40.727567] pcieport 0000:07:00.0: Refused to change power state, currently in D3
[   40.731914] pcieport 0000:08:01.0: Refused to change power state, currently in D3
[   40.731915] pcieport 0000:08:02.0: Refused to change power state, currently in D3
[   40.731916] pcieport 0000:08:04.0: Refused to change power state, currently in D3
[   40.755238] pcieport 0000:08:00.0: Refused to change power state, currently in D3
[   40.763274] xhci_hcd 0000:3d:00.0: Refused to change power state, currently in D3
[   40.775209] thunderbolt 0000:09:00.0: Refused to change power state, currently in D3
[   40.899741] ACPI: EC: event unblocked
[   40.900074] usb usb1: root hub lost power or was reset
[   40.900076] usb usb2: root hub lost power or was reset
[   40.900381] usb usb3: root hub lost power or was reset
[   40.900382] usb usb4: root hub lost power or was reset
[   40.903233] sd 0:0:0:0: [sda] Starting disk
[   40.923237] xhci_hcd 0000:3d:00.0: Refused to change power state, currently in D3
[   40.923262] xhci_hcd 0000:3d:00.0: WARN: xHC restore state timeout
[   40.923263] xhci_hcd 0000:3d:00.0: PCI post-resume error -110!
[   40.923263] xhci_hcd 0000:3d:00.0: HC died; cleaning up
[   40.923268] dpm_run_callback(): pci_pm_resume+0x0/0xb0 returns -110
[   40.923272] PM: Device 0000:3d:00.0 failed to resume async: error -110
[   40.927413] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[   41.218186] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   41.218994] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   41.218995] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   41.218996] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   41.219417] ata1.00: supports DRM functions and may not be fully accessible
[   41.223827] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   41.223828] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   41.223829] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   41.224221] ata1.00: supports DRM functions and may not be fully accessible
[   41.227494] ata1.00: configured for UDMA/133
[   41.237623] ahci 0000:00:17.0: port does not support device sleep
[   41.237768] ata1.00: Enabling discard_zeroes_data
[   41.243399] usb 1-9: reset full-speed USB device number 3 using xhci_hcd
[   41.300140] usb 4-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[   41.519419] usb 1-3: reset high-speed USB device number 2 using xhci_hcd
[   41.619251] [drm] Fence fallback timer expired on ring sdma0
[   41.795532] usb 2-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[   41.899403] usb 1-9.3: reset full-speed USB device number 8 using xhci_hcd
[   42.083405] usb 1-3.2: reset full-speed USB device number 4 using xhci_hcd
[   42.131256] [drm] Fence fallback timer expired on ring sdma0
[   42.263343] usb 1-3.3: reset full-speed USB device number 5 using xhci_hcd
[   42.643344] [drm] Fence fallback timer expired on ring sdma0
[   42.685492] [drm] UVD and UVD ENC initialized successfully.
[   42.795479] [drm] VCE initialized successfully.
[   42.807281] OOM killer enabled.
[   42.807281] Restarting tasks ... 
[   42.807586] xhci_hcd 0000:3d:00.0: remove, state 4
[   42.807591] usb usb6: USB disconnect, device number 1
[   42.808156] done.
[   42.809214] xhci_hcd 0000:3d:00.0: USB bus 6 deregistered
[   42.809218] xhci_hcd 0000:3d:00.0: remove, state 4
[   42.809222] usb usb5: USB disconnect, device number 1
[   42.809357] xhci_hcd 0000:3d:00.0: Host halt failed, -19
[   42.809360] xhci_hcd 0000:3d:00.0: Host not accessible, reset failed.
[   42.809459] xhci_hcd 0000:3d:00.0: USB bus 5 deregistered
[   42.860918] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   42.860919] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   42.871563] video LNXVIDEO:00: Restoring backlight state
[   42.871666] PM: suspend exit
[   42.891216] usb 1-9.1: new full-speed USB device number 9 using xhci_hcd
[   42.992752] usb 1-9.1: New USB device found, idVendor=05ac, idProduct=820a, bcdDevice= 1.00
[   42.992753] usb 1-9.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   42.995208] input: HID 05ac:820a as /devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9.1/1-9.1:1.0/0003:05AC:820A.0008/input/input41
[   43.007433] e1000e: eno1 NIC Link is Down
[   43.059763] hid-generic 0003:05AC:820A.0008: input,hidraw3: USB HID v1.11 Keyboard [HID 05ac:820a] on usb-0000:00:14.0-9.1/input0
[   43.067327] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   43.067328] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   43.147194] usb 1-9.2: new full-speed USB device number 10 using xhci_hcd
[   43.248800] usb 1-9.2: New USB device found, idVendor=05ac, idProduct=820b, bcdDevice= 1.00
[   43.248801] usb 1-9.2: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   43.251465] input: HID 05ac:820b as /devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9.2/1-9.2:1.0/0003:05AC:820B.0009/input/input42
[   43.268598] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   43.268599] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   43.303391] audit: type=1400 audit(1567962438.596:8): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/core/7270/usr/lib/snapd/snap-confine" pid=3326 comm="apparmor_parser"
[   43.310366] pci_bus 0000:09: busn_res: [bus 09] is released
[   43.310397] pci_bus 0000:0a: busn_res: [bus 0a-3c] is released
[   43.311389] hid-generic 0003:05AC:820B.0009: input,hidraw6: USB HID v1.11 Mouse [HID 05ac:820b] on usb-0000:00:14.0-9.2/input0
[   43.319233] audit: type=1400 audit(1567962438.612:9): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/core/7270/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=3326 comm="apparmor_parser"
[   43.327329] pci_bus 0000:3d: busn_res: [bus 3d] is released
[   43.327382] pci_bus 0000:3e: busn_res: [bus 3e-71] is released
[   43.327429] pci_bus 0000:08: busn_res: [bus 08-71] is released
[   43.472590] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   43.472591] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   43.679072] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   43.679073] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   43.883348] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   43.883351] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   44.084795] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   44.084796] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   44.293871] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   44.293872] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   44.496641] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[   44.496643] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[   44.809138] usb 1-9.1: USB disconnect, device number 9
[   44.872005] Bluetooth: hci0: BCM: chip id 73 build 1126
[   44.872988] Bluetooth: hci0: BCM: product 05ac:828f
[   44.873995] Bluetooth: hci0: BCM: features 0x07
[   44.889953] Bluetooth: hci0: ubuntu
[   45.083454] usb 1-9.2: USB disconnect, device number 10
[   45.554940] e1000e: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[   45.554942] e1000e 0000:00:1f.6 eno1: 10/100 speed: disabling TSO
[   45.555012] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
osy commented 5 years ago

Seems like Macs have HOST_CTRL_PORT_LINK_REG.DIS_LINK_CM set in the PCH XHCI controller (offset 0x80EC) by default while on the NUC it is a hidden BIOS option that is default disabled. In compliance mode, ports have to be reset before they can be used. This is a potential root cause.

EDIT: Testing the feature on another hackintosh with HOST_CTRL_PORT_LINK_REG.DIS_LINK_CM not set resulted in a working USB sleep/resume. This is not the root cause.

osy commented 5 years ago

After reversing the majority of the sleep-resume UEFI path, it's evident that the PM_Enable bit loss/XHCI controller reset is a hardware issue/design issue and cannot be fixed in BIOS. The best we can do is patch OSX's XHCI driver to be aware of this quirk but this is a extremely difficult task because of how the drivers are designed (children of the device cannot be re-associated).

osy commented 4 years ago

https://github.com/torvalds/linux/commit/b8cb91e058cd0c0f02059c1207293c5b31d350fa https://github.com/torvalds/linux/commit/c3c5819a350952439c3198aa46581f9e4c46557f This might be related or more likely it's another false flag.