jakeday / linux-surface

Linux Kernel for Surface Devices
2.59k stars 241 forks source link

Surface Pro 6: Wireless interface disappears and locks up after suspend #369

Closed DolceTriade closed 5 years ago

DolceTriade commented 5 years ago

I want to start by saying that this is not like other issues that have been reported before where the WiFi is simply not connected after waking up from suspend. In this case, the wireless interface actually disappears from the list of interfaces that Linux shows. Furthermore, attempts to remove the mwifiex_pcie module hang forever. Lastly, I also cannot shutdown the system successfully because it seems there is a stuck kernel thread that doesn't die and therefore prevents the system from shutting down cleanly.

Below are kernel logs prior to suspend and after waking up from suspend:

[ 2498.153746] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 2498.153747] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 2498.153748] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 2498.153749] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 2498.153750] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 2498.153751] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 2498.153752] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 2498.153753] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[ 2498.154059] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[ 2498.162617] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[ 2498.163604] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 2498.163613] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[ 2498.217545] PM: suspend entry (s2idle)
[ 2498.217546] PM: Syncing filesystems ... done.
[ 2498.222839] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 2498.223963] OOM killer disabled.
[ 2498.223963] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[ 2498.224914] Suspending console(s) (use no_console_suspend to debug)
[63004.440076] [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
[63004.440196] i915 0000:00:02.0: GuC firmware version 9.39
[63004.440200] i915 0000:00:02.0: GuC submission enabled
[63004.440203] i915 0000:00:02.0: HuC disabled
[63004.648218] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
[63004.670655] Restarting kernel threads ... done.
[63004.671751] PM: PM: Sleeping for 3000 milliseconds.
[63005.477931] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63005.578395] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63005.678512] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63005.778641] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63005.778652] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63006.491519] surfacegen5_acpi_ssh serial0-0: rqst: communication timed out
[63006.491533] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[63006.494975] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63006.595853] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63006.696879] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63006.798128] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63006.798139] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63007.145417] usb 1-7: new full-speed USB device number 5 using xhci_hcd
[63007.272785] usb 1-7: New USB device found, idVendor=045e, idProduct=09c0, bcdDevice= 0.07
[63007.272791] usb 1-7: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[63007.272794] usb 1-7: Product: Surface Type Cover
[63007.272798] usb 1-7: Manufacturer: Microsoft
[63007.279745] input: Microsoft Surface Type Cover Keyboard as /devices/pci0000:00/0000:00:14.0/usb1/1-7/1-7:1.0/0003:045E:09C0.0005/input/input68
[63007.331823] input: Microsoft Surface Type Cover Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-7/1-7:1.0/0003:045E:09C0.0005/input/input69
[63007.332255] input: Microsoft Surface Type Cover Consumer Control as /devices/pci0000:00/0000:00:14.0/usb1/1-7/1-7:1.0/0003:045E:09C0.0005/input/input70
[63007.332543] input: Microsoft Surface Type Cover Touchpad as /devices/pci0000:00/0000:00:14.0/usb1/1-7/1-7:1.0/0003:045E:09C0.0005/input/input72
[63007.333220] hid-multitouch 0003:045E:09C0.0005: input,hiddev0,hidraw0: USB HID v1.11 Keyboard [Microsoft Surface Type Cover] on usb-0000:00:14.0-7/input0
[63007.515522] surfacegen5_acpi_ssh serial0-0: rqst: communication timed out
[63007.515535] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[63007.518939] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63007.619963] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[63007.707518] OOM killer enabled.
[63007.707521] Restarting tasks ... done.
[63007.771402] PM: suspend exit
[63007.789831] IPTS ipts_mei_cl_init() is called
[63007.803488] probing Intel Precise Touch & Stylus
[63007.803490] IPTS using DMA_BIT_MASK(64)
[63007.811582] input: ipts 045E:001F UNKNOWN as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:045E:001F.0006/input/input88
[63007.811724] input: ipts 045E:001F Pen as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:045E:001F.0006/input/input90
[63007.811852] input: ipts 045E:001F Touchscreen as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:045E:001F.0006/input/input91
[63007.811979] input: ipts 045E:001F Mouse as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:045E:001F.0006/input/input92
[63007.813812] hid-multitouch 0044:045E:001F.0006: input,hidraw1: <UNKNOWN> HID v3ad00.00 Mouse [ipts 045E:001F] on heci3
[63007.848025] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[63007.848262] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[63007.848286] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[63007.848288] cfg80211: failed to load regulatory.db
[63007.871108] mwifiex_pcie: try set_consistent_dma_mask(32)
[63007.871161] mwifiex_pcie: PCI memory map Virt0: 00000000d27501e0 PCI memory map Virt2: 0000000087b685b8
[63007.873055] mwifiex_pcie 0000:01:00.0: WLAN FW already running! Skip FW dnld
[63007.873057] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[63007.928645] mwifiex_pcie 0000:01:00.0: Unknown api_id: 4
[63007.957430] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p154) 
[63007.957432] mwifiex_pcie 0000:01:00.0: driver_version = mwifiex 1.0 (15.68.7.p154) 
[63007.957994] mwifiex_pcie 0000:01:00.0 wlp1s0: renamed from mlan0
[63007.968724] IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
[63008.021949] IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
[63008.027233] IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
[63008.036875] IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
[63008.082765] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: touch enabled 4
[63012.082099] mwifiex_pcie 0000:01:00.0: scan failed: -16
[63012.123825] IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
[63016.225198] mwifiex_pcie 0000:01:00.0: info: trying to associate to 'ATT3T33vks-5G' bssid dc:7f:a4:25:22:7a
[63016.257888] mwifiex_pcie 0000:01:00.0: info: associated to bssid dc:7f:a4:25:22:7a successfully
[63016.351131] IPv6: ADDRCONF(NETDEV_CHANGE): wlp1s0: link becomes ready
[63030.491107] mwifiex_pcie 0000:01:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xce, act = 0xdc00
[63030.491113] mwifiex_pcie 0000:01:00.0: num_data_h2c_failure = 0
[63030.491117] mwifiex_pcie 0000:01:00.0: num_cmd_h2c_failure = 0
[63030.491120] mwifiex_pcie 0000:01:00.0: is_cmd_timedout = 1
[63030.491124] mwifiex_pcie 0000:01:00.0: num_tx_timeout = 0
[63030.491127] mwifiex_pcie 0000:01:00.0: last_cmd_index = 0
[63030.491131] mwifiex_pcie 0000:01:00.0: last_cmd_id: ce 00 a4 00 7f 00 16 00 cf 00
[63030.491135] mwifiex_pcie 0000:01:00.0: last_cmd_act: 00 dc 00 00 00 00 00 00 00 dc
[63030.491138] mwifiex_pcie 0000:01:00.0: last_cmd_resp_index = 4
[63030.491142] mwifiex_pcie 0000:01:00.0: last_cmd_resp_id: 16 80 a4 80 7f 80 16 80 cf 80
[63030.491145] mwifiex_pcie 0000:01:00.0: last_event_index = 4
[63030.491149] mwifiex_pcie 0000:01:00.0: last_event: 17 00 2b 00 33 00 1c 00 33 00
[63030.491153] mwifiex_pcie 0000:01:00.0: data_sent=1 cmd_sent=1
[63030.491156] mwifiex_pcie 0000:01:00.0: ps_mode=0 ps_state=0
[63030.492074] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump start===
[63030.492082] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p154) 
[63030.492098] mwifiex_pcie 0000:01:00.0: PCIE register dump start
[63030.492187] mwifiex_pcie 0000:01:00.0: pcie scratch register:
[63030.492202] mwifiex_pcie 0000:01:00.0: reg:0xcf0, value=0xfedcba00
               reg:0xcf8, value=0x2634eaa
               reg:0xcfc, value=0x2909107

[63030.492205] mwifiex_pcie 0000:01:00.0: PCIE register dump end
[63030.492310] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump end===
[63030.492315] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump start ==
[63037.147166] mwifiex_pcie 0000:01:00.0: cmd_wait_q terminated: -110
[63037.147172] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63037.147392] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147394] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63037.147579] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147581] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63037.147626] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147628] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63037.147670] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147672] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147692] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63037.147694] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63043.832132] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63043.832137] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63043.832263] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63043.832265] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63049.833173] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63049.833177] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63049.833203] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63049.833205] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63055.833436] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63055.833440] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63055.833467] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63055.833469] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63061.833277] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63061.833281] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63061.833314] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63061.833316] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63067.833462] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63067.833465] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63067.833497] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63067.833499] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63071.963105] ------------[ cut here ]------------
[63071.963109] NETDEV WATCHDOG: wlp1s0 (mwifiex_pcie): transmit queue 1 timed out
[63071.963126] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x21a/0x220
[63071.963128] Modules linked in: mwifiex_pcie mwifiex cfg80211 mei_me intel_ipts mei 8021q mrp btusb btrtl btbcm btintel bluetooth ecdh_generic snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_sensor_als hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_hid battery input_leds led_class mousedev joydev nls_iso8859_1 nls_cp437 vfat fat intel_rapl snd_soc_skl x86_pkg_temp_thermal i915 intel_powerclamp coretemp snd_soc_skl_ipc kvm_intel snd_soc_sst_ipc hid_multitouch snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core kvm hid_generic irqbypass crct10dif_pclmul crc32_pclmul mac_hid ghash_clmulni_intel usbhid 8250_dw snd_compress i2c_algo_bit pcbc drm_kms_helper
[63071.963187]  ac97_bus snd_pcm_dmaengine snd_hda_intel gpio_keys ipu3_cio2 snd_hda_codec v4l2_fwnode videobuf2_dma_sg drm videobuf2_memops snd_hda_core videobuf2_v4l2 videobuf2_common snd_hwdep rfkill snd_pcm videodev aesni_intel tpm_crb snd_timer idma64 aes_x86_64 crypto_simd intel_gtt cryptd snd glue_helper agpgart intel_cstate intel_uncore syscopyarea sysfillrect intel_rapl_perf sysimgblt intel_ish_ipc intel_lpss_pci pcspkr media soundcore fb_sys_fops processor_thermal_device intel_lpss intel_ishtp intel_soc_dts_iosf intel_pch_thermal evdev wmi rtc_cmos tpm_tis tpm_tis_core surface_acpi pinctrl_sunrisepoint tpm pinctrl_intel hid rng_core soc_button_array int3400_thermal int3403_thermal acpi_thermal_rel int340x_thermal_zone dptf_power pcc_cpufreq ac ip_tables x_tables ext4 crc32c_generic crc16 mbcache
[63071.963248]  jbd2 fscrypto xhci_pci xhci_hcd crc32c_intel usbcore usb_common [last unloaded: cfg80211]
[63071.963259] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         4.18.20-1-surface #1
[63071.963261] Hardware name: Microsoft Corporation Surface Pro 6/Surface Pro 6, BIOS 234.2344.769 09/08/2018
[63071.963264] RIP: 0010:dev_watchdog+0x21a/0x220
[63071.963265] Code: 49 63 4c 24 e8 eb 8c 4c 89 ef c6 05 5c 01 ad 00 01 e8 6a eb fc ff 89 d9 4c 89 ee 48 c7 c7 98 8d ed 91 48 89 c2 e8 40 da a4 ff <0f> 0b eb be 66 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48 c7 07 
[63071.963313] RSP: 0018:ffff967daf403e78 EFLAGS: 00010286
[63071.963316] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[63071.963317] RDX: 0000000000000007 RSI: ffffffff91e69d56 RDI: 00000000ffffffff
[63071.963319] RBP: ffff967cf9d1d45c R08: 0000000000000001 R09: 00000000000003d2
[63071.963320] R10: 0000000000000004 R11: 0000000000000000 R12: ffff967cf9d1d478
[63071.963322] R13: ffff967cf9d1d000 R14: 0000000000000004 R15: ffff967d932f0c40
[63071.963324] FS:  0000000000000000(0000) GS:ffff967daf400000(0000) knlGS:0000000000000000
[63071.963326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63071.963328] CR2: 00007fe35bb596e0 CR3: 000000021d00a002 CR4: 00000000003606f0
[63071.963329] Call Trace:
[63071.963332]  <IRQ>
[63071.963337]  ? qdisc_reset+0xe0/0xe0
[63071.963340]  ? qdisc_reset+0xe0/0xe0
[63071.963344]  call_timer_fn+0x2b/0x160
[63071.963347]  ? qdisc_reset+0xe0/0xe0
[63071.963350]  expire_timers+0x99/0x110
[63071.963353]  run_timer_softirq+0x8a/0x160
[63071.963357]  ? sched_clock+0x5/0x10
[63071.963360]  ? sched_clock_cpu+0xe/0xd0
[63071.963364]  __do_softirq+0x112/0x32b
[63071.963369]  irq_exit+0xd9/0xf0
[63071.963374]  smp_apic_timer_interrupt+0x87/0x180
[63071.963378]  apic_timer_interrupt+0xf/0x20
[63071.963380]  </IRQ>
[63071.963385] RIP: 0010:cpuidle_enter_state+0xb7/0x2e0
[63071.963385] Code: e8 4e 2f b0 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 fb 01 00 00 31 ff e8 c0 23 b6 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f 
[63071.963433] RSP: 0018:ffffffff92003e78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[63071.963436] RAX: ffff967daf400000 RBX: 0000395d15f49230 RCX: 000000000000001f
[63071.963437] RDX: 0000395d15f49230 RSI: ffffffff91e69d56 RDI: ffffffff91e6a01d
[63071.963439] RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000021340
[63071.963440] R10: 000067456800da52 R11: ffff967daf420a68 R12: ffff967daf42bc00
[63071.963442] R13: ffffffff920ae738 R14: 0000395d15e413b1 R15: 00000000000001d1
[63071.963448]  ? cpuidle_enter_state+0x92/0x2e0
[63071.963451]  do_idle+0x217/0x250
[63071.963455]  cpu_startup_entry+0x6f/0x80
[63071.963460]  start_kernel+0x515/0x535
[63071.963466]  secondary_startup_64+0xa5/0xb0
[63071.963470] ---[ end trace ae746f8d59ffabfb ]---
[63071.963477] mwifiex_pcie 0000:01:00.0: 4297234432 : Tx timeout(#1), bss_type-num = 0-0
[63073.833279] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63073.833284] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63073.833317] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63073.833319] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63077.595129] mwifiex_pcie 0000:01:00.0: 4297240064 : Tx timeout(#2), bss_type-num = 0-0
[63079.833451] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63079.833455] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63079.833486] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63079.833488] mwifiex_pcie 0000:01:00.0: failed to get signal information
[63081.034435] mwifiex_pcie 0000:01:00.0: Firmware dump Finished!
[63081.034439] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump end ==
[63081.034752] mwifiex_pcie 0000:01:00.0: == mwifiex dump information to /sys/class/devcoredump start
[63081.034813] mwifiex_pcie 0000:01:00.0: == mwifiex dump information to /sys/class/devcoredump end
[63081.034819] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW is in bad state
[63081.034858] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034862] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.034864] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034866] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.034868] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034870] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.034872] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034873] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.034875] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034877] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.034879] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.034881] mwifiex_pcie 0000:01:00.0: deleting the crypto keys
[63081.035107] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[63081.035319] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.035497] mwifiex_pcie 0000:01:00.0: rx_pending=0, tx_pending=1,   cmd_pending=0
[63081.039075] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.039080] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63081.039092] mwifiex_pcie 0000:01:00.0: PREP_CMD: card is removed
[63082.076906] mwifiex_pcie 0000:01:00.0: WLAN FW already running! Skip FW dnld
[63082.076912] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[63092.443086] mwifiex_pcie 0000:01:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0x4000
[63092.443093] mwifiex_pcie 0000:01:00.0: num_data_h2c_failure = 0
[63092.443097] mwifiex_pcie 0000:01:00.0: num_cmd_h2c_failure = 0
[63092.443100] mwifiex_pcie 0000:01:00.0: is_cmd_timedout = 1
[63092.443103] mwifiex_pcie 0000:01:00.0: num_tx_timeout = 0
[63092.443107] mwifiex_pcie 0000:01:00.0: last_cmd_index = 1
[63092.443111] mwifiex_pcie 0000:01:00.0: last_cmd_id: ce 00 fa 00 7f 00 16 00 cf 00
[63092.443115] mwifiex_pcie 0000:01:00.0: last_cmd_act: 00 dc 00 40 00 00 00 00 00 dc
[63092.443118] mwifiex_pcie 0000:01:00.0: last_cmd_resp_index = 4
[63092.443122] mwifiex_pcie 0000:01:00.0: last_cmd_resp_id: 16 80 a4 80 7f 80 16 80 cf 80
[63092.443125] mwifiex_pcie 0000:01:00.0: last_event_index = 4
[63092.443128] mwifiex_pcie 0000:01:00.0: last_event: 17 00 2b 00 33 00 1c 00 33 00
[63092.443132] mwifiex_pcie 0000:01:00.0: data_sent=0 cmd_sent=1
[63092.443136] mwifiex_pcie 0000:01:00.0: ps_mode=0 ps_state=0
[63092.443171] mwifiex_pcie 0000:01:00.0: info: _mwifiex_fw_dpc: unregister device

Then, here are logs are running "sudo modprobe -r mwifiex_pcie":


[63332.059168] INFO: task kworker/2:1:4100 blocked for more than 120 seconds.
[63332.059179]       Tainted: G        W         4.18.20-1-surface #1
[63332.059183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63332.059187] kworker/2:1     D    0  4100      2 0x80000000
[63332.059207] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
[63332.059211] Call Trace:
[63332.059223]  ? __schedule+0x29b/0x8b0
[63332.059229]  ? __switch_to_asm+0x40/0x70
[63332.059234]  ? __switch_to_asm+0x34/0x70
[63332.059238]  schedule+0x32/0x90
[63332.059243]  schedule_timeout+0x311/0x4a0
[63332.059248]  ? _raw_spin_unlock_irq+0x1d/0x30
[63332.059255]  ? finish_task_switch+0x82/0x2d0
[63332.059259]  ? __switch_to_asm+0x40/0x70
[63332.059264]  wait_for_common+0x15f/0x190
[63332.059270]  ? wake_up_q+0x70/0x70
[63332.059276]  ? drain_workqueue+0x120/0x120
[63332.059280]  __flush_work+0x10e/0x1c0
[63332.059285]  ? flush_workqueue_prep_pwqs+0x130/0x130
[63332.059291]  __cancel_work_timer+0x10a/0x190
[63332.059299]  ? free_unref_page_commit+0x70/0xf0
[63332.059307]  mwifiex_cleanup_pcie+0x29/0xd0 [mwifiex_pcie]
[63332.059322]  mwifiex_free_adapter+0x24/0xe0 [mwifiex]
[63332.059333]  _mwifiex_fw_dpc+0x245/0x4c0 [mwifiex]
[63332.059342]  ? request_firmware+0x3b/0x50
[63332.059352]  mwifiex_reinit_sw+0x164/0x2c0 [mwifiex]
[63332.059359]  mwifiex_pcie_reset_done+0x4c/0xb0 [mwifiex_pcie]
[63332.059367]  pci_try_reset_function+0x36/0x60
[63332.059373]  process_one_work+0x1eb/0x410
[63332.059378]  worker_thread+0x2d/0x3d0
[63332.059384]  ? process_one_work+0x410/0x410
[63332.059390]  kthread+0x112/0x130
[63332.059396]  ? kthread_flush_work_fn+0x10/0x10
[63332.059401]  ret_from_fork+0x35/0x40
ZZZXXX0110 commented 5 years ago

Probably connected to https://github.com/jakeday/linux-surface/issues/348, also if you're using KDE and latte-dock, you might need to restart latte-dock with "latte-dock -r" in order for the wifi icon to show up, after the network manager get's started by the sleep script.

DolceTriade commented 5 years ago

@ZZZXXX0110 Hm, I did look at the issue before posting this one, but it seems different. In this one, the wifi interface literally disappears from Linux and it looks like its because a kernel thread locks up somehow. I'll try to do other things next time I reproduce it. I'll add that it doesn't happen 100% of the time, but maybe 10% of the time.

jakeday commented 5 years ago

@DolceTriade Can you try this again with the latest kernel and setup.sh script?

DolceTriade commented 5 years ago

Yes, I haven't tried this recently, but I've had a 100% repro rate when I hibernate to disk.

kitakar5525 commented 5 years ago

From the comment on Ubuntu launchpad

We can avoid this type of lockup if we disable dump and reset:

Disable mwifiex dump and reset to prevent mwifiex_pcie module freezing
which results in preventing the system from shutting down cleanly.

diff --git a/drivers/net/wireless/marvell/mwifiex/cmdevt.c b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
index 9cfcdf6..3ca9854 100644
--- a/drivers/net/wireless/marvell/mwifiex/cmdevt.c
+++ b/drivers/net/wireless/marvell/mwifiex/cmdevt.c
@@ -1004,11 +1004,11 @@ mwifiex_cmd_timeout_func(struct timer_list *t)
        return;
    }

-   if (adapter->if_ops.device_dump)
-       adapter->if_ops.device_dump(adapter);
+   //if (adapter->if_ops.device_dump)
+       //adapter->if_ops.device_dump(adapter);

-   if (adapter->if_ops.card_reset)
-       adapter->if_ops.card_reset(adapter);
+   //if (adapter->if_ops.card_reset)
+       //adapter->if_ops.card_reset(adapter);
 }

 void

EDIT I can confirm this patch works most of the time against prevents the system from shutting down cleanly when mwifiex_pcie crashed.

kitakar5525 commented 5 years ago

And about crashing after suspend issue, I found a way to reset wifi card using acpi_call.

RP12 and 00:1d.3 can be different from device to device. EDIT: I think the equivalent value to SP6 is RP04. For the other devices, search your DSDT for _RST

# Reset Wi-Fi for SB1
echo '\_SB.PCI0.RP12.PXSX.PRWF._RST' > /proc/acpi/call
echo 1 > "/sys/bus/pci/devices/0000:00:1d.3/remove"
sleep 1
echo 1 > /sys/bus/pci/rescan
sleep 1

00:1d.3 is PCI bridge connected by Wi-Fi card.

$ lspci -nnvt
-[0000:00]-
                   [...]
                    +-1d.3-[03]----00.0  Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless [11ab:2b38]

$ lspci -nn
00:1d.3 PCI bridge [0604]: Intel Corporation Device [8086:9d1b] (rev f1)

DSDT of SB1 about RP12 is:

Scope (_SB.PCI0.RP12.PXSX)
    {
        PowerResource (PRWF, 0x05, 0x0000)
        {
            Name (_STA, One)  // _STA: Status
            Method (_ON, 0, NotSerialized)  // _ON_: Power On
            {
                If ((_STA == Zero))
                {
                    SGOV (0x02000012, One)
                    Sleep (0x96)
                    SGOV (0x02000015, One)
                    Sleep (0x64)
                    _STA = One
                }
            }

            Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
            {
                If ((_STA == One))
                {
                    SGOV (0x02000012, Zero)
                    Sleep (0x96)
                    SGOV (0x02000015, Zero)
                    Sleep (0x64)
                    _STA = Zero
                }
            }

            Method (_RST, 0, NotSerialized)  // _RST: Device Reset
            {
                SGOV (0x02000012, Zero)
                Sleep (0xFA)
                SGOV (0x02000012, One)
                Sleep (0xFA)
                _STA = One
            }
        }
kitakar5525 commented 5 years ago

Or simply disable d3cold for Wi-Fi and remove/reload mwifiex_pcie module after suspend works for SB1. However, I think it is not power-efficient to disable D3 state.

$ lspci -nn | grep -i marvell
03:00.0 Ethernet controller [0200]: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless [11ab:2b38]
sudo su -c "echo 0 > '/sys/bus/pci/devices/0000:03:00.0/d3cold_allowed'" # for SB1
sudo su -c "echo 0 > '/sys/bus/pci/devices/0000:01:00.0/d3cold_allowed'" # for SP6

sudo su -c "echo 0 > /sys/bus/pci/devices/0000:$(lspci | grep -i Marvell | cut -d " " -f 1)/d3cold_allowed" # For the other devices
# /usr/lib/systemd/system-sleep/sleep
case $1/$2 in
    post/*)
        modprobe -r mwifiex_pcie
        modprobe -i mwifiex_pcie
kitakar5525 commented 5 years ago

Of course, we can disable D3 state of Wi-Fi card by kernel side, but I don't like to do so because I observed higher power consumption during suspend. I prefer to reset the Wi-Fi card by acpi_call for now.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6bc27b7f..bffb78ad 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1332,6 +1332,9 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AL, PCI_ANY_ID,
    occur when mode detecting */
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
                PCI_CLASS_STORAGE_IDE, 8, quirk_no_ata_d3);
+/* Marvell Wi-Fi 88W8897 cannot wakeup from D3 state after s2idle
+   on Surface devices */
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_MARVELL, 0x2b38, quirk_no_ata_d3);

 /*
  * This was originally an Alpha-specific thing, but it really fits here.

(This example has another problem; applies quirk to all Marvell Wi-Fi 88W8897 card, even to devices which are not affected by this issue like Surface 3.)

DolceTriade commented 5 years ago

Excellent comments! This doesn't happen very often with me for the new kernel when I close the lid (this used to happen like 25% of the time on the old build) but suspend to ram and suspend to disk appear broken for me on the new kernel. Since I don't see any new regressions related to this, I assumed this was a "me" problem. I've been trying to fix it whenever I get a chance but haven't had much luck. Also, suspend isn't that critical of a feature for me because I don't mind hooking it up to power when I'm not using it.

DolceTriade commented 5 years ago

Ok, I upgraded again to 4.19.23 and that appeared to fix suspend and hibernate and I haven't been able to reproduce the wifi locking up. I'll close this now. Thanks!

kitakar5525 commented 5 years ago

Sorry, maybe I said the wrong thing. I can reset Wi-Fi card just removing the PCI bridge which is connected by Wi-Fi card and then, rescan the PCI devices. No need to use acpi_call.

# reset wifi for SB1
echo 1 > "/sys/bus/pci/devices/0000:00:1d.3/remove"
sleep 1
echo 1 > /sys/bus/pci/rescan
sleep 1

(Current /lib/systemd/system-sleep/sleep in this repo does not work for my Surface Book with Performance Base, too.)

EDIT Sorry again. Without issuingecho '\_SB.PCI0.RP12.PXSX.PRWF._RST' > /proc/acpi/call, I get the message Authentication required by wireless Network from NetworkManager. Wi-Fi is not usable then.

dmesg says:

kern  :info  : [Tue Feb 19 21:45:46 2019] mwifiex_pcie 0000:03:00.0: cmd_wait_q terminated: -110

Then, I issued the set of commands I mentioned in https://github.com/jakeday/linux-surface/issues/369#issuecomment-464722366 and Wi-Fi is working again.

minrui-hust commented 5 years ago

I have the same problem on SP6 with the latest release,also, kworker is blocked:

[13293.335402] INFO: task kworker/3:1:5018 blocked for more than 120 seconds. [13293.335405] Tainted: G W 4.19.27-surface-linux-surface #2 [13293.335405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

It seems the wifi driver has some bug, which cause the kworker to stuck。

minikN commented 5 years ago

Alright @qzed I have already closed my other issue because the error I was facing was not Void linux specific. Don't know if you have read up on it, but if not, I was having troubles getting the device to wake up from suspend:

Update regarding the suspend/resume issue.

I took this script and copied the pre part to /etc/zzz.d/suspend/sleep and the post part to /etc/zzz.d/resume/resume.

Going into suspend is working properly now, also waking up is working, however, after it woke up, it suspends immediately afterwards.

Seems like there is one piece missing.

I remapped the power button in the acpi handler.sh to send the device to sleep. In other words, the power button was the only button that did anything to the surface.

A couple of days ago I did some more research. I added

echo enabled > /sys/bus/usb/devices/1-7/power/wakeup
echo enabled > /sys/bus/usb/devices/usb1/power/wakeup
echo enabled > /sys/bus/usb/devices/usb2/power/wakeup

to my pre suspend script and what do you know, now it will wake up again just fine. However, I'm facing the exact same problem described in this thread now. Wifi locks up completely and needs a reboot to work again.