Open tmarkov opened 5 years ago
I have a fairly stock system on Surface Book 2 and this appears to affect me too.
In my case, I was able to restore wifi by manually reloading modprobe mwifiex_pcie
I have a script that I run whenever that happens.
Its contents are:
sudo modprobe -r mwifiex_pcie
sudo modprobe mwifiex_pcie
sudo service network-manager stop
sudo service network-manager start
For me, that script work to reload the W-Fi when it was already working.
In the broken state, however, it has no effect.
you can enable debug output for wifi:
sudo su -c 'echo "file drivers/net/wireless/marvell/mwifiex/* +p" > /sys/kernel/debug/dynamic_debug/control'
sudo su -c "echo 0xffffffff > /sys/kernel/debug/mwifiex/mlan0/debug_mask"
And/or change MWIFIEX_DEFAULT_DEBUG_MASK
(to see early events after reloading modules)
```diff From 5f18d4edb8ad664900a1ef100e673e1d4e6138ac Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Thu, 8 Aug 2019 06:36:03 +0900 Subject: [PATCH] mwifiex: change default debug level to any --- drivers/net/wireless/marvell/mwifiex/main.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index d4027a803..350e8ebdc 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -197,9 +197,7 @@ enum MWIFIEX_DEBUG_LEVEL { MWIFIEX_DBG_ANY = 0xffffffff }; -#define MWIFIEX_DEFAULT_DEBUG_MASK (MWIFIEX_DBG_MSG | \ - MWIFIEX_DBG_FATAL | \ - MWIFIEX_DBG_ERROR) +#define MWIFIEX_DEFAULT_DEBUG_MASK MWIFIEX_DBG_ANY __printf(3, 4) void _mwifiex_dbg(const struct mwifiex_adapter *adapter, int mask, -- 2.22.0 ```
I also added debug print:
```diff From db47ac27b5c00c89234bfb9ab94ecea91d1db386 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Thu, 8 Aug 2019 07:09:39 +0900 Subject: [PATCH] mwifiex: print debug --- .../net/wireless/marvell/mwifiex/cfg80211.c | 16 +++++++++++ drivers/net/wireless/marvell/mwifiex/main.c | 10 +++++++ drivers/net/wireless/marvell/mwifiex/pcie.c | 23 ++++++++++++++++ drivers/net/wireless/marvell/mwifiex/scan.c | 27 ++++++++++++++++++- 4 files changed, 75 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c index 6c15e8019..a38881a09 100644 --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c @@ -2577,6 +2577,10 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, * is disabled is in process */ if (priv->scan_request || priv->scan_aborting) { + if (priv->scan_request) + pr_alert("DEBUG: priv->scan_request is true!\n"); + if (priv->scan_aborting) + pr_alert("DEBUG: priv->scan_aborting is true!\n"); mwifiex_dbg(priv->adapter, WARN, "cmd: Scan already in process..\n"); return -EBUSY; @@ -2592,6 +2596,7 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, if (!user_scan_cfg) return -ENOMEM; + pr_alert("DEBUG: mwifiex_cfg80211_scan: priv->scan_request = request\n"); priv->scan_request = request; if (request->flags & NL80211_SCAN_FLAG_RANDOM_ADDR) { @@ -2645,7 +2650,9 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, if (ret) { mwifiex_dbg(priv->adapter, ERROR, "scan failed: %d\n", ret); + pr_alert("DEBUG: mwifiex_cfg80211_scan: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; + pr_alert("DEBUG: mwifiex_cfg80211_scan: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; return ret; } @@ -3442,6 +3449,9 @@ static int mwifiex_cfg80211_suspend(struct wiphy *wiphy, struct mwifiex_private *sta_priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_STA); + pr_alert("DEBUG: mwifiex_cfg80211_suspend starting...\n"); + + pr_alert("DEBUG: mwifiex_cfg80211_suspend: setting priv->scan_aborting = true\n"); sta_priv->scan_aborting = true; for (i = 0; i < adapter->priv_num; i++) { priv = adapter->priv[i]; @@ -3509,7 +3519,9 @@ static int mwifiex_cfg80211_suspend(struct wiphy *wiphy, mwifiex_dbg(adapter, ERROR, "Failed to set HS params\n"); done: + pr_alert("DEBUG: mwifiex_cfg80211_suspend: setting priv->scan_aborting = false\n"); sta_priv->scan_aborting = false; + pr_alert("DEBUG: mwifiex_cfg80211_suspend done\n"); return ret; } @@ -3522,6 +3534,8 @@ static int mwifiex_cfg80211_resume(struct wiphy *wiphy) int i; bool report_wakeup_reason = true; + pr_alert("DEBUG: mwifiex_cfg80211_resume starting...\n"); + for (i = 0; i < adapter->priv_num; i++) { priv = adapter->priv[i]; if (priv && priv->netdev) @@ -3584,6 +3598,8 @@ static int mwifiex_cfg80211_resume(struct wiphy *wiphy) adapter->nd_info = NULL; } + pr_alert("DEBUG: mwifiex_cfg80211_resume done\n"); + return 0; } diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 0f29d1317..1d6557ec2 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -130,6 +130,8 @@ static int mwifiex_unregister(struct mwifiex_adapter *adapter) { s32 i; + pr_alert("DEBUG: mwifiex_free_adapter starting...\n"); + if (adapter->if_ops.cleanup_if) adapter->if_ops.cleanup_if(adapter); @@ -492,6 +494,7 @@ EXPORT_SYMBOL_GPL(mwifiex_main_process); */ static void mwifiex_free_adapter(struct mwifiex_adapter *adapter) { + pr_alert("DEBUG: mwifiex_free_adapter starting...\n"); if (!adapter) { pr_err("%s: adapter is NULL\n", __func__); return; @@ -739,6 +742,8 @@ static int mwifiex_init_hw_fw(struct mwifiex_adapter *adapter, static int mwifiex_open(struct net_device *dev) { + pr_alert("DEBUG: mwifiex_open\n"); + netif_carrier_off(dev); return 0; @@ -752,6 +757,8 @@ mwifiex_close(struct net_device *dev) { struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev); + pr_alert("DEBUG: mwifiex_close\n"); + if (priv->scan_request) { struct cfg80211_scan_info info = { .aborted = true, @@ -760,7 +767,9 @@ mwifiex_close(struct net_device *dev) mwifiex_dbg(priv->adapter, INFO, "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_alert("DEBUG: mwifiex_close: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_alert("DEBUG: mwifiex_close: setting priv->scan_aborting = true\n"); priv->scan_aborting = true; } @@ -1741,6 +1750,7 @@ EXPORT_SYMBOL_GPL(mwifiex_add_card); */ int mwifiex_remove_card(struct mwifiex_adapter *adapter) { + pr_alert("DEBUG: mwifiex_remove_card starting...\n"); if (!adapter) return 0; diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6e734a83e..6a28d89ec 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -198,6 +198,8 @@ static int mwifiex_pcie_resume(struct device *dev) struct pcie_service_card *card; struct pci_dev *pdev = to_pci_dev(dev); + pr_alert("DEBUG: mwifiex_pcie_resume\n"); + card = pci_get_drvdata(pdev); if (!card->adapter) { @@ -286,6 +288,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) u32 fw_status; int ret; + pr_alert("DEBUG: mwifiex_pcie_remove starting...\n"); + card = pci_get_drvdata(pdev); wait_for_completion(&card->fw_done); @@ -3004,25 +3008,36 @@ static void mwifiex_cleanup_pcie(struct mwifiex_adapter *adapter) int ret; u32 fw_status; + pr_alert("DEBUG: mwifiex_cleanup_pcie starting...\n"); + cancel_work_sync(&card->work); ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status); if (fw_status == FIRMWARE_READY_PCIE) { + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_dbg(adapter, INFO, "Clearing driver ready signature\n"); if (mwifiex_write_reg(adapter, reg->drv_rdy, 0x00000000)) + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_dbg(adapter, ERROR, "Failed to write driver not-ready signature\n"); } pci_disable_device(pdev); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pci_iounmap(pdev, card->pci_mmap); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_iounmap(pdev, card->pci_mmap1); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_release_region(pdev, 2); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_release_region(pdev, 0); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_pcie_free_buffers(adapter); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); } static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter) @@ -3143,6 +3158,8 @@ static int mwifiex_register_dev(struct mwifiex_adapter *adapter) { struct pcie_service_card *card = adapter->card; + pr_alert("DEBUG: mwifiex_register_dev called\n"); + /* save adapter pointer in card */ card->adapter = adapter; @@ -3170,6 +3187,8 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter) struct pci_dev *pdev = card->dev; int i; + pr_alert("DEBUG: mwifiex_unregister_dev called\n"); + if (card->msix_enable) { for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) synchronize_irq(card->msix_entries[i].vector); @@ -3200,6 +3219,8 @@ static void mwifiex_pcie_up_dev(struct mwifiex_adapter *adapter) struct pcie_service_card *card = adapter->card; struct pci_dev *pdev = card->dev; + pr_alert("DEBUG: mwifiex_pcie_up_dev called\n"); + /* tx_buf_size might be changed to 3584 by firmware during * data transfer, we should reset it to default size. */ @@ -3217,6 +3238,8 @@ static void mwifiex_pcie_down_dev(struct mwifiex_adapter *adapter) const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; struct pci_dev *pdev = card->dev; + pr_alert("DEBUG: mwifiex_pcie_down_dev called\n"); + if (mwifiex_write_reg(adapter, reg->drv_rdy, 0x00000000)) mwifiex_dbg(adapter, ERROR, "Failed to write driver not-ready signature\n"); diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c index e2786ab61..e7d59a493 100644 --- a/drivers/net/wireless/marvell/mwifiex/scan.c +++ b/drivers/net/wireless/marvell/mwifiex/scan.c @@ -1999,6 +1999,8 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) struct cmd_ctrl_node *cmd_node; unsigned long flags; + pr_alert("DEBUG: mwifiex_check_next_scan_command starting...\n"); + spin_lock_irqsave(&adapter->scan_pending_q_lock, flags); if (list_empty(&adapter->scan_pending_q)) { spin_unlock_irqrestore(&adapter->scan_pending_q_lock, flags); @@ -2020,9 +2022,15 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) mwifiex_dbg(adapter, INFO, "info: notifying scan done\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } else { + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; mwifiex_dbg(adapter, INFO, "info: scan already aborted\n"); @@ -2046,9 +2054,15 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) mwifiex_dbg(adapter, INFO, "info: aborting scan\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } else { + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; mwifiex_dbg(adapter, INFO, "info: scan already aborted\n"); @@ -2092,7 +2106,9 @@ void mwifiex_cancel_scan(struct mwifiex_adapter *adapter) mwifiex_dbg(adapter, INFO, "info: aborting scan\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_alert("DEBUG: mwifiex_cancel_scan: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_alert("DEBUG: mwifiex_cancel_scan: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } } @@ -2139,6 +2155,8 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, struct cfg80211_wowlan_nd_match *pmatch; struct cfg80211_sched_scan_request *nd_config = NULL; + pr_alert("DEBUG: %s starting... \n",__FUNCTION__); + is_bgscan_resp = (le16_to_cpu(resp->command) == HostCmd_CMD_802_11_BG_SCAN_QUERY); if (is_bgscan_resp) @@ -2146,7 +2164,6 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, else scan_rsp = &resp->params.scan_resp; - if (scan_rsp->number_of_sets > MWIFIEX_MAX_AP) { mwifiex_dbg(adapter, ERROR, "SCAN_RESP: too many AP returned (%d)\n", @@ -2155,6 +2172,8 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, goto check_next_scan; } + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + /* Check csa channel expiry before parsing scan response */ mwifiex_11h_get_csa_closed_channel(priv); @@ -2250,6 +2269,7 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, &bytes_left, le64_to_cpu(fw_tsf), radio_type, false, 0); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); if (ret) goto check_next_scan; } @@ -2642,6 +2662,8 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, u8 *scan_resp = buf + sizeof(struct mwifiex_event_scan_result); u16 scan_resp_size = le16_to_cpu(event_scan->buf_size); + pr_alert("DEBUG: %s starting...\n",__FUNCTION__); + if (num_of_set > MWIFIEX_MAX_AP) { mwifiex_dbg(adapter, ERROR, "EXT_SCAN: Invalid number of AP returned (%d)!!\n", @@ -2650,6 +2672,8 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, goto check_next_scan; } + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + bytes_left = scan_resp_size; mwifiex_dbg(adapter, INFO, "EXT_SCAN: size %d, returned %d APs...", @@ -2743,6 +2767,7 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, ret = mwifiex_parse_single_response_buf(priv, &bss_info, &bytes_left, fw_tsf, radio_type, true, rssi); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); if (ret) goto check_next_scan; } -- 2.22.0 ```
journal -f
log after suspend: cmd: Scan already in process..
These messages will appear repeatedly if I do not reload wifi modules in systemd/system-sleep/sleep script:
wpa_supplicant[1049]: wlp3s0: CTRL-EVENT-SCAN-FAILED ret=-16
kernel: mwifiex_pcie 0000:03:00.0: info: received scan request on wlp3s0
kernel: mwifiex: DEBUG: priv->scan_aborting is true!
kernel: mwifiex_pcie 0000:03:00.0: cmd: Scan already in process..
It seems that priv->scan_aborting
is preventing wifi from working.
```c /* Block scan request if scan operation or scan cleanup when interface * is disabled is in process */ if (priv->scan_request || priv->scan_aborting) { if (priv->scan_request) pr_alert("DEBUG: priv->scan_request is true!\n"); // debug print by me if (priv->scan_aborting) pr_alert("DEBUG: priv->scan_aborting is true!\n"); // debug print by me mwifiex_dbg(priv->adapter, WARN, "cmd: Scan already in process..\n"); return -EBUSY; } ```
The priv->scan_aborting
will be set in mwifiex_close()
[mwifiex/main.c]:
```c /* * CFG802.11 network device handler for close. */ static int mwifiex_close(struct net_device *dev) { struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev); pr_alert("DEBUG: mwifiex_close\n"); // debug print by me if (priv->scan_request) { struct cfg80211_scan_info info = { .aborted = true, }; mwifiex_dbg(priv->adapter, INFO, "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); pr_alert("DEBUG: mwifiex_close: setting priv->scan_request = NULL\n"); // debug print by me priv->scan_request = NULL; pr_alert("DEBUG: mwifiex_close: setting priv->scan_aborting = true\n"); // debug print by me priv->scan_aborting = true; } if (priv->sched_scanning) { mwifiex_dbg(priv->adapter, INFO, "aborting bgscan on ndo_stop\n"); mwifiex_stop_bg_scan(priv); cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0); } return 0; } ```
If I comment out the line priv->scan_aborting = true;
, (EDIT: 2019-08-18 Added debug print. change pr_alert to whatever you like)
```diff From 13355789aa04773ab738d5cff1d74f5be4009089 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Sun, 18 Aug 2019 19:22:11 +0900 Subject: [PATCH] mwifiex: do not set scan_aborting --- drivers/net/wireless/marvell/mwifiex/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 0f29d1317..8c92663e0 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -761,7 +761,8 @@ mwifiex_close(struct net_device *dev) "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); priv->scan_request = NULL; - priv->scan_aborting = true; + pr_alert("DEBUG: Not setting priv->scan_aborting = true;\n"); + // priv->scan_aborting = true; } if (priv->sched_scanning) { -- 2.22.1 ```
wifi is still working after suspend.
I'm still investigating
priv->scan_aborting
will not be set to falsepriv->scan_aborting
there (still working even if commented out)Refused to change power state, currently in D3
after module reloadingmwifiex_pcie 0000:03:00.0: Refused to change power state, currently in D3
and mwifiex module will output similar messages as @tmarkov reported.
If I put wifi into D3hot on removing the module, I can reload the module and wifi is still working. (EDIT: 2019-08-18 Added debug print. change pr_alert to whatever you like)
```diff From 741fb3bcc81c000668e3c80e18986c77d5b807f2 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Sun, 18 Aug 2019 19:26:41 +0900 Subject: [PATCH] mwifiex_pcie: set to d3hot on removing --- drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6a28d89ec..f34cbf242 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -314,6 +314,9 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN); } + pr_alert("DEBUG: putting into D3hot...\n"); + pci_set_power_state(pdev, PCI_D3hot); + mwifiex_remove_card(adapter); } -- 2.22.1 ```
...I'm not sure why this change works.
On 4.19, I feel we need to explicitly put wifi device into D0 state on resume from suspend.
```diff
From 9c9605de53468be901f7465c481829660d1143be Mon Sep 17 00:00:00 2001
From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com>
Date: Wed, 14 Aug 2019 09:28:58 +0900
Subject: [PATCH] mwifiex_pcie: put into D0 explicitly on resume
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 2bf067138..6eee79d66 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -18,6 +18,7 @@
*/
#include
I'm also struggling with this problem; when I start up my surface pro4 machine, it works completely fine, but wi-fi gets missed after suspending (or hybernating??? by closing the lid, pushing power button, or pressing Super+L key).
The suggestions above (@tmarkov's sleep script, @ezyang's manual modprobe, @nkkollaw's 4 lines script) didn't work for my environment.
Surface Pro4 Centos 7 kernel 5.1.21 (downloaded from https://www.kernel.org/) used patches v5.1 (https://github.com/jakeday/linux-surface/tree/master/patches/5.1)
How can I fix this? Just hooking reloading wifi modules would satisfy me.
Also, I tried the following procedure, although the result was the same problem...
## Download kernel source
# https://www.kernel.org/
cd ~
mkdir kernel-sources
tar xf ~/Downloads/linux-5.2.9.tar.xz # 1
git clone https://github.com/jakeday/linux-surface # 2
git clone https://github.com/kitakar5525/linux-surface-patches # 3
cd linux-surface
sudo sh setup.sh
reboot
cd ~/kernel-sources/linux-5.2.9
emacs Makefile # EXTRAVERSION = -sp4
make oldconfig
for i in ../linux-surface-patches/patch-5.2/0000-jakeday/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5525-wifi/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5529-debug/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5529-mwifiex-debug/*.patch; do patch -p1 < $i; done
cp ../linux-surface/configs/5.1/config .config
make -j4 bzImage
make -j4 modules
sudo make -j4 modules_install
sudo make -j4 install
@ywatanabe1989
First, thank you for testing on SP4! It seems that you applied all the three patches I described above.
I'd like to see dmesg -x
log between suspend.
EDIT: The whole log is preferable but if you want, from PM: suspend entry (s2idle)
to last mwifiex_pcie
output is sufficient.
@kitakar5525 No, thank YOU! I tried your scripts though I knew your scripts targeted other surface products and I was not sure the way to apply it was appropriate. :o
This is a whole log from boot to after recovering from suspension. dmesg.txt (deleted because this included evbug)
Thank you in advance. I am not familiar with even logging systems, to say nothing of kernel's functioning.
Oh, I recommend disabling evbug (https://github.com/jakeday/linux-surface/issues/350#issuecomment-453874391) and post a log again. It's including a lot of (usually unnecessary) input devices log.
And it seems that your device actually entered hibernate. I guess you replaced suspend with hibernate on jakeday's setup.sh. If you want to suspend, undo the change.
Also, I'll test if the patches work also on hibernate on my SB1.
Thank you very much!! 1) I disabled evbug. 2) I switched from hibernation to suspension. First, I retried jakeday's setup.sh and answered "yes" when asked to use suspend. Second, just to be sure, I executed the following link commands on the page you recommended.
sudo ln -sfb /lib/systemd/system/hibernate.target /etc/systemd/system/suspend.target && sudo ln -sfb /lib/systemd/system/systemd-hibernate.service /etc/systemd/system/systemd-suspend.service
This is the log (dmesg_2.txt)). It seems that hibernation is used yet. I'm going to reinstall a kernel.
The command to undo is rather this one!
rm /etc/systemd/system/suspend.target && /etc/systemd/system/systemd-suspend.service
The log still shows that your device entered hibernate (PM: hibernation entry
)
(I recommend to remove the first log which contains evbug just in case!)
Unfortunately, the patches did not work for hibernate also on my SB1.
dmesg_3.txt This may be the one! Thanks to you, I finally changed to suspend instead of hibernating.
Then, I think you still reload modules in /lib/systemd/system-sleep/sleep if you do nothing after running setup.sh.
Try commenting out all the lines and I'd like to know if the wifi is still working after suspend on SP4.
After commenting out all the lines in /lib/systemd/system-sleep/sleep, the wifi module got to survivie the suspend!! However, the wi-fi password was required again, and sadly even the accurate password (although pressed carefully several times) didn't satisfy the requirement.
dmesg_4.txt 'WINAS-H33-PC' is my SSID. It seems that once after passing the authentification process, the connection got lost because of "reason code 15".
kern :info : [ 27.542157] mwifiex_pcie 0000:02:00.0: info: trying to associate to 'WINAS-H33-PC' bssid 34:76:c5:80:65:c6
kern :info : [ 27.569764] mwifiex_pcie 0000:02:00.0: info: associated to bssid 34:76:c5:80:65:c6 successfully
kern :info : [ 34.218078] mwifiex_pcie 0000:02:00.0: info: successfully disconnected from 34:76:c5:80:65:c6: reason code 15
reason code 15
The reason code 15
means 4way handshake timeout [include/linux/ieee80211.h] :
/* Reason codes */
enum ieee80211_reasoncode {
[...]
WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT = 15,
[...]
};
This issue is being tracked at Surface Pro 6 not connecting to WiFi access points with authentication after suspend, but works with open AP. · Issue #348 · jakeday/linux-surface
FW in reset state
before entering suspend (SP4, Surface 3)Also, your log shows that FW in reset state
and the wifi card is removed
before entering suspend.
kern :info : [ 17.292488] mwifiex_pcie 0000:02:00.0: info: successfully disconnected from 34:76:c5:80:65:c6: reason code 3
kern :info : [ 17.320456] mwifiex: DEBUG: Not setting priv->scan_aborting = true;
[...]
kern :info : [ 17.625530] mwifiex_pcie 0000:02:00.0: DNLD_CMD: FW in reset state, ignore cmd 0x107
kern :info : [ 17.625574] mwifiex_pcie: DEBUG: putting into D3hot...
kern :info : [ 17.643967] mwifiex_pcie 0000:02:00.0: info: shutdown mwifiex...
kern :info : [ 17.644135] mwifiex_pcie 0000:02:00.0: PREP_CMD: card is removed
[...]
kern :info : [ 18.851903] PM: suspend entry (s2idle)
(Shown with DEBUG:
is what my patches described above are doing.)
Wifi card remove will not happen on my SB1 but my Surface 3 (non-pro) indeed has this problem (authentication failure will not happen, though). I should look into this issue closer now…
Thank you for testing! Until we can fix this, I think reloading modules in system-sleep/sleep script works at least on suspend.
Lastly, the old dmesg log link is still alive in edit history! Click edited
, click old edit log, click Options
, then click Delete revision from history
to completely remove the link. Just in case the log has too much info.
Thank you for your dedicated, detailed, and kind help!!
Thanks to you, I found my mistake. I have put scripts named hibernate1.sh and hibernate2.sh under /lib/systemd/system-sleep/.
I'll try to examine them.
hibernate1.sh
#!/bin/sh
case $1/$2 in
pre/*)
systemctl stop NetworkManager
rmmod intel_ipts
rmmod mei_me
rmmod mei
rmmod mwifiex_pcie
rmmod mwifiex
echo 1 > /sys/bus/pci/devices/0000\:02\:00.0/reset
;;
post/*)
modprobe mei
modprobe mei_me
echo 1 > /sys/bus/pci/devices/0000\:02\:00.0/reset
modprobe mwifiex
modprobe mwifiex_pcie
systemctl start NetworkManager
;;
esac
hibernate2.sh
#!/bin/bash
case $1 in
pre)
# unload the modules before going to sleep
modprobe -r intel_ipts
modprobe -r mei_me
modprobe -r mei
modprobe -r mwifiex_pcie;
modprobe -r mwifiex;
modprobe -r cfg80211;
;;
post)
while [ $(cat /proc/acpi/button/lid/LID0/state | grep closed | wc -l) = 1 ]
do
echo $(date) >> /var/log/resuspend
echo freeze > /sys/power/state
done
modprobe -i intel_ipts
modprobe -i mei_me
modprobe -i cfg80211;
modprobe -i mwifiex;
modprobe -i mwifiex_pcie;
;;
esac
@ywatanabe1989
I noticed that the command to undo replacing suspend with hibernate in wiki has typo.
Maybe the second link /etc/systemd/system/systemd-suspend.service
still remains there If you run it as it is. Because your device now enters suspend, I guess you already deleted the two files, though.
The command is rather this one:
# Added `sudo` and removed `&&`
sudo rm /etc/systemd/system/suspend.target /etc/systemd/system/systemd-suspend.service
I'll update the wiki.
I have put scripts named hibernate1.sh and hibernate2.sh under /lib/systemd/system-sleep/.
You mean you have the two script files at the same time? then, both the two files will be executed. You need only one file. Also, if you use patches for Linux 5.2, you can refer to the latest system-sleep/sleep script here
I might find the cause of wifi dropping after suspend (see https://github.com/jakeday/linux-surface/issues/554#issuecomment-525262259)
Resolves
cmd: Scan already in process..
(https://github.com/jakeday/linux-surface/issues/420#issuecomment-520916827)Refused to change power state, currently in D3
(https://github.com/jakeday/linux-surface/issues/420#issuecomment-520918835 and https://github.com/jakeday/linux-surface/issues/420#issuecomment-522310097)I confirmed this change does not break S0ix on Surface 3 (non-pro).
But as a side effect, it warns that the parent bridge is in D3hot on SB1. I'm not sure why.
kern :warn : [ 112.777812] acpi device:4b: Cannot transition to power state D0 for parent in D3hot
```diff From 2350261cc0a24addd938ec6b665768ea1db05d42 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Mon, 27 Aug 2019 19:04:56 +0900 Subject: [PATCH] mwifiex_pcie: disable parent bridge_d3 --- drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6e734a83e..a94fd0e8e 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -233,8 +233,12 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev, const struct pci_device_id *ent) { struct pcie_service_card *card; + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); int ret; + pr_alert("DEBUG: disabling bridge_d3...\n"); + parent_pdev->bridge_d3 = false; + pr_debug("info: vendor=0x%4.04X device=0x%4.04X rev=%d\n", pdev->vendor, pdev->device, pdev->revision); -- 2.23.0 ```
Not sure wifi is still working when FW in reset state
before entering suspend happened on SP4.
I can confirm that with your patch applied, the WiFi seems to reliably survive suspending the surface, even without the systemd-sleep script.
However, it seems to break S0ix for me on my SB2. I only applied your last patch, so if I should try any other patches, please let me know.
it seems to break S0ix for me on my SB2.
I see… On SB1, only the last patch mwifiex_pcie-disable-parent-bridge_d3.patch
is enough.
Currently, I have no idea. Maybe the platform-wide fix is difficult then.
Confirmation that pci_set_power_state(pdev, PCI_D3hot);
works on Surface Pro 2017. I haven't lost WiFi after suspending since applying it. Thanks @kitakar5525
Hi @xeiraex, have you tried the patch mwifiex_pcie-disable-parent-bridge_d3.patch
(https://github.com/jakeday/linux-surface/issues/420#issuecomment-525276319)?
We can check if disabling wifi parent bridge D3 state will fix this issue (without any patch) by adding kernel parameter pcie_port_pm=off
to our bootloader.
@StollD Can you test with the parameter added to see if it will break S0ix states or not?
The parameter will disable all bridge's D3 state. I hope adding that parameter has no impact on suspend power consumption on all surface devices except SB series.
SB series has bridge for dGPU (I think even on non-dGPU model). On Windows, I found Windows will put the bridge and dGPU itself into D3hot state on suspend (at least on SB1, see SoC Watch output https://github.com/jakeday/linux-surface/issues/554#issuecomment-525259174). So, maybe disabling all bridge's D3 state will cause more power consumption on suspend.
@StollD Can you test with the parameter added to see if it will break S0ix states or not?
Added the parameter to the kernel commandline and same result: If I don't keep the sleep script, S0ix fails. It works only when the wifi driver gets unloaded before suspending.
It works only when the wifi driver gets unloaded before suspending.
OK. Thank you.
@jonas2515 @StollD and everyone
Can anyone please test with all of these patches to see if wifi is still working after several suspend while not breking S0ix states?
0001-mwifiex_pcie-disable-parent-bridge_d3.patch.txt 0002-mwifiex_pcie-put-into-D3hot-on-suspend.patch.txt 0003-mwifiex_pcie-put-into-D0-on-resume.patch.txt 0004-mwifiex-do-not-set-scan_aborting.patch.txt (I posted all of these patches before)
This time, not only disabled the parent bridge_d3 but also put the wifi device into D3hot before suspend. As a side effect, it seems that we need to avoid setting scan_aborting flag to see APs after suspend.
Same result as before: If I disable the sleep script, the wifi still survives the suspend, but the surface doesn't enter S0ix anymore. I can only get S0ix with the sleep script unloading the modules.
Thank you for testing!
I want to confirm that, under the following conditions:
your SB2 does not enter S0ix in this case, too, right? If your SB2 does not enter S0ix in this case, too, the cause of S0ix failure might be other than disabling bridge_d3 (?)
I'm not yet sure the cause of S0ix failure is my patches or not.
Yeah, sorry, your patches were never the reason for S0ix failing. Should have clarified that before. I get the same behaviour without these patches and the disabled sleep script.
OK, thank you!
Then, It means that the most difficult part (S0ix) still remains…
What will happen when you skip the first patch (disabling bridge_d3)? Namely, please try with these three patches this time
To summarize, on my SB1, at least for wifi to be working after several suspend:
or
are needed.
Hmm… the combination
seems to be not stable. It causes Firmware wakeup failed
.
and the combination
caused cmd_timeout
then RIP: 0010:dev_watchdog+0x26a/0x280
after several suspend and some usage.
So, after all, I feel that disabling bridge_d3 is the most stable way just against dropping the wifi after suspend (I don't know what we can do about S0ix yet)
EDIT:
With only 0001-mwifiex_pcie-disable-parent-bridge_d3.patch.txt
, I remember it worked before. Now it also fails to show APs after suspend (it now requires 0004-mwifiex-do-not-set-scan_aborting.patch.txt
). Bluetooth might be related… (?)
@StollD It would be interesting to know which module is preventing S0ix, could you please test if S0ix can be achieved by removing only the mwifiex_pcie module ?
@StollD It would be interesting to know which module is preventing S0ix, could you please test if S0ix can be achieved by removing only the mwifiex_pcie module ?
No, I have to unload both, mwifiex
and mwifiex_pcie
to get S0ix to work
@StollD Thank you for your quick answer.
Would you mind trying the below patch when you have some time ? (without any other wifi patch and sleep script) https://gist.github.com/sebanc/ad488e220c97c369be0d5b627b7ebfb9#file-mwifiex_s0ix_test-patch
Note: you will have 2 warnings when compiling as I left the original suspend/resume functions which are now unused.
@sebanc That seems to work fine! Wifi doesn't crash after suspend, and the surface reaches S0ix, without having to unload the modules.
I will do a longer test and report back later, because I think sometimes the wifi doesn't come back properly if the surface was suspended for a longer time, even with the sleep script reloading it.
@sebanc Used it for a full day, with multiple 30 min suspends, one 2h suspend and one suspend overnight, same result: Stable Wifi + S0ix
@StollD Thank you for testing it, I was not able to verify it on my SP4 as it does not go to S0ix. I only noticed that the battery consumption during suspend decreased a lot after this patch.
I also tried the patch without any other wifi patch and sleep script
on SB1, I confirmed that
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
) (SB1 cannot go to S0ix, too)I also tried on Surface 3 (non-pro, Intel Atom):
Nice work!
What I noticed:
acpi device:4b: Cannot transition to power state D0 for parent in D3hot
(caused by disabling bridge_d3)EDIT:
The device:4b
is something related to \_SB_.PCI0.RP12.PXSX
(wifi device on SB1)
$ cat /sys/bus/acpi/devices/device:4b/path
\_SB_.PCI0.RP12.PXSX
@kitakar5525 It seems the suspend battery consumption improvement only applies to SP4 (m3 model only ?) which had an additional bug. Like your SB1, it goes to PC10 and not in S0ix. However, without this patch it is loosing 3.5% battery per hour in suspend (tested multiple times), now it is about the same as yours (1.2%).
a lot of messages on dmesg acpi device:4b: Cannot transition to power state D0 for parent in D3hot (caused by disabling bridge_d3).
I am not sure if it is really an issue as I also have it on the NVME pcie root port since the 5.3 kernel patch has been applied.
when I unload mwifiex modules before suspend, it cannot go to even PC10
My device seems to go to PC10 even if mwifiex modules are unloaded. In theory, if the mwifiex modules are unloaded before suspend, the modified suspend/resume functions should not be called. The issue might be related to the "sta_cmd.c" file patch (old patch I used but that might not be necessary anymore), could you try again after reverting it ?
acpi device:4b: Cannot transition to power state D0 for parent in D3hot
I am not sure if it is really an issue as I also have it on the NVME pcie root port since the 5.3 kernel patch has been applied.
acpi device:44: Cannot transition to power state D0 for parent in D3hot
on SB1), the message will be displayed only once per suspend and be displayed on the distribution's kernel, too.mwifiex_pcie_suspend2
and mwifiex_pcie_resume2
It seems that the function mwifiex_pcie_suspend2
is from mwifiex_pcie_remove
and the function mwifiex_pcie_resume2
is largely from mwifiex_pcie_probe
. So, I think that it behaves in the same way as we unload modules before suspend and re-load modules after suspend.
kern :info : [ 1998.866718] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
kern :info : [ 1998.875591] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
kern :info : [ 2002.216265] mwifiex_pcie 0000:03:00.0: Refused to change power state, currently in D3
kern :err : [ 2002.226480] mwifiex_pcie 0000:03:00.0: invalid power transition (from state 4 to 3)
The issue might be related to the "sta_cmd.c" file patch
sta_cmd.c
seems to be not necessary for wifi to be working after several suspend and for achieving at least PC10.Thank you that's very clear.
Regarding the warning Cannot transition to power state D0 for parent in D3hot
, it only appears once for me (as for nvme). I will try to understand what it's related to, else we can just remove it.
@sebanc Sorry, haven't been following this issue. Looking over WiFi issues right now. If I understand you all correctly, the patch fixes the resume issues? If so, do you want to send me a PR to https://github.com/qzed/linux-surface-kernel? (If the original wifi patches are not needed, just revert them, e.g. git revert 367d6a9191b6e24ddf8e6432933ebf7f22127e97
on 5.3-surface-devel).
@qzed
Considering this,
mwifiex_pcie_suspend2
andmwifiex_pcie_resume2
It seems that the function
mwifiex_pcie_suspend2
is frommwifiex_pcie_remove
and the functionmwifiex_pcie_resume2
is largely frommwifiex_pcie_probe
. So, I think that it behaves in the same way as we unload modules before suspend and re-load modules after suspend.
Does it really make sense to put that patch in the kernel rather than use the sleep script?
@tmarkov Right, sorry. I missed that. As long as the sleep script works let's keep it at that then.
On my Surface 3 (non-pro), I disable ASPM L1 state (L0s is still enabled) for daily usage (jakeday/linux-surface#456).
Disabling L1 state will break S0ix states on suspend. If I understand correctly, the patch will do almost the same thing as unloading/loading mwifiex_pcie module.
What I'm now confused about is, when disabling the L1 state, the only way to achieve S0ix is to apply the patch. Unloading mwifiex modules will not fix S0ix🤔
So, I personally use the patch.
acpi device:4b: Cannot transition to power state D0 for parent in D3hot
on SB1I mentioned before that the message will appear multiple times during suspend and it's annoying. I still cannot find what is happening. I want to suppress the message but I'm fine with that for now.
This is a problem that wasn't happening... until it suddenly started happening though no config changes on my side. But twice today few hours (out of 5-6 suspends), I didn't have Wi-Fi after wakeup. Network manager was showing nothing, and I couldn't get it back up with restarting NM, echo 1 > /sys/bus/pci/rescan, or removing/reloading mwifiex and mwifiex_pcie.
Here's journal: journal.txt
Last lines of journal seem relevant; perhaps it's fixable with configuration, would appreciate some help is so.
Also, I should note that my sleep script is a bit different from Jakeday's, but it's been working no problem until now: