jakeday / linux-surface

Linux Kernel for Surface Devices
2.59k stars 241 forks source link

Wi-Fi drops after suspend #420

Open tmarkov opened 5 years ago

tmarkov commented 5 years ago

This is a problem that wasn't happening... until it suddenly started happening though no config changes on my side. But twice today few hours (out of 5-6 suspends), I didn't have Wi-Fi after wakeup. Network manager was showing nothing, and I couldn't get it back up with restarting NM, echo 1 > /sys/bus/pci/rescan, or removing/reloading mwifiex and mwifiex_pcie.

Here's journal: journal.txt

Last lines of journal seem relevant; perhaps it's fixable with configuration, would appreciate some help is so.

Also, I should note that my sleep script is a bit different from Jakeday's, but it's been working no problem until now:

#!/bin/bash
case $1 in
  pre)
    # unload the modules before going to sleep
    modprobe -r intel_ipts
    modprobe -r mei_me
    modprobe -r mei
    modprobe -r mwifiex_pcie;
    modprobe -r mwifiex;
    modprobe -r cfg80211;
    ;;
  post)
    while [ $(cat /proc/acpi/button/lid/LID0/state | grep closed | wc -l) = 1 ]
    do
      echo $(date) >> /var/log/resuspend
      echo freeze > /sys/power/state
    done
    modprobe -i intel_ipts
    modprobe -i mei_me
    modprobe -i cfg80211;
    modprobe -i mwifiex;
    modprobe -i mwifiex_pcie;
    ;;
esac
ezyang commented 5 years ago

I have a fairly stock system on Surface Book 2 and this appears to affect me too.

ezyang commented 5 years ago

In my case, I was able to restore wifi by manually reloading modprobe mwifiex_pcie

nkkollaw commented 5 years ago

I have a script that I run whenever that happens.

Its contents are:

sudo modprobe -r mwifiex_pcie
sudo modprobe mwifiex_pcie
sudo service network-manager stop
sudo service network-manager start
Erudition commented 5 years ago

For me, that script work to reload the W-Fi when it was already working.

In the broken state, however, it has no effect.

kitakar5525 commented 5 years ago

debugging wifi issues

you can enable debug output for wifi:

sudo su -c 'echo "file drivers/net/wireless/marvell/mwifiex/* +p" > /sys/kernel/debug/dynamic_debug/control'
sudo su -c "echo 0xffffffff > /sys/kernel/debug/mwifiex/mlan0/debug_mask"

And/or change MWIFIEX_DEFAULT_DEBUG_MASK (to see early events after reloading modules)

mwifiex: change default debug level to any

```diff From 5f18d4edb8ad664900a1ef100e673e1d4e6138ac Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Thu, 8 Aug 2019 06:36:03 +0900 Subject: [PATCH] mwifiex: change default debug level to any --- drivers/net/wireless/marvell/mwifiex/main.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h index d4027a803..350e8ebdc 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -197,9 +197,7 @@ enum MWIFIEX_DEBUG_LEVEL { MWIFIEX_DBG_ANY = 0xffffffff }; -#define MWIFIEX_DEFAULT_DEBUG_MASK (MWIFIEX_DBG_MSG | \ - MWIFIEX_DBG_FATAL | \ - MWIFIEX_DBG_ERROR) +#define MWIFIEX_DEFAULT_DEBUG_MASK MWIFIEX_DBG_ANY __printf(3, 4) void _mwifiex_dbg(const struct mwifiex_adapter *adapter, int mask, -- 2.22.0 ```

I also added debug print:

mwifiex: print debug

```diff From db47ac27b5c00c89234bfb9ab94ecea91d1db386 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Thu, 8 Aug 2019 07:09:39 +0900 Subject: [PATCH] mwifiex: print debug --- .../net/wireless/marvell/mwifiex/cfg80211.c | 16 +++++++++++ drivers/net/wireless/marvell/mwifiex/main.c | 10 +++++++ drivers/net/wireless/marvell/mwifiex/pcie.c | 23 ++++++++++++++++ drivers/net/wireless/marvell/mwifiex/scan.c | 27 ++++++++++++++++++- 4 files changed, 75 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c index 6c15e8019..a38881a09 100644 --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c @@ -2577,6 +2577,10 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, * is disabled is in process */ if (priv->scan_request || priv->scan_aborting) { + if (priv->scan_request) + pr_alert("DEBUG: priv->scan_request is true!\n"); + if (priv->scan_aborting) + pr_alert("DEBUG: priv->scan_aborting is true!\n"); mwifiex_dbg(priv->adapter, WARN, "cmd: Scan already in process..\n"); return -EBUSY; @@ -2592,6 +2596,7 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, if (!user_scan_cfg) return -ENOMEM; + pr_alert("DEBUG: mwifiex_cfg80211_scan: priv->scan_request = request\n"); priv->scan_request = request; if (request->flags & NL80211_SCAN_FLAG_RANDOM_ADDR) { @@ -2645,7 +2650,9 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy, if (ret) { mwifiex_dbg(priv->adapter, ERROR, "scan failed: %d\n", ret); + pr_alert("DEBUG: mwifiex_cfg80211_scan: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; + pr_alert("DEBUG: mwifiex_cfg80211_scan: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; return ret; } @@ -3442,6 +3449,9 @@ static int mwifiex_cfg80211_suspend(struct wiphy *wiphy, struct mwifiex_private *sta_priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_STA); + pr_alert("DEBUG: mwifiex_cfg80211_suspend starting...\n"); + + pr_alert("DEBUG: mwifiex_cfg80211_suspend: setting priv->scan_aborting = true\n"); sta_priv->scan_aborting = true; for (i = 0; i < adapter->priv_num; i++) { priv = adapter->priv[i]; @@ -3509,7 +3519,9 @@ static int mwifiex_cfg80211_suspend(struct wiphy *wiphy, mwifiex_dbg(adapter, ERROR, "Failed to set HS params\n"); done: + pr_alert("DEBUG: mwifiex_cfg80211_suspend: setting priv->scan_aborting = false\n"); sta_priv->scan_aborting = false; + pr_alert("DEBUG: mwifiex_cfg80211_suspend done\n"); return ret; } @@ -3522,6 +3534,8 @@ static int mwifiex_cfg80211_resume(struct wiphy *wiphy) int i; bool report_wakeup_reason = true; + pr_alert("DEBUG: mwifiex_cfg80211_resume starting...\n"); + for (i = 0; i < adapter->priv_num; i++) { priv = adapter->priv[i]; if (priv && priv->netdev) @@ -3584,6 +3598,8 @@ static int mwifiex_cfg80211_resume(struct wiphy *wiphy) adapter->nd_info = NULL; } + pr_alert("DEBUG: mwifiex_cfg80211_resume done\n"); + return 0; } diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 0f29d1317..1d6557ec2 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -130,6 +130,8 @@ static int mwifiex_unregister(struct mwifiex_adapter *adapter) { s32 i; + pr_alert("DEBUG: mwifiex_free_adapter starting...\n"); + if (adapter->if_ops.cleanup_if) adapter->if_ops.cleanup_if(adapter); @@ -492,6 +494,7 @@ EXPORT_SYMBOL_GPL(mwifiex_main_process); */ static void mwifiex_free_adapter(struct mwifiex_adapter *adapter) { + pr_alert("DEBUG: mwifiex_free_adapter starting...\n"); if (!adapter) { pr_err("%s: adapter is NULL\n", __func__); return; @@ -739,6 +742,8 @@ static int mwifiex_init_hw_fw(struct mwifiex_adapter *adapter, static int mwifiex_open(struct net_device *dev) { + pr_alert("DEBUG: mwifiex_open\n"); + netif_carrier_off(dev); return 0; @@ -752,6 +757,8 @@ mwifiex_close(struct net_device *dev) { struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev); + pr_alert("DEBUG: mwifiex_close\n"); + if (priv->scan_request) { struct cfg80211_scan_info info = { .aborted = true, @@ -760,7 +767,9 @@ mwifiex_close(struct net_device *dev) mwifiex_dbg(priv->adapter, INFO, "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_alert("DEBUG: mwifiex_close: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_alert("DEBUG: mwifiex_close: setting priv->scan_aborting = true\n"); priv->scan_aborting = true; } @@ -1741,6 +1750,7 @@ EXPORT_SYMBOL_GPL(mwifiex_add_card); */ int mwifiex_remove_card(struct mwifiex_adapter *adapter) { + pr_alert("DEBUG: mwifiex_remove_card starting...\n"); if (!adapter) return 0; diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6e734a83e..6a28d89ec 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -198,6 +198,8 @@ static int mwifiex_pcie_resume(struct device *dev) struct pcie_service_card *card; struct pci_dev *pdev = to_pci_dev(dev); + pr_alert("DEBUG: mwifiex_pcie_resume\n"); + card = pci_get_drvdata(pdev); if (!card->adapter) { @@ -286,6 +288,8 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) u32 fw_status; int ret; + pr_alert("DEBUG: mwifiex_pcie_remove starting...\n"); + card = pci_get_drvdata(pdev); wait_for_completion(&card->fw_done); @@ -3004,25 +3008,36 @@ static void mwifiex_cleanup_pcie(struct mwifiex_adapter *adapter) int ret; u32 fw_status; + pr_alert("DEBUG: mwifiex_cleanup_pcie starting...\n"); + cancel_work_sync(&card->work); ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status); if (fw_status == FIRMWARE_READY_PCIE) { + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_dbg(adapter, INFO, "Clearing driver ready signature\n"); if (mwifiex_write_reg(adapter, reg->drv_rdy, 0x00000000)) + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_dbg(adapter, ERROR, "Failed to write driver not-ready signature\n"); } pci_disable_device(pdev); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pci_iounmap(pdev, card->pci_mmap); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_iounmap(pdev, card->pci_mmap1); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_release_region(pdev, 2); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); pci_release_region(pdev, 0); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); mwifiex_pcie_free_buffers(adapter); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); } static int mwifiex_pcie_request_irq(struct mwifiex_adapter *adapter) @@ -3143,6 +3158,8 @@ static int mwifiex_register_dev(struct mwifiex_adapter *adapter) { struct pcie_service_card *card = adapter->card; + pr_alert("DEBUG: mwifiex_register_dev called\n"); + /* save adapter pointer in card */ card->adapter = adapter; @@ -3170,6 +3187,8 @@ static void mwifiex_unregister_dev(struct mwifiex_adapter *adapter) struct pci_dev *pdev = card->dev; int i; + pr_alert("DEBUG: mwifiex_unregister_dev called\n"); + if (card->msix_enable) { for (i = 0; i < MWIFIEX_NUM_MSIX_VECTORS; i++) synchronize_irq(card->msix_entries[i].vector); @@ -3200,6 +3219,8 @@ static void mwifiex_pcie_up_dev(struct mwifiex_adapter *adapter) struct pcie_service_card *card = adapter->card; struct pci_dev *pdev = card->dev; + pr_alert("DEBUG: mwifiex_pcie_up_dev called\n"); + /* tx_buf_size might be changed to 3584 by firmware during * data transfer, we should reset it to default size. */ @@ -3217,6 +3238,8 @@ static void mwifiex_pcie_down_dev(struct mwifiex_adapter *adapter) const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; struct pci_dev *pdev = card->dev; + pr_alert("DEBUG: mwifiex_pcie_down_dev called\n"); + if (mwifiex_write_reg(adapter, reg->drv_rdy, 0x00000000)) mwifiex_dbg(adapter, ERROR, "Failed to write driver not-ready signature\n"); diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c index e2786ab61..e7d59a493 100644 --- a/drivers/net/wireless/marvell/mwifiex/scan.c +++ b/drivers/net/wireless/marvell/mwifiex/scan.c @@ -1999,6 +1999,8 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) struct cmd_ctrl_node *cmd_node; unsigned long flags; + pr_alert("DEBUG: mwifiex_check_next_scan_command starting...\n"); + spin_lock_irqsave(&adapter->scan_pending_q_lock, flags); if (list_empty(&adapter->scan_pending_q)) { spin_unlock_irqrestore(&adapter->scan_pending_q_lock, flags); @@ -2020,9 +2022,15 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) mwifiex_dbg(adapter, INFO, "info: notifying scan done\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } else { + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; mwifiex_dbg(adapter, INFO, "info: scan already aborted\n"); @@ -2046,9 +2054,15 @@ static void mwifiex_check_next_scan_command(struct mwifiex_private *priv) mwifiex_dbg(adapter, INFO, "info: aborting scan\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } else { + pr_info("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + pr_alert("DEBUG: mwifiex_check_next_scan_command: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; mwifiex_dbg(adapter, INFO, "info: scan already aborted\n"); @@ -2092,7 +2106,9 @@ void mwifiex_cancel_scan(struct mwifiex_adapter *adapter) mwifiex_dbg(adapter, INFO, "info: aborting scan\n"); cfg80211_scan_done(priv->scan_request, &info); + pr_alert("DEBUG: mwifiex_cancel_scan: setting priv->scan_request = NULL\n"); priv->scan_request = NULL; + pr_alert("DEBUG: mwifiex_cancel_scan: setting priv->scan_aborting = false\n"); priv->scan_aborting = false; } } @@ -2139,6 +2155,8 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, struct cfg80211_wowlan_nd_match *pmatch; struct cfg80211_sched_scan_request *nd_config = NULL; + pr_alert("DEBUG: %s starting... \n",__FUNCTION__); + is_bgscan_resp = (le16_to_cpu(resp->command) == HostCmd_CMD_802_11_BG_SCAN_QUERY); if (is_bgscan_resp) @@ -2146,7 +2164,6 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, else scan_rsp = &resp->params.scan_resp; - if (scan_rsp->number_of_sets > MWIFIEX_MAX_AP) { mwifiex_dbg(adapter, ERROR, "SCAN_RESP: too many AP returned (%d)\n", @@ -2155,6 +2172,8 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, goto check_next_scan; } + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + /* Check csa channel expiry before parsing scan response */ mwifiex_11h_get_csa_closed_channel(priv); @@ -2250,6 +2269,7 @@ int mwifiex_ret_802_11_scan(struct mwifiex_private *priv, &bytes_left, le64_to_cpu(fw_tsf), radio_type, false, 0); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); if (ret) goto check_next_scan; } @@ -2642,6 +2662,8 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, u8 *scan_resp = buf + sizeof(struct mwifiex_event_scan_result); u16 scan_resp_size = le16_to_cpu(event_scan->buf_size); + pr_alert("DEBUG: %s starting...\n",__FUNCTION__); + if (num_of_set > MWIFIEX_MAX_AP) { mwifiex_dbg(adapter, ERROR, "EXT_SCAN: Invalid number of AP returned (%d)!!\n", @@ -2650,6 +2672,8 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, goto check_next_scan; } + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); + bytes_left = scan_resp_size; mwifiex_dbg(adapter, INFO, "EXT_SCAN: size %d, returned %d APs...", @@ -2743,6 +2767,7 @@ int mwifiex_handle_event_ext_scan_report(struct mwifiex_private *priv, ret = mwifiex_parse_single_response_buf(priv, &bss_info, &bytes_left, fw_tsf, radio_type, true, rssi); + pr_alert("DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__); if (ret) goto check_next_scan; } -- 2.22.0 ```

kitakar5525 commented 5 years ago

journal -f log after suspend: cmd: Scan already in process..

These messages will appear repeatedly if I do not reload wifi modules in systemd/system-sleep/sleep script:

wpa_supplicant[1049]: wlp3s0: CTRL-EVENT-SCAN-FAILED ret=-16
kernel: mwifiex_pcie 0000:03:00.0: info: received scan request on wlp3s0
kernel: mwifiex: DEBUG: priv->scan_aborting is true!
kernel: mwifiex_pcie 0000:03:00.0: cmd: Scan already in process..

It seems that priv->scan_aborting is preventing wifi from working.

The code in question [mwifiex/cfg80211.c]:

```c /* Block scan request if scan operation or scan cleanup when interface * is disabled is in process */ if (priv->scan_request || priv->scan_aborting) { if (priv->scan_request) pr_alert("DEBUG: priv->scan_request is true!\n"); // debug print by me if (priv->scan_aborting) pr_alert("DEBUG: priv->scan_aborting is true!\n"); // debug print by me mwifiex_dbg(priv->adapter, WARN, "cmd: Scan already in process..\n"); return -EBUSY; } ```

The priv->scan_aborting will be set in mwifiex_close() [mwifiex/main.c]:

`mwifiex_close()` [mwifiex/main.c]:

```c /* * CFG802.11 network device handler for close. */ static int mwifiex_close(struct net_device *dev) { struct mwifiex_private *priv = mwifiex_netdev_get_priv(dev); pr_alert("DEBUG: mwifiex_close\n"); // debug print by me if (priv->scan_request) { struct cfg80211_scan_info info = { .aborted = true, }; mwifiex_dbg(priv->adapter, INFO, "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); pr_alert("DEBUG: mwifiex_close: setting priv->scan_request = NULL\n"); // debug print by me priv->scan_request = NULL; pr_alert("DEBUG: mwifiex_close: setting priv->scan_aborting = true\n"); // debug print by me priv->scan_aborting = true; } if (priv->sched_scanning) { mwifiex_dbg(priv->adapter, INFO, "aborting bgscan on ndo_stop\n"); mwifiex_stop_bg_scan(priv); cfg80211_sched_scan_stopped(priv->wdev.wiphy, 0); } return 0; } ```

If I comment out the line priv->scan_aborting = true;, (EDIT: 2019-08-18 Added debug print. change pr_alert to whatever you like)

mwifiex: do not set scan_aborting

```diff From 13355789aa04773ab738d5cff1d74f5be4009089 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Sun, 18 Aug 2019 19:22:11 +0900 Subject: [PATCH] mwifiex: do not set scan_aborting --- drivers/net/wireless/marvell/mwifiex/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c index 0f29d1317..8c92663e0 100644 --- a/drivers/net/wireless/marvell/mwifiex/main.c +++ b/drivers/net/wireless/marvell/mwifiex/main.c @@ -761,7 +761,8 @@ mwifiex_close(struct net_device *dev) "aborting scan on ndo_stop\n"); cfg80211_scan_done(priv->scan_request, &info); priv->scan_request = NULL; - priv->scan_aborting = true; + pr_alert("DEBUG: Not setting priv->scan_aborting = true;\n"); + // priv->scan_aborting = true; } if (priv->sched_scanning) { -- 2.22.1 ```

wifi is still working after suspend.

I'm still investigating

kitakar5525 commented 5 years ago

Refused to change power state, currently in D3 after module reloading

If I put wifi into D3hot on removing the module, I can reload the module and wifi is still working. (EDIT: 2019-08-18 Added debug print. change pr_alert to whatever you like)

mwifiex_pcie: set to d3hot on removing

```diff From 741fb3bcc81c000668e3c80e18986c77d5b807f2 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Sun, 18 Aug 2019 19:26:41 +0900 Subject: [PATCH] mwifiex_pcie: set to d3hot on removing --- drivers/net/wireless/marvell/mwifiex/pcie.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6a28d89ec..f34cbf242 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -314,6 +314,9 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev) mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN); } + pr_alert("DEBUG: putting into D3hot...\n"); + pci_set_power_state(pdev, PCI_D3hot); + mwifiex_remove_card(adapter); } -- 2.22.1 ```

...I'm not sure why this change works.

kitakar5525 commented 5 years ago

On 4.19, I feel we need to explicitly put wifi device into D0 state on resume from suspend.

mwifiex_pcie: put into D0 explicitly on resume

```diff From 9c9605de53468be901f7465c481829660d1143be Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Wed, 14 Aug 2019 09:28:58 +0900 Subject: [PATCH] mwifiex_pcie: put into D0 explicitly on resume --- drivers/net/wireless/marvell/mwifiex/pcie.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 2bf067138..6eee79d66 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -18,6 +18,7 @@ */ #include +#include #include "decl.h" #include "ioctl.h" @@ -200,6 +201,10 @@ static int mwifiex_pcie_resume(struct device *dev) card = pci_get_drvdata(pdev); + /* put into D0 */ + pr_alert("DEBUG: putting into D0...\n"); + pci_set_power_state(pdev, PCI_D0); + if (!card->adapter) { dev_err(dev, "adapter structure is not valid\n"); return 0; -- 2.22.0 ```

ywatanabe1989 commented 5 years ago

I'm also struggling with this problem; when I start up my surface pro4 machine, it works completely fine, but wi-fi gets missed after suspending (or hybernating??? by closing the lid, pushing power button, or pressing Super+L key).

The suggestions above (@tmarkov's sleep script, @ezyang's manual modprobe, @nkkollaw's 4 lines script) didn't work for my environment.

Surface Pro4 Centos 7 kernel 5.1.21 (downloaded from https://www.kernel.org/) used patches v5.1 (https://github.com/jakeday/linux-surface/tree/master/patches/5.1)

How can I fix this? Just hooking reloading wifi modules would satisfy me.

ywatanabe1989 commented 5 years ago

Also, I tried the following procedure, although the result was the same problem...

## Download kernel source
# https://www.kernel.org/
cd ~
mkdir kernel-sources
tar xf ~/Downloads/linux-5.2.9.tar.xz # 1
git clone https://github.com/jakeday/linux-surface # 2
git clone https://github.com/kitakar5525/linux-surface-patches # 3

cd linux-surface
sudo sh setup.sh 

reboot
cd ~/kernel-sources/linux-5.2.9
emacs Makefile # EXTRAVERSION = -sp4

make oldconfig
for i in ../linux-surface-patches/patch-5.2/0000-jakeday/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5525-wifi/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5529-debug/*.patch; do patch -p1 < $i; done
for i in ../linux-surface-patches/patch-5.2/5529-mwifiex-debug/*.patch; do patch -p1 < $i; done

cp ../linux-surface/configs/5.1/config .config
make -j4 bzImage
make -j4 modules
sudo make -j4 modules_install
sudo make -j4 install
kitakar5525 commented 5 years ago

@ywatanabe1989

First, thank you for testing on SP4! It seems that you applied all the three patches I described above.

I'd like to see dmesg -x log between suspend.

EDIT: The whole log is preferable but if you want, from PM: suspend entry (s2idle) to last mwifiex_pcie output is sufficient.

ywatanabe1989 commented 5 years ago

@kitakar5525 No, thank YOU! I tried your scripts though I knew your scripts targeted other surface products and I was not sure the way to apply it was appropriate. :o

This is a whole log from boot to after recovering from suspension. dmesg.txt (deleted because this included evbug)

Thank you in advance. I am not familiar with even logging systems, to say nothing of kernel's functioning.

kitakar5525 commented 5 years ago

Oh, I recommend disabling evbug (https://github.com/jakeday/linux-surface/issues/350#issuecomment-453874391) and post a log again. It's including a lot of (usually unnecessary) input devices log.

And it seems that your device actually entered hibernate. I guess you replaced suspend with hibernate on jakeday's setup.sh. If you want to suspend, undo the change.

Also, I'll test if the patches work also on hibernate on my SB1.

ywatanabe1989 commented 5 years ago

Thank you very much!! 1) I disabled evbug. 2) I switched from hibernation to suspension. First, I retried jakeday's setup.sh and answered "yes" when asked to use suspend. Second, just to be sure, I executed the following link commands on the page you recommended.

sudo ln -sfb /lib/systemd/system/hibernate.target /etc/systemd/system/suspend.target && sudo ln -sfb /lib/systemd/system/systemd-hibernate.service /etc/systemd/system/systemd-suspend.service

This is the log (dmesg_2.txt)). It seems that hibernation is used yet. I'm going to reinstall a kernel.

kitakar5525 commented 5 years ago

The command to undo is rather this one!

rm /etc/systemd/system/suspend.target && /etc/systemd/system/systemd-suspend.service

The log still shows that your device entered hibernate (PM: hibernation entry)

(I recommend to remove the first log which contains evbug just in case!)


Unfortunately, the patches did not work for hibernate also on my SB1.

ywatanabe1989 commented 5 years ago

dmesg_3.txt This may be the one! Thanks to you, I finally changed to suspend instead of hibernating.

kitakar5525 commented 5 years ago

Then, I think you still reload modules in /lib/systemd/system-sleep/sleep if you do nothing after running setup.sh.

Try commenting out all the lines and I'd like to know if the wifi is still working after suspend on SP4.

ywatanabe1989 commented 5 years ago

After commenting out all the lines in /lib/systemd/system-sleep/sleep, the wifi module got to survivie the suspend!! However, the wi-fi password was required again, and sadly even the accurate password (although pressed carefully several times) didn't satisfy the requirement.

dmesg_4.txt 'WINAS-H33-PC' is my SSID. It seems that once after passing the authentification process, the connection got lost because of "reason code 15".

kern  :info  : [   27.542157] mwifiex_pcie 0000:02:00.0: info: trying to associate to 'WINAS-H33-PC' bssid 34:76:c5:80:65:c6
kern  :info  : [   27.569764] mwifiex_pcie 0000:02:00.0: info: associated to bssid 34:76:c5:80:65:c6 successfully
kern  :info  : [   34.218078] mwifiex_pcie 0000:02:00.0: info: successfully disconnected from 34:76:c5:80:65:c6: reason code 15

wi-fi_pw_required_after_recovering

kitakar5525 commented 5 years ago

disconnected by reason code 15

The reason code 15 means 4way handshake timeout [include/linux/ieee80211.h] :

/* Reason codes */
    enum ieee80211_reasoncode {
[...]
        WLAN_REASON_4WAY_HANDSHAKE_TIMEOUT = 15,
[...]
    };

This issue is being tracked at Surface Pro 6 not connecting to WiFi access points with authentication after suspend, but works with open AP. · Issue #348 · jakeday/linux-surface


FW in reset state before entering suspend (SP4, Surface 3)

Also, your log shows that FW in reset state and the wifi card is removed before entering suspend.

kern  :info  : [   17.292488] mwifiex_pcie 0000:02:00.0: info: successfully disconnected from 34:76:c5:80:65:c6: reason code 3
kern  :info  : [   17.320456] mwifiex: DEBUG: Not setting priv->scan_aborting = true;
[...]
kern  :info  : [   17.625530] mwifiex_pcie 0000:02:00.0: DNLD_CMD: FW in reset state, ignore cmd 0x107
kern  :info  : [   17.625574] mwifiex_pcie: DEBUG: putting into D3hot...
kern  :info  : [   17.643967] mwifiex_pcie 0000:02:00.0: info: shutdown mwifiex...
kern  :info  : [   17.644135] mwifiex_pcie 0000:02:00.0: PREP_CMD: card is removed
[...]
kern  :info  : [   18.851903] PM: suspend entry (s2idle)

(Shown with DEBUG: is what my patches described above are doing.)

Wifi card remove will not happen on my SB1 but my Surface 3 (non-pro) indeed has this problem (authentication failure will not happen, though). I should look into this issue closer now…


Thank you for testing! Until we can fix this, I think reloading modules in system-sleep/sleep script works at least on suspend.

Lastly, the old dmesg log link is still alive in edit history! Click edited, click old edit log, click Options, then click Delete revision from history to completely remove the link. Just in case the log has too much info.

ywatanabe1989 commented 5 years ago

Thank you for your dedicated, detailed, and kind help!!

Thanks to you, I found my mistake. I have put scripts named hibernate1.sh and hibernate2.sh under /lib/systemd/system-sleep/.

I'll try to examine them.

hibernate1.sh

#!/bin/sh
case $1/$2 in
  pre/*)
    systemctl stop NetworkManager
    rmmod intel_ipts
    rmmod mei_me
    rmmod mei
    rmmod mwifiex_pcie
    rmmod mwifiex
    echo 1 > /sys/bus/pci/devices/0000\:02\:00.0/reset
    ;;
  post/*)
    modprobe mei
    modprobe mei_me
    echo 1 > /sys/bus/pci/devices/0000\:02\:00.0/reset
    modprobe mwifiex
    modprobe mwifiex_pcie
    systemctl start NetworkManager
    ;;
esac

hibernate2.sh

#!/bin/bash
case $1 in
  pre)
    # unload the modules before going to sleep
    modprobe -r intel_ipts
    modprobe -r mei_me
    modprobe -r mei
    modprobe -r mwifiex_pcie;
    modprobe -r mwifiex;
    modprobe -r cfg80211;
    ;;
  post)
    while [ $(cat /proc/acpi/button/lid/LID0/state | grep closed | wc -l) = 1 ]
    do
      echo $(date) >> /var/log/resuspend
      echo freeze > /sys/power/state
    done
    modprobe -i intel_ipts
    modprobe -i mei_me
    modprobe -i cfg80211;
    modprobe -i mwifiex;
    modprobe -i mwifiex_pcie;
    ;;
esac
kitakar5525 commented 5 years ago

@ywatanabe1989

undo replacing suspend with hibernate

I noticed that the command to undo replacing suspend with hibernate in wiki has typo. Maybe the second link /etc/systemd/system/systemd-suspend.service still remains there If you run it as it is. Because your device now enters suspend, I guess you already deleted the two files, though.

The command is rather this one:

# Added `sudo` and removed `&&`
sudo rm /etc/systemd/system/suspend.target /etc/systemd/system/systemd-suspend.service

I'll update the wiki.

system-sleep/ scripts

I have put scripts named hibernate1.sh and hibernate2.sh under /lib/systemd/system-sleep/.

You mean you have the two script files at the same time? then, both the two files will be executed. You need only one file. Also, if you use patches for Linux 5.2, you can refer to the latest system-sleep/sleep script here

kitakar5525 commented 5 years ago

I might find the cause of wifi dropping after suspend (see https://github.com/jakeday/linux-surface/issues/554#issuecomment-525262259)

Resolves

I confirmed this change does not break S0ix on Surface 3 (non-pro).

But as a side effect, it warns that the parent bridge is in D3hot on SB1. I'm not sure why.

kern  :warn  : [  112.777812] acpi device:4b: Cannot transition to power state D0 for parent in D3hot
mwifiex_pcie-disable-parent-bridge_d3.patch

```diff From 2350261cc0a24addd938ec6b665768ea1db05d42 Mon Sep 17 00:00:00 2001 From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com> Date: Mon, 27 Aug 2019 19:04:56 +0900 Subject: [PATCH] mwifiex_pcie: disable parent bridge_d3 --- drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6e734a83e..a94fd0e8e 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -233,8 +233,12 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev,                    const struct pci_device_id *ent) {    struct pcie_service_card *card; +   struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);    int ret; +   pr_alert("DEBUG: disabling bridge_d3...\n"); +   parent_pdev->bridge_d3 = false; +    pr_debug("info: vendor=0x%4.04X device=0x%4.04X rev=%d\n",         pdev->vendor, pdev->device, pdev->revision); -- 2.23.0 ```

Not sure wifi is still working when FW in reset state before entering suspend happened on SP4.

StollD commented 5 years ago

I can confirm that with your patch applied, the WiFi seems to reliably survive suspending the surface, even without the systemd-sleep script.

However, it seems to break S0ix for me on my SB2. I only applied your last patch, so if I should try any other patches, please let me know.

kitakar5525 commented 5 years ago

it seems to break S0ix for me on my SB2.

I see… On SB1, only the last patch mwifiex_pcie-disable-parent-bridge_d3.patch is enough.

Currently, I have no idea. Maybe the platform-wide fix is difficult then.

xeiraex commented 5 years ago

Confirmation that pci_set_power_state(pdev, PCI_D3hot); works on Surface Pro 2017. I haven't lost WiFi after suspending since applying it. Thanks @kitakar5525

kitakar5525 commented 5 years ago

Hi @xeiraex, have you tried the patch mwifiex_pcie-disable-parent-bridge_d3.patch (https://github.com/jakeday/linux-surface/issues/420#issuecomment-525276319)?

kitakar5525 commented 5 years ago

We can check if disabling wifi parent bridge D3 state will fix this issue (without any patch) by adding kernel parameter pcie_port_pm=off to our bootloader.

@StollD Can you test with the parameter added to see if it will break S0ix states or not?


The parameter will disable all bridge's D3 state. I hope adding that parameter has no impact on suspend power consumption on all surface devices except SB series.

SB series has bridge for dGPU (I think even on non-dGPU model). On Windows, I found Windows will put the bridge and dGPU itself into D3hot state on suspend (at least on SB1, see SoC Watch output https://github.com/jakeday/linux-surface/issues/554#issuecomment-525259174). So, maybe disabling all bridge's D3 state will cause more power consumption on suspend.

StollD commented 5 years ago

@StollD Can you test with the parameter added to see if it will break S0ix states or not?

Added the parameter to the kernel commandline and same result: If I don't keep the sleep script, S0ix fails. It works only when the wifi driver gets unloaded before suspending.

kitakar5525 commented 5 years ago

It works only when the wifi driver gets unloaded before suspending.

OK. Thank you.

kitakar5525 commented 5 years ago

@jonas2515 @StollD and everyone

Can anyone please test with all of these patches to see if wifi is still working after several suspend while not breking S0ix states?

0001-mwifiex_pcie-disable-parent-bridge_d3.patch.txt 0002-mwifiex_pcie-put-into-D3hot-on-suspend.patch.txt 0003-mwifiex_pcie-put-into-D0-on-resume.patch.txt 0004-mwifiex-do-not-set-scan_aborting.patch.txt (I posted all of these patches before)

This time, not only disabled the parent bridge_d3 but also put the wifi device into D3hot before suspend. As a side effect, it seems that we need to avoid setting scan_aborting flag to see APs after suspend.

StollD commented 5 years ago

Same result as before: If I disable the sleep script, the wifi still survives the suspend, but the surface doesn't enter S0ix anymore. I can only get S0ix with the sleep script unloading the modules.

kitakar5525 commented 5 years ago

Thank you for testing!

I want to confirm that, under the following conditions:

your SB2 does not enter S0ix in this case, too, right? If your SB2 does not enter S0ix in this case, too, the cause of S0ix failure might be other than disabling bridge_d3 (?)

I'm not yet sure the cause of S0ix failure is my patches or not.

StollD commented 5 years ago

Yeah, sorry, your patches were never the reason for S0ix failing. Should have clarified that before. I get the same behaviour without these patches and the disabled sleep script.

kitakar5525 commented 5 years ago

OK, thank you!

Then, It means that the most difficult part (S0ix) still remains…

What will happen when you skip the first patch (disabling bridge_d3)? Namely, please try with these three patches this time


To summarize, on my SB1, at least for wifi to be working after several suspend:

or

are needed.

kitakar5525 commented 5 years ago

Hmm… the combination

seems to be not stable. It causes Firmware wakeup failed.

kitakar5525 commented 5 years ago

and the combination

caused cmd_timeout then RIP: 0010:dev_watchdog+0x26a/0x280 after several suspend and some usage.

So, after all, I feel that disabling bridge_d3 is the most stable way just against dropping the wifi after suspend (I don't know what we can do about S0ix yet)

EDIT: With only 0001-mwifiex_pcie-disable-parent-bridge_d3.patch.txt, I remember it worked before. Now it also fails to show APs after suspend (it now requires 0004-mwifiex-do-not-set-scan_aborting.patch.txt). Bluetooth might be related… (?)

sebanc commented 5 years ago

@StollD It would be interesting to know which module is preventing S0ix, could you please test if S0ix can be achieved by removing only the mwifiex_pcie module ?

StollD commented 5 years ago

@StollD It would be interesting to know which module is preventing S0ix, could you please test if S0ix can be achieved by removing only the mwifiex_pcie module ?

No, I have to unload both, mwifiex and mwifiex_pcie to get S0ix to work

sebanc commented 5 years ago

@StollD Thank you for your quick answer.

Would you mind trying the below patch when you have some time ? (without any other wifi patch and sleep script) https://gist.github.com/sebanc/ad488e220c97c369be0d5b627b7ebfb9#file-mwifiex_s0ix_test-patch

Note: you will have 2 warnings when compiling as I left the original suspend/resume functions which are now unused.

StollD commented 5 years ago

@sebanc That seems to work fine! Wifi doesn't crash after suspend, and the surface reaches S0ix, without having to unload the modules.

I will do a longer test and report back later, because I think sometimes the wifi doesn't come back properly if the surface was suspended for a longer time, even with the sleep script reloading it.

StollD commented 5 years ago

@sebanc Used it for a full day, with multiple 30 min suspends, one 2h suspend and one suspend overnight, same result: Stable Wifi + S0ix

sebanc commented 5 years ago

@StollD Thank you for testing it, I was not able to verify it on my SP4 as it does not go to S0ix. I only noticed that the battery consumption during suspend decreased a lot after this patch.

kitakar5525 commented 5 years ago

I also tried the patch without any other wifi patch and sleep script

on SB1, I confirmed that

I also tried on Surface 3 (non-pro, Intel Atom):

Nice work!

What I noticed:

EDIT: The device:4b is something related to \_SB_.PCI0.RP12.PXSX (wifi device on SB1)

$ cat /sys/bus/acpi/devices/device:4b/path
\_SB_.PCI0.RP12.PXSX
sebanc commented 5 years ago

@kitakar5525 It seems the suspend battery consumption improvement only applies to SP4 (m3 model only ?) which had an additional bug. Like your SB1, it goes to PC10 and not in S0ix. However, without this patch it is loosing 3.5% battery per hour in suspend (tested multiple times), now it is about the same as yours (1.2%).

a lot of messages on dmesg acpi device:4b: Cannot transition to power state D0 for parent in D3hot (caused by disabling bridge_d3).

I am not sure if it is really an issue as I also have it on the NVME pcie root port since the 5.3 kernel patch has been applied.

when I unload mwifiex modules before suspend, it cannot go to even PC10

My device seems to go to PC10 even if mwifiex modules are unloaded. In theory, if the mwifiex modules are unloaded before suspend, the modified suspend/resume functions should not be called. The issue might be related to the "sta_cmd.c" file patch (old patch I used but that might not be necessary anymore), could you try again after reverting it ?

kitakar5525 commented 5 years ago

acpi device:4b: Cannot transition to power state D0 for parent in D3hot

I am not sure if it is really an issue as I also have it on the NVME pcie root port since the 5.3 kernel patch has been applied.

mwifiex_pcie_suspend2 and mwifiex_pcie_resume2

It seems that the function mwifiex_pcie_suspend2 is from mwifiex_pcie_remove and the function mwifiex_pcie_resume2 is largely from mwifiex_pcie_probe. So, I think that it behaves in the same way as we unload modules before suspend and re-load modules after suspend.

> when I unload mwifiex modules before suspend, it cannot go to even PC10

The issue might be related to the "sta_cmd.c" file patch

sebanc commented 5 years ago

Thank you that's very clear.

Regarding the warning Cannot transition to power state D0 for parent in D3hot, it only appears once for me (as for nvme). I will try to understand what it's related to, else we can just remove it.

qzed commented 4 years ago

@sebanc Sorry, haven't been following this issue. Looking over WiFi issues right now. If I understand you all correctly, the patch fixes the resume issues? If so, do you want to send me a PR to https://github.com/qzed/linux-surface-kernel? (If the original wifi patches are not needed, just revert them, e.g. git revert 367d6a9191b6e24ddf8e6432933ebf7f22127e97on 5.3-surface-devel).

tmarkov commented 4 years ago

@qzed

Considering this,

mwifiex_pcie_suspend2 and mwifiex_pcie_resume2

It seems that the function mwifiex_pcie_suspend2 is from mwifiex_pcie_remove and the function mwifiex_pcie_resume2 is largely from mwifiex_pcie_probe. So, I think that it behaves in the same way as we unload modules before suspend and re-load modules after suspend.

Does it really make sense to put that patch in the kernel rather than use the sleep script?

qzed commented 4 years ago

@tmarkov Right, sorry. I missed that. As long as the sleep script works let's keep it at that then.

kitakar5525 commented 4 years ago

Achieving S0ix states with ASPM L1 state disabled for wifi device on Surface 3

On my Surface 3 (non-pro), I disable ASPM L1 state (L0s is still enabled) for daily usage (jakeday/linux-surface#456).

Disabling L1 state will break S0ix states on suspend. If I understand correctly, the patch will do almost the same thing as unloading/loading mwifiex_pcie module. What I'm now confused about is, when disabling the L1 state, the only way to achieve S0ix is to apply the patch. Unloading mwifiex modules will not fix S0ix🤔

So, I personally use the patch.

multiple output of acpi device:4b: Cannot transition to power state D0 for parent in D3hot on SB1

I mentioned before that the message will appear multiple times during suspend and it's annoying. I still cannot find what is happening. I want to suppress the message but I'm fine with that for now.