Closed northbound1 closed 8 years ago
I did a bit more research and deleted the post because further research indicated it wouldn't make a difference, the tasklet wouldn't get called again while it was running. The only way I could see that it would make a difference is if the lock is global across all loaded copies of the driver in which case the driver on the 2.4 ghz radio could be calling it at the same time as the 5 ghz radio, but I don't know. My experience with kernel modules is too weak. Once I get a compile system going I'll do some actual testing and debugging and if there's anything to that line of reasoning, I'll post my results. Until then I'm just taking a stab in the dark and probably showing my ignorance and wasting peoples time.
@cfinch81 .. no worries about the time wasting stuff. Most of us on here are good people (typical nerds without social skill sometimes but good people nonetheless). @yuhhaurlin is the engineer who's keeping it all together and releasing "official" code.
I do think there's something to the locks - personally I've been buried with work to check into it. About a week ago I was mentioned noticing that in mwl_tx_done and other functions the lock/unlocks are very far apart (code-wise with lots of commands happening in between, loops, etc.), and judging by the high soft IRQ count, (Rescheduling and Single Func), could it be possible that there is too much processing happening in between the locks so that the kernel defers calls (hence creating the Rescheduling interrupts)? I have on my to-do list next week (hopefully) to see what happens if the lock/unlock are tightened to just around the commands that actually need the lock.
About the types of locks, I found this useful ... https://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/c214.html. BH locks seem to be appropriate here - note that in tx they're not uniform throughout the code.
Top uses timer tick to collect information. It means it collects information on the rate 100 times per second. For regular periodical event, top may not report correct information. The mechanism is correct and it won't take lot of CPU times when system is idle.
There is no need to use spin_lock with bh if the code is only used inside bh,
I think I can modify the code to trigger queue empty tasklet only when previous queue empty interrupt is occurred longer than one tick.
Looking forward to the commit.
From: yuhhaurlin [mailto:notifications@github.com] Sent: Thursday, November 12, 2015 4:28 AM To: kaloz/mwlwifi Cc: gufus
I think I can modify the code to trigger queue empty tasklet only when previous queue empty interrupt is occurred longer than one tick.
Thanks for everything. — Reply to this email directly or view it on GitHub.
@yuhhaurlin any news on the modification discussed here?
I wanted to share a patch to 10.3.0.14 for rx.c below.
On my test unit, the Rescheduling Interrupts are lower (~2-3x as the crow flies) under heavy transfer load and overall (upload) performance is increased. Stability is good with two PC running Intel AC 7260, a Samsung Galaxy S4, and an iPhone 6+.
My changes are to tighten the spin lock/unlock around the memory operations.
--- original_10.3.0.14/rx.c 2015-11-06 00:13:02.000000000 -0600
+++ updated_10.3.0.14/rx.c 2015-11-23 01:16:44.232709000 -0600
@@ -488,9 +488,10 @@
struct ieee80211_hdr *wh;
u32 status_mask;
- spin_lock(&priv->rx_desc_lock);
- desc = &priv->desc_data[0];
- curr_hndl = desc->pnext_rx_hndl;
+ spin_lock(&priv->rx_desc_lock);
+ desc = &priv->desc_data[0];
+ curr_hndl = desc->pnext_rx_hndl;
+ spin_unlock(&priv->rx_desc_lock);
if (!curr_hndl) {
status_mask = readl(priv->iobase1 +
@@ -500,7 +501,6 @@
priv->is_rx_schedule = false;
wiphy_warn(hw->wiphy, "busy or no receiving packets\n");
- spin_unlock(&priv->rx_desc_lock);
return;
}
@@ -560,7 +560,9 @@
tr = (struct mwl_dma_data *)
prx_skb->data;
+ spin_lock(&priv->rx_desc_lock);
memset((void *)&tr->data, 0, 4);
+ spin_unlock(&priv->rx_desc_lock);
pkt_len += 4;
}
@@ -602,9 +604,11 @@
&status))
goto out;
}
-
+ spin_lock(&priv->rx_desc_lock);
memcpy(IEEE80211_SKB_RXCB(prx_skb), &status, sizeof(status));
ieee80211_rx(hw, prx_skb);
+ spin_unlock(&priv->rx_desc_lock);
+
out:
mwl_rx_refill(priv, curr_hndl);
curr_hndl->pdesc->rx_control = EAGLE_RXD_CTRL_DRIVER_OWN;
@@ -622,5 +626,4 @@
priv->is_rx_schedule = false;
- spin_unlock(&priv->rx_desc_lock);
}
@Tiremeat: interesting. i'll see if i can add it to my code and test it as well. Thanks for your work!
@johnnysl mind walking me through on applying this on my build? I want to know how to apply this patch. Thank You
@thagabe: dead easy, from build root create "package/kernel/mwlwifi/patches/120-tiremeat.patch" (or whatever fancy name you want to give it) copy and paste above content in the file, and then build the image as usual.
(assuming you are building an OpenWrt image)
@johnnysl durrrrrr! of course! thanks!
@notnyt Commit code to decrease the number to trigger queue empty tasklet.
@yuhhaurlin Have you committed @johnnysl 's patches?
@yuhhaurlin looks much better now at idle.
CPU0: 0.0% usr 1.6% sys 0.0% nic 96.7% idle 0.0% io 0.0% irq 1.6% sirq CPU1: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
And while transferring a file over 5ghz at 21.5MB/s
CPU0: 0.1% usr 0.0% sys 0.0% nic 30.0% idle 0.0% io 0.0% irq 69.7% sirq CPU1: 0.0% usr 0.0% sys 0.0% nic 55.9% idle 0.0% io 0.0% irq 44.0% sirq
sirq is significantly improved with @TireMeat 's patch applied to the new version. Good job. @yuhhaurlin please evaluate.
@TireMeat you should submit a pull request to make it easier. If you fork the repo, clone it locally, make your changes, commit them, then push it back to github, you can submit a pull request to this repo to make it easy.
During the same xfer as above:
CPU0: 0.0% usr 0.0% sys 0.0% nic 24.5% idle 0.0% io 0.0% irq 75.4% sirq CPU1: 0.3% usr 0.3% sys 0.0% nic 98.6% idle 0.0% io 0.0% irq 0.5% sirq
@notnyt I'll assume the patch was not merged :/ rats! I just finished building a new image with your patch and the new driver.
@thagabe i have the changes commited to my forked repo here: https://github.com/notnyt/mwlwifi
I've submitted a pull request with it, but you can check that out if you want to.
@notnyt Please help to check if new committed code won't get high Software IRQ for v1 when system is idle?
@yuhhaurlin please see my replies further up.
The new changes fix the SIRQ issue. @TireMeat 's patch reduces SIRQ usage under load even further.
Summarized:
With just the code in this repo at idle:
CPU0: 0.0% usr 1.6% sys 0.0% nic 96.7% idle 0.0% io 0.0% irq 1.6% sirq
CPU1: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
And while transferring a file over 5ghz at 21.5MB/s
CPU0: 0.1% usr 0.0% sys 0.0% nic 30.0% idle 0.0% io 0.0% irq 69.7% sirq
CPU1: 0.0% usr 0.0% sys 0.0% nic 55.9% idle 0.0% io 0.0% irq 44.0% sirq
And while transferring a file over 5ghz with @tiremeat's patch. Notice the 2nd core isn't loaded.
CPU0: 0.0% usr 0.0% sys 0.0% nic 24.5% idle 0.0% io 0.0% irq 75.4% sirq
CPU1: 0.3% usr 0.3% sys 0.0% nic 98.6% idle 0.0% io 0.0% irq 0.5% sirq
Temperatures have also dropped:
root@ZOMGWTFBBQWIFI:~# sensors
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +64.6°C
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +52.8°C
temp2: +53.1°C
@notnyt I think if the new commit fixed this problem, you can close this one. rx_desc spin lock is used to protect receive descriptor structure. Thanks for your help.
@yuhhaurlin I'm pretty sure everything remains protected with his patch. Please review:
https://github.com/kaloz/mwlwifi/issues/43#issuecomment-158871088
It lowers sirq usage a good deal and I have not seen any stability issues.
@yuhhaurlin Isn't the rx_desc spin lock still used where necessary with this patch? I'm running with it now with no adverse affects. Does the entire method need the spinlock as it was?
@notnyt I check the code now. It looks like there is no need to use this spin lock.
@yuhhaurlin What do you mean? All he's done is tightened up the existing lock around the sensitive structures/calls. Functionality should remain the same, unless I'm missing a bit of code there that needs to be protected. If so, can you point out where? Are you saying that function did not need the spinlock at all?
Only this tasketlet access rx descriptor, so the spin lock maybe can be removed. I check it now.
Latest changes look great on v1. Anyone test on v2? Safe to close this out now?
root@ZOMGWTFBBQWIFI:~# sensors
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +61.7°C
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +50.5°C
temp2: +50.9°C
CPU0: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 16.6% usr 0.0% sys 0.0% nic 83.3% idle 0.0% io 0.0% irq 0.0% sirq
CPU0: 0.0% usr 0.0% sys 0.0% nic 88.4% idle 0.0% io 0.0% irq 11.5% sirq
CPU1: 0.0% usr 1.4% sys 0.0% nic 98.5% idle 0.0% io 0.0% irq 0.0% sirq
Will test the latest changes tomorrow on my V2. Must say that tiremeats patch did not change anything on my V2.
1900ac v1
Commit code to support following items:
Using username "root". DD-WRT v3.0-r28320 std (c) 2015 NewMedia-NET GmbH Release: 11/25/15 Authenticating with public key "rsa-key-20120810"
BusyBox v1.24.1 (2015-11-25 03:38:09 CET) built-in shell (ash)
root@AC-DD-WRT_M:~#
2g wifi 3 clients connected
sirq are down, and they flush pretty fast too
idle
CPU0: 1.9% usr 0.3% sys 0.0% nic 90.0% idle 0.0% io 0.0% irq 7.5% sirq CPU1: 0.1% usr 0.0% sys 0.0% nic 97.2% idle 0.0% io 0.0% irq 2.5% sirq
temps are the same
23:32:18 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:19 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:20 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:22 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:23 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:24 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=100 23:32:25 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=100 23:32:26 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=100 23:32:27 CPUtemp=66 WLOtemp=52 WL1temp=53 fanstatus=100 23:32:28 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=100 23:32:29 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:30 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:31 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:32 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:33 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=0 23:32:34 CPUtemp=68 WLOtemp=52 WL1temp=53 fanstatus=100
wifi load
CPU0: 0.7% usr 2.3% sys 0.0% nic 51.9% idle 0.0% io 0.0% irq 44.8% sirq CPU1: 0.5% usr 0.9% sys 0.0% nic 74.9% idle 0.0% io 0.0% irq 23.5% sirq
So what do we call this new set of patches? 10.3.0.15?
I'll do a build now, but I'm on 18 days uptime with no issues on 10.3.0.14 on a V1 lol.
You can check kaloz to build this version of code. 10.3.0.15 is under development. I remember kaloz will hook date with version.
@kaloz There is definite improvement on the latest push to trunk. Don't have time now but will beat on it more later.
Soft Irq CPU-load on the V2 dropped from 1.5% to 0.1% after the upgrade. Impressive difference. Wifi speed has also significantly improved. This update is really a great improvement! CPU and wireless temperatures do not really differ after the upgrade though. 68, 41 and 42 degrees C.
I had checked with kaloz. Latest version of this driver is checked into CC branch and main trunk. Thanks for all helps from community. This issue is closed.
@yuhhaurlin .. thanks much Mr. Lin - you've carried this project. Kalotz hasn't done much as far as I've seen - too busy with conferences and panels. ;-)
@TireMeat This project is cooperated between Belkin and Marvell. This driver is the responsibility of Marvell. Kaloz is OpenWrt site admin.
From: yuhhaurlin [mailto:notifications@github.com] Sent: Thursday, November 26, 2015 8:23 PM To: kaloz/mwlwifi Cc: gufus
I had checked with kaloz. Latest version of this driver is checked into CC branch and main trunk. Thanks for all
Thanks for everything!
— Reply to this email directly or view it on GitHub.
I am running V1 hardware and with latest patches SoftIRQ is still an issue (~4176 p/s). CPU load at idle is ~26%. Temps are at least 5°C higher. Performance is the same or lower as with 10.3.0.3.
@yuhhaurlin, I stumbled across this thread when I was doing an experiment to see if OpenWRT with a tickless kernel (NO_HZ_IDLE) can reduce the #interrupts. Some interrupts did go down, but mwlwifi interrupts (i.e. queue empty) went up, probably because the interrupt isn't cleared until the next tick, leading to lots of spurious IRQs. Anyways, it seems you've already done a lot to reduce the CPU usage to 0 during idle, so for practical purposes, the current driver is good enough. But being an optimization freak, I'm having an itch to get rid of those thousands of spurious interrupts/s.
I have little experience writing kernel drivers, but from what I've read in the code, there are 3 places for flushing the aggregate packet. The 1st 2 happen synchronously with packet submission ((mwl_mac80211_tx()).
Instead of periodically flushing packets every jiffy, I propose creating a 1 shot timer interrupt each time an new aggregate packet is created (in mwl_mac80211_tx()). It will flush the aggregate packet in the future, but if #1 or #2 happens before that, then the timer interrupt will just be a no-op.
This should greatly reduce the #interrupts during idle and allow the CPU to enter sleep longer.
What do you think?
-Yale
This had been done before. I don't think I will change it now.
Thanks for the fast reply. If it was tried, then why was it changed? What problems did it have?
The more heavy the traffic is, the less interrupts generated by queue empty.
Since no one else is going to start this issue :) root@OpenWrt:~# grep mwl /proc/interrupts ; sleep 1; grep mwl /proc/interrupts 87: 1282535 0 armada_370_xp_irq 59 mwlwifi 88: 124531322 0 armada_370_xp_irq 60 mwlwifi 87: 1284531 0 armada_370_xp_irq 59 mwlwifi 88: 124533305 0 armada_370_xp_irq 60 mwlwifi htop also shows 37% core0 with no load 60%+ when both radios are enabled Could the checks be cut in half or 25%