northbound1 closed this issue 8 years ago
Agreed. Monitoring shows 60% utilization on core0, even though the router is idle. If I turn off wifi (both radios), CPU returns to normal.
This doesn't seem to affect performance.
I see the same on OpenWrt trunk with 10.3.0.12.
Thanks. I will use a timer task in place of the queue-empty interrupt to flush AMSDU packets.
@yuhhaurlin Thanks for the work; I can almost see the light at the end of the tunnel. :) And thanks to ALL of the others that helped bring it together!
@yuhhaurlin
Thanks for everything. :+1:
Initial tests show me that the new driver has a much higher sirq load during heavy wifi usage.
Device: WRT1900AC-V2
Confirmed.
CPU0: 0.0% usr 0.0% sys 0.0% nic 11.3% idle 0.0% io 0.0% irq 88.6% sirq
CPU1: 0.1% usr 0.3% sys 0.0% nic 60.1% idle 0.0% io 0.0% irq 39.2% sirq
I think this might be a little tight.
What load were you seeing with the other driver while performing a transfer? I didn't test beforehand.
At max 20%; now over 80% for the same amount of wifi traffic.
I think it is correct to use the queue-empty interrupt to flush AMSDU packets. Although it generates a lot of interrupts when the system is idle, it will not generate any extra interrupts when wifi traffic is heavy. I will change it back and release 10.3.0.14.
@yuhhaurlin : the whole discussion comes from the fact that the driver seems to create excessive load on a V1 WRT1900AC when idle. This is blamed on the high number of interrupts generated every second. Yet with the V2 version, there is no CPU load when idle (but still a lot of interrupts seen in /proc/interrupts).
So ideally you want a driver that creates no CPU load when idle, and no excessive load when moving data via wifi either. Personally I have no clue whether the huge interrupt counts are related to the high CPU load on the V1, or whether there is another bug in the driver/kernel/whatever.
If wifi is idle, even though more interrupts are generated, the flush function should not have much work to do.
Please update the issue after you commit and I will test again right away. Thanks
1900ac v1
wifi driver 10.3.0.12
Both mwlwifi 2g and 5g on CPU1
CPU0: 0.1% usr 0.5% sys 0.0% nic 99.2% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 0.9% usr 1.5% sys 0.0% nic 36.6% idle 0.0% io 0.0% irq 60.7% sirq
@yuhhaurlin ... just a thought: in mwl_tx_done and other functions, the lock/unlock pairs are very far apart (code-wise, with a lot happening in between). Judging by the high soft IRQ count (Rescheduling and Single Func), could there be too much processing happening inside the locks, so that the kernel defers calls (hence the rescheduling interrupts)?
What would happen if we tightened the lock/unlock to cover just the commands that actually need the lock?
.14 is worse under load than .13. I think you had it better with the timer.
Mem: 51352K used, 203708K free, 124K shrd, 4628K buff, 11736K cached
CPU0: 0.0% usr 0.0% sys 0.0% nic 7.2% idle 0.0% io 0.0% irq 92.7% sirq
CPU1: 0.0% usr 1.4% sys 0.0% nic 24.6% idle 0.0% io 0.0% irq 73.9% sirq
I actually tested these both under load, and they have similar sirq usage. However, .13 is MUCH better at idle. I'd suggest sticking with the timer as opposed to the queue empty interrupt.
Thoughts?
I think marvell will keep queue empty interrupt to flush AMSDU packets.
Please test based on 10.3.0.14 to see if you still encounter the issues listed here. I will check the open issues later and try to close any that no longer happen on 10.3.0.14. Thanks.
I'll check .14 on my v2 later today. What I don't understand is why the behavior is different on the different hardware versions.
@yuhhaurlin 10.3.0.14 on a WRT1900AC-V2 lowers the CPU temperature by about 15 degrees, and there is no more high load during heavy wifi transfers. CPU load is ~0 when idling as well, so this looks good to me.
Thanks for your information.
At idle
root@ZOMGWTFBBQWIFI:/# dmesg | grep 10.3.0
[ 18.862598] <<Marvell 802.11ac Wireless Network Driver version 10.3.0.14>>
root@ZOMGWTFBBQWIFI:/# uptime
09:50:45 up 17 min, load average: 0.00, 0.01, 0.04
root@ZOMGWTFBBQWIFI:/# sensors
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +68.2°C
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +54.3°C
temp2: +55.0°C
CPU0: 0.0% usr 3.1% sys 0.0% nic 39.0% idle 0.0% io 0.0% irq 57.8% sirq
CPU1: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
With a 2.4 GHz WAN-to-wifi transfer and a 5 GHz LAN-to-wifi transfer running. Is this within safe operating limits?
root@ZOMGWTFBBQWIFI:/# uptime
10:20:23 up 47 min, load average: 0.43, 0.38, 0.24
root@ZOMGWTFBBQWIFI:/# sensors
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +87.7°C
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +66.9°C
temp2: +78.1°C
CPU0: 0.0% usr 0.0% sys 0.0% nic 7.8% idle 0.0% io 0.0% irq 92.1% sirq
CPU1: 0.8% usr 1.7% sys 0.0% nic 34.7% idle 0.0% io 0.0% irq 62.6% sirq
@notnyt I just have no idea why a V1 shows such different behavior. Was this always the case, so also with 10.3.0.3, or was it introduced in some version? I'm getting the idea that the load issue you see might be caused by some other bug.
This is my temp during a load test:
/# sensors
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +44.4°C
temp2: +46.2°C
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +72.7°C
/# top (during idle)
Mem: 103284K used, 411740K free, 1804K shrd, 38620K buff, 18584K cached
CPU: 0% usr 0% sys 0% nic 98% idle 0% io 0% irq 1% sirq
Load average: 0.03 0.04 0.05 2/82 29752
I wasn't monitoring this closely with 10.3.0.3, but I do not believe this was the case then. I believe it began with 10.3.0.8, but I could not run that version due to BA issues. Which kernel are you using?
What are you doing for a load test? My test was wan to wifi on 2.4ghz at 120mbps and lan to wifi on 5ghz at about 200mbps.
As for idle, it seems the v1 has always sat at 50-60% sirq since then.
Load test: run iperf3 with an N-300 client on 5 GHz against an iperf3 server on the LAN. I test traffic both up and down; the speed is around 160 Mbit/s.
/# uname -a
Linux MYROUTR 3.18.23 #3 SMP Fri Nov 6 11:50:11 CET 2015 armv7l n
I'm running DD bleeding edge r47389.
@yuhhaurlin any idea why v2 isn't showing this problem but v1 is?
I am just a peon, but why does the CPU load increase with sirq when hardware acceleration is enabled (since .8)? I thought the work would be offloaded. Or am I in left field?
top uses the timer tick to collect information, meaning it samples at a rate of 100 times per second. For a regular periodic event, top may not report correct information.
I think you only need to care about top's numbers when wifi traffic is heavy.
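For anyone who distrusts top's tick-based sampling, the softirq share can be computed directly from two /proc/stat samples. Below is a minimal sketch with made-up sample lines, not output from a real device; `sirq_pct` is a hypothetical helper, and real /proc/stat rows carry a few extra columns (steal, guest) that a real script would also sum:

```shell
# Hypothetical helper: softirq share of total CPU time between two
# /proc/stat "cpu0" samples (fields: usr nice sys idle iowait irq softirq).
sirq_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR==1 { for (i=2;i<=8;i++) { p[i]=$i; t1+=$i } }
    NR==2 { for (i=2;i<=8;i++) t2+=$i
            printf "%.1f\n", 100*($8-p[8])/(t2-t1) }'
}

# Two illustrative samples, nominally taken one second apart:
sirq_pct "cpu0 100 0 50 800 0 0 50" "cpu0 101 0 52 850 0 0 97"
```

Because this averages over the whole sampling interval, it sidesteps the aliasing top can suffer against strictly periodic events.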
If not showing both cores, top shows an average of both, at least for me. htop, on the other hand, shows both. To me the heat tells the story: 10.3.0.3 at idle was close to nil and the fan never ran. 10.3.0.8 and later have heated me up to 60C and cycled the fan at 10-to-15-minute intervals. This is at idle.
I guess what I am saying is: why is it not the wifi chips getting hot instead of the CPU, since the load should be going there?
Does anyone know how much overhead kernel debugging (debugfs?) adds?
(BTW, 10.3.0.10 has the same system-resource profile: ~4K+ interrupts/sec and ~250 context switches/sec.)
root@WRT1900ac:/# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 163972 11296 24084 0 0 0 0 2075 103 1 30 69 0
0 0 0 163972 11296 24084 0 0 0 0 4245 258 0 27 73 0
0 0 0 164028 11296 24084 0 0 0 0 4261 225 1 29 70 0
0 0 0 163976 11296 24084 0 0 0 0 4185 252 0 24 76 0
0 0 0 164032 11296 24084 0 0 0 0 3103 241 0 31 69 0
0 0 0 163972 11296 24084 0 0 0 0 4235 207 0 28 72 0
This is what I see running gufus's fan script; you can see the fan cycles. This is at idle. https://onedrive.live.com/redir?resid=E71014BBAA19358A!8052&authkey=!ALdIv8UJZS26DW0&v=3&ithint=photo%2cJPG
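The fan scripts mentioned in this thread all boil down to polling a temperature and toggling the fan at a threshold. A minimal hypothetical sketch; `fan_state` and the 75C cutoff are invented for illustration and are not taken from gufus's actual script:

```shell
# Hypothetical threshold rule: fan on once the CPU temp (in whole °C)
# reaches 75, off below that.
fan_state() {
  if [ "$1" -ge 75 ]; then echo on; else echo off; fi
}

fan_state 87   # load temps in this thread reach the high 80s
fan_state 55   # typical idle temp on the better-behaved builds
```

With the driver idling hot, a rule like this is exactly why the fan cycles every 10-15 minutes even with no traffic.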
Trying to blame top for anything is absurd. The BusyBox version in OpenWrt sleeps for 5 seconds at a time, then reads the data from /proc. You can trigger an update by pressing enter. Pressing 1 shows individual CPU stats.
Driver 10.3.0.13 shows no sirq at idle, yet 10.3.0.14 shows thousands of interrupts per second. CPU temperature at idle is approximately 70C.
root@ZOMGWTFBBQWIFI:~# grep mwlwifi /proc/interrupts ;sleep 1;grep mwlwifi /proc/interrupts
87: 164636645 0 armada_370_xp_irq 59 mwlwifi
88: 160197843 0 armada_370_xp_irq 60 mwlwifi
87: 164638651 0 armada_370_xp_irq 59 mwlwifi
88: 160199865 0 armada_370_xp_irq 60 mwlwifi
root@ZOMGWTFBBQWIFI:~# uptime
09:01:34 up 23:28, load average: 0.11, 0.07, 0.06
root@ZOMGWTFBBQWIFI:~# sensors
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +70.4°C
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +55.2°C
temp2: +56.1°C
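To put a number on "thousands per second", the two /proc/interrupts snapshots above (taken 1 s apart) can simply be diffed. A minimal shell sketch; `rate` is a throwaway helper invented here, and the counts are copied from the snapshot output:

```shell
# Throwaway helper: delta between two interrupt counters sampled
# one second apart, i.e. interrupts per second.
rate() { awk -v a="$1" -v b="$2" 'BEGIN { print b - a }'; }

rate 164636645 164638651   # IRQ 87 (mwlwifi)
rate 160197843 160199865   # IRQ 88 (mwlwifi)
```

That works out to roughly 2000 interrupts/sec per radio, about 4000/sec total, consistent with the vmstat figures posted earlier.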
10.3.0.13 is an improvement; I can now run both radios and stay cool.
root@OpenWrt:~# grep mwlwifi /proc/interrupts ;sleep 1;grep mwlwifi /proc/interrupts
87: 29265 0 armada_370_xp_irq 59 mwlwifi
88: 1043990 0 armada_370_xp_irq 60 mwlwifi
87: 29318 0 armada_370_xp_irq 59 mwlwifi
88: 1044030 0 armada_370_xp_irq 60 mwlwifi
root@OpenWrt:~# sensors
tmp421-i2c-0-4c
Adapter: mv64xxx_i2c adapter
temp1: +45.1°C
temp2: +48.1°C
armada_thermal-virtual-0
Adapter: Virtual device
temp1: +53.7°C
@northbound1: well, 10.3.0.13 does exactly the reverse on a V2; it's 10.3.0.14 that runs quite well on that platform. So I'm not convinced that the interrupts logged in /proc/interrupts are the cause of the high CPU load. It might also be some other change in the driver.
.14 running on a V1 with low cpu temp and high idle sirq
10.3.0.14
root@Test:~# grep mwlwifi /proc/interrupts; sleep 1; grep mwlwifi /proc/interrupts
87: 32586446 0 armada_370_xp_irq 59 mwlwifi
88: 32560650 0 armada_370_xp_irq 60 mwlwifi
87: 32587931 0 armada_370_xp_irq 59 mwlwifi
88: 32562128 0 armada_370_xp_irq 60 mwlwifi
root@Test:~# uptime
22:25:59 up 4:51, load average: 0.00, 0.01, 0.04
Cpu: 55°C Ram: 42°C Wifi: 45°C
Using a modified fan script based on gufus's implementation.
Back to 10.3.0.13. .14 is tolerable on one radio, but with both up the fan cycles. There is no real difference in iperf and no difference in USB3 transfer. I can't see idling at 25% with no load; that makes no sense to me.
1900ac v1
wifi driver 10.3.0.12
Both mwlwifi 2g and 5g on CPU1
Runs like this MOST of the time.
CPU0: 0.7% usr 0.3% sys 0.0% nic 98.8% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 0.5% usr 0.9% sys 0.0% nic 34.8% idle 0.0% io 0.0% irq 63.5% sirq
Then once-in-awhile...
CPU0: 0.1% usr 0.3% sys 0.0% nic 99.4% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 0.7% usr 0.5% sys 0.0% nic 80.0% idle 0.0% io 0.0% irq 18.5% sirq
queue empty interrupt to flush AMSDU packets WORKS
I'm sure wifi driver 10.3.0.14 will be the SAME; it's the same code as wifi driver 10.3.0.12.
@gufus have you tried .13?
From: northbound1 [mailto:notifications@github.com]
Sent: Sunday, November 08, 2015 2:34 PM
To: kaloz/mwlwifi
Cc: gufus
@gufus have you tried .13?
Nope.
You should, since you also have a v1, just to see if you notice any real difference in iperf or anything else. I see no real difference, except the CPU doing what it should at idle, which is next to nothing.
This is what you should see in collectd when idle with both radios up: wrt1900ac v1, 10.3.0.13.
1900ac v1
Using username "root".
DD-WRT v3.0-r28112 std (c) 2015 NewMedia-NET GmbH
Release: 11/10/15
Authenticating with public key "rsa-key-20120810"
BusyBox v1.24.1 (2015-11-10 00:30:38 CET) built-in shell (ash)
root@AC-DD-WRT:~# strings /lib/ath9k/mwlwifi.ko | grep 10.3
10.3.0.14
10.3.0.14
root@AC-DD-WRT:~#
2 clients on 2.4ghz 1 client on 5ghz
CPU0: 1.5% usr 0.9% sys 0.0% nic 56.5% idle 0.0% io 0.0% irq 40.8% sirq
CPU1: 0.1% usr 0.5% sys 0.0% nic 83.2% idle 0.0% io 0.0% irq 15.9% sirq
CPU 67.4 °C / WL0 50.7 °C / WL1 52.5 °C
NO performance tweaks
1900ac v1
wifi driver 10.3.0.14
Both mwlwifi 2g and 5g on CPU1
queue empty interrupt to flush AMSDU packets WORKS
wifi driver 10.3.0.14 is the SAME
MOST of the time
CPU0: 0.9% usr 0.7% sys 0.0% nic 98.2% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 0.1% usr 0.3% sys 0.0% nic 39.7% idle 0.0% io 0.0% irq 59.6% sirq
once-in-awhile
CPU0: 0.5% usr 0.3% sys 0.0% nic 99.0% idle 0.0% io 0.0% irq 0.0% sirq
CPU1: 1.3% usr 1.3% sys 0.0% nic 78.6% idle 0.0% io 0.0% irq 18.5% sirq
@Calvin Finch If you get it to work on a v2, I am game to try it on a v1. It would be great to see something that works properly on both. Edit: since your post is no longer here, but I got the e-mail, I will reply here.
Since no one else is going to start this issue :)
root@OpenWrt:~# grep mwl /proc/interrupts ; sleep 1; grep mwl /proc/interrupts
87: 1282535 0 armada_370_xp_irq 59 mwlwifi
88: 124531322 0 armada_370_xp_irq 60 mwlwifi
87: 1284531 0 armada_370_xp_irq 59 mwlwifi
88: 124533305 0 armada_370_xp_irq 60 mwlwifi
htop also shows 37% on core0 with no load, 60%+ when both radios are enabled. Could the checks be cut in half, or to 25%?