Closed avhm closed 8 months ago
Thanks for the detailed report...
Will have a good crack at what's going on this weekend... Your config looks sane enough...
Apologies for the verbosity, I'll document the whole testing process as it might help others in future
PHASE 1 - HIGH LEVEL SIMPLE TESTING of BUILD vs INI COMBINATIONS
>STEP 1) CHECK RESULTING VALUES ON THIS AND PREV RELEASE WITHOUT IRQBALANCE
+ LIST ALL INFLUENCING PARAMETERS
rpi-support.sh | grep -E '(^PERF|^POWERPROFILE)'
grep steering /etc/config/network
service | grep irqbalance
PERFTWEAKS="default"
POWERPROFILE="quick"
option packet_steering '0'
/etc/init.d/irqbalance stopped stopped
################################################################# CURRENT build: r19345-72b93b77a5
kernel: 5.10.109-1-98b40dd8f25c85d4d803bf3da185f1b4
base-files: 1473-r19345-72b93b77a5
busybox: 1.35.0-3
################################################################# [root@rpi-dca6325631 /usbstick 55°]# rpi-perftweaks.sh
INTERACTIVE rpi-perftweaks.sh> GOVERNOR[ondemand]: quick upthresh:21 downfac:6 minfreq:1100000 maxfreq:1500000
INTERACTIVE rpi-perftweaks.sh> RENICE dnsmasq:-5 uhttpd:2 dropbear:1
INTERACTIVE rpi-perftweaks.sh> TASKSET: nlbwmon:14209>3 collectd:20860>3 uhttpd:14399>2 odhcpd:1139>3 dropbear:14310>3 dropbear:4690>3 dnsmasq:11742>3 dnsmasq:11310>3
INTERACTIVE rpi-perftweaks.sh> MMCINTs: c:c>33
INTERACTIVE rpi-perftweaks.sh> ETH0INT: 40:1=1 41:1=1
INTERACTIVE rpi-perftweaks.sh> LANIFACE:eth0 WANIFACE:eth775c13
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/tx-0/xps_cpus[1=1]
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/tx-1/xps_cpus[2=2]
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/tx-2/xps_cpus[4=4]
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/tx-3/xps_cpus[4=4]
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/tx-4/xps_cpus[2=2]
INTERACTIVE rpi-perftweaks.sh> sysval: eth0/queues/rx-0/rps_cpus[7=7]
INTERACTIVE rpi-perftweaks.sh> sysval: eth775c13/queues/rx-0/rps_cpus[7=7]
############################################# PREVIOUS BUILD (too lazy to reinstall, pulled from plog)
### fgrep -r sysval /boot/plog/pdmesg-20220*
### cat /boot/plog/pdmesg-202203221440-5.7.9-2.log | grep -C7 sysval
##################################################################################################
[ 260.536404] rpi-perftweaks.sh[bg]> GOVERNOR[ondemand]: quick upthresh:21 downfac:6 minfreq:1100000 maxfreq:1500000
[ 260.646962] rpi-perftweaks.sh[bg]> RENICE dnsmasq:-5 uhttpd:2 dropbear:1
[ 260.796340] rpi-perftweaks.sh[bg]> TASKSET: nlbwmon:29326>3 collectd:14819>3 uhttpd:13470>2 odhcpd:32338>3 dropbear:4643>3 dnsmasq:5952>3 dnsmasq:5905>3
[ 260.856999] rpi-perftweaks.sh[bg]> MMCINTs: c:f>33
[ 260.900133] rpi-perftweaks.sh[bg]> ETH0INT: 40:f>1 41:f>1
[ 260.931421] rpi-perftweaks.sh[bg]> LANIFACE:eth0 WANIFACE:eth775c13
[ 260.958373] rpi-perftweaks.sh[bg]> sysval: eth0/queues/tx-0/xps_cpus[0>1]
[ 260.984731] rpi-perftweaks.sh[bg]> sysval: eth0/queues/tx-1/xps_cpus[0>2]
[ 261.040768] rpi-perftweaks.sh[bg]> sysval: eth0/queues/tx-2/xps_cpus[0>4]
[ 261.091793] rpi-perftweaks.sh[bg]> sysval: eth0/queues/tx-3/xps_cpus[0>4]
[ 261.145915] rpi-perftweaks.sh[bg]> sysval: eth0/queues/tx-4/xps_cpus[0>2]
[ 261.198487] rpi-perftweaks.sh[bg]> sysval: eth0/queues/rx-0/rps_cpus[0>7]
[ 261.251212] rpi-perftweaks.sh[bg]> sysval: eth775c13/queues/rx-0/rps_cpus[0>7]
##################################################### quick assessment
No obvious change to script operation (with irqbalance off)
> ETH0 IRQs MOVED to 1 from f (with irqbalance, the script should not touch these? or irqbalance can change them anyway)
> STEERING QUEUES all share hardcoded values and are applied
> SCALING as expected and equivalent
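For anyone following along, the bracketed values in the logs above (c, f, 1, 7...) are hex CPU bitmasks. A minimal POSIX-sh helper to translate them (the `decode_mask` name is mine, not part of the build scripts):

```shell
# Decode a hex smp_affinity / rps_cpus / xps_cpus mask into a CPU list.
# Helper name "decode_mask" is my own, not from rpi-perftweaks.sh.
decode_mask() {
  mask=$((0x$1)); cpu=0; out=""
  while [ "$mask" -gt 0 ]; do
    # each set bit selects one CPU
    [ $((mask & 1)) -eq 1 ] && out="$out cpu$cpu"
    mask=$((mask >> 1)); cpu=$((cpu + 1))
  done
  echo "$1 =>${out}"
}

decode_mask 1   # -> 1 => cpu0
decode_mask c   # -> c => cpu2 cpu3
decode_mask f   # -> f => cpu0 cpu1 cpu2 cpu3
```

So e.g. the MMCINTs `c:c>33` line means the MMC interrupt mask allows cpu2+cpu3, and `ETH0INT: 40:1=1` means IRQ 40 is pinned to cpu0.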
>STEP 2) CHECK RESULTING VALUES ON THIS AND PREV RELEASE >>>WITH<<< (just)IRQBALANCE
### NEWBUILD $> /etc/init.d/irqbalance enable; reboot; dmesg | grep perft
dca632 /usbstick 58°# ps www | grep irqbalance
6332 root 1172 S /usr/sbin/irqbalance -f -t 10
10514 root 1252 SN grep irqbalance
dca632 /usbstick 56°# service | grep irq
/etc/init.d/irqbalance enabled running
dca632 /usbstick 58°# dmesg | grep perft
[ 18.194063] rc.custom> raspberrypi,4-model-b perftweaks /bin/rpi-perftweaks.sh [run]
[ 99.800939] rpi-perftweaks.sh[bu]> GOVERNOR[ondemand]: quick upthresh:21 downfac:6 minfreq:1100000 maxfreq:1500000
[ 99.854740] rpi-perftweaks.sh[bu]> RENICE dnsmasq:-5 uhttpd:2 dropbear:1
[ 99.932391] rpi-perftweaks.sh[bu]> TASKSET: nlbwmon:19382>3 collectd:17546>3 uhttpd:5256>2 odhcpd:4738>3 dropbear:25425>3 dropbear:4268>3 dnsmasq:2400>3 dnsmasq:1152>3
[ 99.967878] rpi-perftweaks.sh[bu]> MMCINTs: c:f>33
[ 99.994493] rpi-perftweaks.sh[bu]> ETH0INT: 40:1=1 41:1=1
[ 100.024956] rpi-perftweaks.sh[bu]> LANIFACE:eth0 WANIFACE:eth775c13
[ 100.051376] rpi-perftweaks.sh[bu]> sysval: eth0/queues/tx-0/xps_cpus[1=1]
[ 100.078282] rpi-perftweaks.sh[bu]> sysval: eth0/queues/tx-1/xps_cpus[2=2]
[ 100.105755] rpi-perftweaks.sh[bu]> sysval: eth0/queues/tx-2/xps_cpus[4=4]
[ 100.132396] rpi-perftweaks.sh[bu]> sysval: eth0/queues/tx-3/xps_cpus[4=4]
[ 100.159292] rpi-perftweaks.sh[bu]> sysval: eth0/queues/tx-4/xps_cpus[2=2]
[ 100.186137] rpi-perftweaks.sh[bu]> sysval: eth0/queues/rx-0/rps_cpus[7=7]
[ 100.213112] rpi-perftweaks.sh[bu]> sysval: eth775c13/queues/rx-0/rps_cpus[7=7]
### assessment
> No change in resultant values so will skip testing on older build...
> NOTE: ETH0 IRQs should/could change later as they are governed by irqbalance as requested
( but this should have always been the case, so it could indicate changes to that program, or a need to test for + actually show what it's doing in debug output )
( you mentioned needing to restart irqbalance on occasion previously above, which could be the reason why ... "irqbalance is assigning bad cpu's for ETH0IRQ's" )
dca632 /usbstick 54°# cat /proc/interrupts | grep -E '(eth0|mmc)'
33: 23590 0 12549 0 GICv2 158 Level mmc1, mmc0
40: 4658 648 0 0 GICv2 189 Level eth0
41: 1470 0 0 558 GICv2 190 Level eth0
dca632 /usbstick 54°# cat /proc/interrupts | grep -E '(eth0|mmc)'
33: 23590 0 13045 0 GICv2 158 Level mmc1, mmc0
40: 4682 648 0 0 GICv2 189 Level eth0
41: 1477 0 0 558 GICv2 190 Level eth0
### So in the above; (ignore values less than 1000, they are from boot when 'f' was set early/default)
we can see that both eth0 IRQs are probably pinned to cpu0
and mmc seems to be landing on physical core 3 (real value 2?)
(rough interpretation; TBA whether this is what we want to be seeing,
mmc may not be right/c?/is noteworthy? as is cpu0 for eth?)
STEP2.2 CHECK THE RAW INTERRUPT VALUES AFTER IRQBALANCE has been running
#### grep . /proc/irq/*/smp_affinity 2>/dev/null | grep -E '(33|40|41)'
/proc/irq/33/smp_affinity:c
/proc/irq/40/smp_affinity:1
/proc/irq/41/smp_affinity:1
### assessment: irqbalance has not made any changes to core affinities
(not saying this is optimal, i.e. the set values could be/are obviously
causing issues for you, BUT they are the expected ones)
### (above hypothesis: no problem-ish; possible issue with the irqbalance
code itself between builds > TBA test on older build)
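A complementary check here: the mask only says which cores are *allowed*; to see which core is actually servicing the eth0 IRQs you can diff two `/proc/interrupts` snapshots. A small sketch; the `irq_delta` helper is my own and kept pure so it can be fed canned lines:

```shell
# Compute per-CPU interrupt-count deltas between two snapshots of the same
# /proc/interrupts line (columns 2-5 = the 4 rpi4 cores).
# Helper name "irq_delta" is an assumption of mine, not from the build.
irq_delta() {  # $1 = earlier line, $2 = later line
  printf '%s\n%s\n' "$1" "$2" | awk '
    { for (i = 2; i <= 5; i++) v[NR, i] = $i }
    END { for (i = 2; i <= 5; i++)
            printf "%s%d", (i > 2 ? " " : ""), v[2, i] - v[1, i]
          print "" }'
}

# On the router, something like:
#   b=$(grep -m1 eth0 /proc/interrupts); sleep 5; a=$(grep -m1 eth0 /proc/interrupts)
#   irq_delta "$b" "$a"
irq_delta "40: 4658 648 0 0" "40: 4682 648 0 0"   # -> 24 0 0 0
```

A nonzero delta in only one column confirms the IRQ is really being serviced on that core, which is more trustworthy than eyeballing the raw counters.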
At this point unless your output / set values are significantly different to mine...
I'll avoid testing too much more as we are probably looking at; (NOTE: If more info comes to light happy to test more - and will still keep an eye out in my travels)
Some upstream/C/mainline issue with any of the underlying IRQ operations?
A "definition" issue... i.e. we may want to change our script... or you might want other underlying values or operation... but first we need to define the root cause of the slowdowns...
P.S. it's entirely possible my testing methodology above is flawed and I've overlooked some INI setting...
The other thing obviously worth keeping an eye out for are forum or github issues with irqbalance or rpi/any performance on current master...
I should probably also state that my (general) recommendation to build users is not to run irqbalance and on 600Mbs you'd be right on the limit of wanting to use PERFTWEAKS_Gbs=1
BUT
While there could be room for re-assessing interrupts in general...
...gut feeling says this is exactly as the issue title states;
that there is actually an issue with the irqbalance program itself (i.e. it's just not working? edit: when run as a service?) in this case your best options are probably to revert to the prev. build / 21.02, or see if PERFTWEAKS_Gbs=1 improves your performance for now?
... and I/we'll keep an eye out for irqbalance operations until it behaves better / report this in RELEASE_NOTES
fyi... oneshot worked-ish
I might test this again (strace service mode) on an older build to compare
/etc/init.d/irqbalance stop
strace irqbalance -o
TAKES SOME TIME TO APPLY...
dca632 ../PERFTWEAKS202204testing 56°# grep . /proc/irq/*/smp_affinity 2>/dev/null | grep -E '(33|40|41)'
/proc/irq/33/smp_affinity:f
/proc/irq/40/smp_affinity:2
/proc/irq/41/smp_affinity:8
so it chose (using -o = oneshot... it still never changed anything when running in the background): core 2 (cpu1) for ETH0INT1 (40 for me), core 4 (cpu3) for ETH0INT2 (41 for me), and f (all cores?) for MMC...
YMMV... but IMHO... that is suboptimal core selection on this device... (even more so as perftweaks also puts some luci/non-critical stuff specifically on core 4)
... could be some scope to just add some alternate INI values for cores(eth)... so you/others can just assign/test what works well manually... (and ditch irqbalance). That was the point of userfileB: just read these values from a text file so people know what's going on... but it may not be ready for prime time... so maybe we add this to both...
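On that userfileB idea, a rough sketch of reading "<irq> <hexmask>" pairs from a plain text file; the file format here is hypothetical, not the actual userfileB format, and the helper only prints the writes so they can be reviewed (or piped to sh as root) rather than applied blindly:

```shell
# Hypothetical format: one "<irq> <hexmask>" pair per line, '#' for comments.
# affinity_cmds prints the /proc writes instead of performing them;
# helper name and file layout are my assumptions, not from the build.
affinity_cmds() {
  while read -r irq mask; do
    # skip blank lines and comments
    case "$irq" in ''|"#"*) continue ;; esac
    echo "echo $mask > /proc/irq/$irq/smp_affinity"
  done
}

# e.g. a file containing "40 1" and "41 2":
printf '40 1\n41 2\n' | affinity_cmds
#   echo 1 > /proc/irq/40/smp_affinity
#   echo 2 > /proc/irq/41/smp_affinity
```

Keeping the parse step separate from the apply step also gives a natural place to log "what's going on", which was the stated point of the text file.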
hmmm... userfile_simple, I've totally forgotten what the purpose of this is... but it seems it sets alternate ETH IRQ affinities
dca632 ../PERFTWEAKS202204testing 55°# grep . /proc/irq/*/smp_affinity 2>/dev/null | grep -E '(33|40|41)'
/proc/irq/33/smp_affinity:f
/proc/irq/40/smp_affinity:1
/proc/irq/41/smp_affinity:2
so... AFAIR core1(0) and core2(1), which I think is what you will get if you use that Gbs=1 (edit: yup) parameter...
don't forget to disable irqbalance if you move to using Gbs (i.e. also in ENABLESSERVICES)
... and
assuming the above is correct, that still leaves the 'less snappy' stuff on the table (i.e. something else is causing that > after you ran userfile_simple)
which odds on is something elsewhere in the stack... obvious candidates are qos/network-stack type stuff... so we will keep an eye on this, probably for a few builds or so...
sidenote on the above: rpi4.qos is getting on the stale side... I'm also using it... but users wanting this/these levels of optimization may wish to switch to 'qosify', which is not available on 21.02 (and the build does not have super great defaults for), so it would require a lot of low/medium level setup... optimally in a collaborative / published / common way for integration / other users... (but qos is one of those things that is borderline impossible to generalize in good levels of detail) I'm not recommending you/others switch right now... but just testing with sqm disabled may yield good comparables / symptoms...
i.e. keep reverting to the better build and if you have time / energy test some newer ones if/when they come online and report back if something is snappier... (would be great but sooner or later master might become incompatible / problematic in other respects so investing lots of energy in odd stuff might not be a good use of time ;) )
... either way, I'm very grateful you reported the "less snappy on XXX build" I don't have fast enough internet to detect this, so once again, indebted for the time / energy / feedback...
Sorry for the slow reply, I foolishly sent this just before I went on holiday.
Really appreciate the detailed investigation (as always!). I swear I learn more from you dumping out your diagnosis process than years of reading forum posts and documentation.
Let me have a dig through your diagnosis and see if I can find anything fruitful, I'll try reverting too.
From a conceptual standpoint I am ambivalent to using either IRQBalance or manually pinning to cores. My only reason for running IRQBalance was that I didn't have the time to experiment with pinning, so it made more sense to let IRQBalance handle it (and it seemed to work well) - I would be more than happy to use a well pinned setup as an alternative!
I'll circle back. Many thanks again.
Well, through no effort of my own, it looks like the latest release has resolved the issue for me!
Now seeing load evenly spread across all cores again (~30% per core @ 560Mbps w/ irqbalance)
Just in time for my gigabit line to be installed next week 😆
I did not get any further than running the same diagnoses you did, to no enlightenment.
Were there any fixes made, or was it as simple as a newer irqbalance release?
Once again,
Thank you.
cheers for the update...
no major fixes on my end, and I've not looked into any updates to the underlying packages ( irqbalance, steering logic... maybe pi core code but unlikely... )
I suspect that it was a quirk with irqbalance initialization or similar timing/logic/interaction problem with the tweaks script...
aka (what I ask myself)
-did irqbalance itself decide on some poor runtime cpu affinity setting? (likely) was it timing related? or just a flip-a-coin type thing... if the former, can it be tweaked via restarting, options or a startup delay... (afaik not so much, except for the startup delay, but that's not really going to help I think)
given how little cpu affinity shifting needs to (and can) happen on the pi... it's better not to use irqbalance if it continues to do that (often) and gets no newer code...
basically: both eth0 IRQs on core0 for best latency under 600Mb-ish, and one eth0 IRQ on core0, the other on core1, for over 600Mb-ish
and that's 90% of the job done...
(assuming that is really the issue)
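That two-rule policy could be sketched like this (hedged: `eth_masks` and the flag handling are my names, the IRQ numbers 40/41 are from this report and will differ per device; run with irqbalance stopped so it can't override the pins):

```shell
# Pick eth0 IRQ affinity masks for the two regimes described above.
# "eth_masks" is a sketch of mine, not a function from rpi-perftweaks.sh.
eth_masks() {  # $1 = 1 for over ~600Mb-ish (split cores), else both on core0
  if [ "$1" = "1" ]; then
    echo "1 2"   # eth0 IRQ1 -> cpu0, eth0 IRQ2 -> cpu1
  else
    echo "1 1"   # both eth0 IRQs -> cpu0 (best latency at lower rates)
  fi
}

# Applying on the router (as root, irqbalance stopped), something like:
#   set -- $(eth_masks "${PERFTWEAKS_Gbs:-0}")
#   echo "$1" > /proc/irq/40/smp_affinity
#   echo "$2" > /proc/irq/41/smp_affinity
```

That mirrors what the Gbs=1 path / userfile_simple appear to produce, just stated explicitly.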
edit: appears there was no irqbalance code changes...
vert@peanut:/fs/sdd1/openwrt/RTNGext/cache/rpi4/builds$ ls
diff-r19171_r19345
diff_r19345-r19512
RELEASE_NOTES
rpi-4_21.02.3_1.3.6-2_r16554_extra/
diff-r19171_r19393
README
repolog.txt
rpi-4_snapshot_5.7.37-2_r19512_extra/
vert@peanut:/fs/sdd1/openwrt/RTNGext/cache/rpi4/builds$ grep irq diff-*
[nothing]
closing, reopen at will
The last release for me was perfect, but the current release seems to have an issue with balancing load across cores.
With the current release running IRQ balance and
PERFTWEAKS="default"
in my wrt.ini, I see almost all load hitting core 0 (it's the same with packet steering enabled or disabled). Restarting IRQ balance doesn't appear to change this (I would occasionally have to do this before).
My WAN is 600/36, and I run SQM w/ ctinfo_4layercake_rpi4.qos & irqbalance.
If I manually run the
rpi-perftweaks.sh_userfile_simple
script I am seeing the load distributed evenly again, but the performance feels less snappy than the prior release. Any suggestions? I may have misunderstood the ideal configuration for this scenario!
Thanks once again for a great build, and continuing to keep this updated :)