Open groobybugs opened 2 years ago
Hi @groobybugs
Just to confirm, was the 5.14 TT r2 or the normal one? Note that r2 has some fixes over the older TT.
I am looking at the results, thank you so much for sharing.
Hi @hamadmarri it was tt-xanmod-5.14-r2.patch
CacULE in 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?
CacULE in 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?
Since TT failed the multitasking test against CacULE in your case, I would like to see TT vs CacULE in intensive single task performance like gaming or video/audio encoding tasks. Or anything that is latency bound.
Thank you
Hi @groobybugs
From your feedback, I suspect that RT tasks have such a high priority that they starve other tasks (hence the freezing). I have a proposed solution included in the TT future plan: https://github.com/hamadmarri/TT-CPU-Scheduler#future-plan
Regarding throughput tests: I just want to stress that both tested kernels should have the same Hz value. Please make sure that cacule and tt have the same Hz value and almost the same .config (most importantly the nohz configurations).
The freezing issue could be related to:
- RT taking over other tasks
- Lack of the UCLAMP_TASK feature
- Lack of proper task accounting and stats
I will let you know when I update TT so you can test whether the freezing issue is solved.
Thank you for your valuable feedback
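One way to check that two kernels really share the same Hz and nohz settings is to compare those options in the two .config files. Below is a minimal sketch in Python, assuming the usual .config line format (`CONFIG_X=value`, and `# CONFIG_X is not set` for disabled options). The function names are hypothetical and the sample strings are made up for illustration; they are not the configs attached to this thread.

```python
def parse_kernel_config(text):
    """Return {option: value} for the option lines of a kernel .config."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options are written as comments: '# CONFIG_X is not set'.
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(text_a, text_b, keywords=("HZ",)):
    """List (option, value_a, value_b) where the two configs disagree.

    Only options whose name contains one of `keywords` are compared.
    An option missing from one file is reported as None.
    """
    a, b = parse_kernel_config(text_a), parse_kernel_config(text_b)
    diffs = []
    for name in sorted(set(a) | set(b)):
        if any(k in name for k in keywords) and a.get(name) != b.get(name):
            diffs.append((name, a.get(name), b.get(name)))
    return diffs


# Illustrative samples only (assumed values, not real attachments).
sample_a = "CONFIG_HZ_500=y\n# CONFIG_HZ_1000 is not set\nCONFIG_HZ=500\nCONFIG_NO_HZ_IDLE=y\n"
sample_b = "# CONFIG_HZ_500 is not set\nCONFIG_HZ_1000=y\nCONFIG_HZ=1000\nCONFIG_NO_HZ_IDLE=y\n"

for name, value_a, value_b in diff_configs(sample_a, sample_b):
    print(f"{name}: {value_a} vs {value_b}")
```

The `"HZ"` keyword also matches the `CONFIG_NO_HZ_*` options, so both the tick rate and the nohz mode are covered. To compare real files, read each one with `open(path).read()` and pass the two strings in.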
Hi @hamadmarri
For cacule I used the default xanmod config, and when I applied your TT r2 patch to 5.14 I also used the default config file from the xanmod repo. So TT and cacule had the same config: in this case CONFIG_NO_HZ_IDLE=y, CONFIG_HZ=500 and autogroup enabled.
sure, let me see what I can do.
Thanks!
Hi @groobybugs
Could you please try this fix https://github.com/hamadmarri/TT-CPU-Scheduler/issues/5#issuecomment-968105261
It fixes the task accounting and stats, in case the issue is somehow related to CPU frequency.
Hi @hamadmarri
I applied your patch on the cacule 15-tt branch and set TT_ACCOUNTING_STATS to n, because I use the performance governor. I will test with this value enabled later.
I tested the scheduler with the patch and it works great. The system was very responsive all the time, under heavy load and multiprocessing, but as in the previous test, my local build times under multitasking were:
- Cacule: 7 minutes
- TT: 9 minutes
- TT with patch: 11 minutes
But in my experience the most responsive schedulers are, in this order:
1. TT with patch
2. Cacule
3. normal TT
And as you asked, I ran some benchmarks using phoronix-test-suite for blender and xonotic.
These are the results.
Specs: Processor: Intel Core i7-8665U @ 4.80GHz (4 Cores / 8 Threads), Motherboard: Dell 07WDVW (1.14.0 BIOS), Chipset: Intel Cannon Point-LP, Memory: 32GB, Disk: SK hynix PC601 NVMe 512GB, Graphics: Intel UHD 620 WHL GT2 3GB (1150MHz), Audio: Realtek ALC3254, Network: Intel Cannon Point-LP CNVi, cpu-scaling-governor: intel_pstate performance, cpu-microcode: 0xea
xonotic
12 runs
xonotic, 12 runs, tt scheduler (with fix)
OS: Ubuntu 20.04, Kernel: 5.15.2-xanmod1-tt-tt-fix (x86_64), Desktop: KDE Plasma 5.18.7, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.0.3, Vulkan: 1.2.145, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080
FPS: 79.1633005, 78.3352968, 74.0823467, 72.2815212, 71.8177972, 71.6806862, 70.3664129, 70.7046619, 70.3465052, 70.1676165, 70.7141228, 70.286941
xonotic, 12 runs, tt scheduler
Kernel: 5.15.2-xanmod1-tt
FPS: 79.7947663, 79.208414, 75.4100481, 72.8958173, 72.649925, 72.0702237, 72.147732, 71.6166447, 71.6933495, 71.3670533, 71.1901418, 71.2663421
xonotic, cacule
Kernel: 5.15.2-xanmod1-cacule-full (x86_64)
FPS: 79.9925776, 78.9416091, 75.7230745, 73.5371781, 73.2700679, 73.1542655, 73.2225431, 72.8344118, 72.7633519, 72.7872554, 72.9407657, 68.1741742
Blender
3 runs
Intel Core i7-8665U == Kernel: 5.15.2-xanmod1-tt-tt-fix
Hi @groobybugs
Thank you for the tests. Could you please attach the two .configs for cacule and tt?
I have updated the TT patch since the last one I sent here. Which one have you tried? The last commit was yesterday, and it brings a significant improvement because it considers cache-hot tasks (ported from CFS).
Sure, I will attach the config files and the patch I tried was this one https://github.com/hamadmarri/TT-CPU-Scheduler/issues/5#issuecomment-968117884
Sure, I will attach the config files and the patch I tried was this one #5 (comment)
Yes, this patch has no improvements for performance, it only fixes the freq. scaling issues.
You might try the latest commit: https://github.com/hamadmarri/TT-CPU-Scheduler/blob/4fd4a9a29c8cb7c05e22df49514de304ea66afeb/patches/5.15/tt-5.15-r2.patch
For compile-time measurements: if a realtime task such as a YouTube video or audio is running, TT gives more preference to realtime tasks than to CPU/IO-bound tasks like compiling. So it is normal to see a higher build time, but more importantly, the FPS drop or frame drops in the realtime task are almost 0%.
OK @hamadmarri, tomorrow I will test with the latest commit. These are my config files, and yep, I did not notice any slowdowns in the emulators or the system at any time.
config (cacule).txt config (tt).txt
and thanks!!!
Hi @groobybugs Thank you so much for your efforts. Here are some notes related to the benchmarks:
- tt <- CONFIG_HZ_1000=y where cacule <- CONFIG_HZ_500=y. This will certainly lead to better results for cacule in performance tests. A high Hz value is a trade-off between latency and throughput. That's why cacule shows better results in your phoronix tests. And the difference between 1000 Hz and 500 Hz is actually huge (2x).
- CONFIG_TT_ACCOUNTING_STATS, even though you are using the performance governor: the overhead seems to be tiny (a few nanoseconds per tick!), but on the other hand I am not 100% sure that the performance governor doesn't need any util updates. To be safe I would recommend enabling it.
- The rest of the configs are identical :+1:
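The 2x figure follows from the tick period being the reciprocal of the Hz value. A small sketch of that arithmetic (the function name is hypothetical):

```python
def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a given CONFIG_HZ value."""
    return 1000.0 / hz


for hz in (500, 1000):
    print(f"CONFIG_HZ={hz}: one tick every {tick_period_ms(hz):.1f} ms")
```

At 1000 Hz the timer tick fires twice as often as at 500 Hz, which is where the latency versus throughput trade-off mentioned above comes from.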
Side note:
You have NUMA_BALANCE=y in both cacule and tt, but you may not need NUMA at all. Check with the numactl command whether you have only 1 node or more. With only 1 node you don't need any NUMA configs enabled, and disabling NUMA can save some overhead.
Thank you
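Besides `numactl --hardware`, the node count can also be read from sysfs. The sketch below is a Python illustration that assumes the standard Linux layout, where each NUMA node appears as a `node<N>` directory under `/sys/devices/system/node`. The function name is hypothetical.

```python
import os
import re


def count_numa_nodes(sysfs_node_dir="/sys/devices/system/node"):
    """Count entries named node<N> in the sysfs NUMA directory.

    Returns 0 if the directory does not exist, for example when the
    kernel was built without NUMA support.
    """
    try:
        entries = os.listdir(sysfs_node_dir)
    except FileNotFoundError:
        return 0
    return sum(1 for entry in entries if re.fullmatch(r"node\d+", entry))


print("NUMA nodes:", count_numa_nodes())
```

A result of 1 (or 0 on a kernel built without NUMA) corresponds to the single-node case described above.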
Hi @hamadmarri!! I'm really sorry, I did not notice the difference. I thought I was using 500 Hz for both, my bad. I will do the test again and enable CONFIG_TT_ACCOUNTING_STATS, thanks!!!
Hi @hamadmarri
Now I'm completely sure that I used the same configuration. I also disabled NUMA and enabled CONFIG_TT_ACCOUNTING_STATS.
I used 500 Hz for all my tests.
cacule: 78.889229, 79.5157125, 79.6588391, 79.0820449, 78.2657684, 77.5217652, 76.4722857, 76.6737493, 75.5381214, 74.9619431
TT: 80.0675823, 79.7179842, 79.9550305, 79.3849837, 79.6759446, 79.0674221, 79.0211555, 78.9002892, 78.4843618, 78.1941778
As we can see, TT performs better over the whole test
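To put a number on the gap, the two series above can be averaged and compared. This is a minimal Python sketch using only the values posted in this comment. Treating them as FPS samples and pairing them by position in the list are assumptions, and `percent_gain` is a hypothetical helper name.

```python
from statistics import mean

# Values copied from the comment above (500 Hz for both kernels).
cacule_fps = [78.889229, 79.5157125, 79.6588391, 79.0820449, 78.2657684,
              77.5217652, 76.4722857, 76.6737493, 75.5381214, 74.9619431]
tt_fps = [80.0675823, 79.7179842, 79.9550305, 79.3849837, 79.6759446,
          79.0674221, 79.0211555, 78.9002892, 78.4843618, 78.1941778]


def percent_gain(baseline, candidate):
    """Relative difference of the means, in percent of the baseline mean."""
    return (mean(candidate) - mean(baseline)) / mean(baseline) * 100.0


print(f"cacule mean: {mean(cacule_fps):.2f}")
print(f"TT mean:     {mean(tt_fps):.2f}")
print(f"TT vs cacule: {percent_gain(cacule_fps, tt_fps):+.2f}%")
# Position-by-position comparison (assumes sample i of one list matches sample i of the other).
print("TT ahead at every position:", all(t > c for t, c in zip(tt_fps, cacule_fps)))
```

The difference of the means is about 2%, so repeating the benchmark a few times would help confirm that it is not run-to-run noise.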
and the diff in blender is small actually
As I said before, TT is very responsive in multitasking and has great results for single tasks. Something I do to improve my compilation time is to increase the niceness of the task I'm most interested in. For example, if the build took 11 minutes, increasing the niceness reduces the time to 5 minutes, and the system is still responsive.
In general I'm going to say that TT is a great scheduler, thanks man! This speeds up my daily work.
I hope this information is useful to you, and if there is anything else I can help you with, just ask man
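For anyone who wants to repeat the niceness experiment, here is a small Python sketch that wraps a command with the standard `nice` utility. The helper name and the `make -j8` command are hypothetical. Note that on Linux nice values run from -20 to 19 and a lower value means a higher priority, so giving a build more CPU time means lowering its nice value, which usually requires root for negative values. Which direction was used above is not stated, so the value in the example is an assumption.

```python
def with_nice(command, niceness):
    """Prefix `command` (a list of argv strings) with `nice -n <niceness>`."""
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return ["nice", "-n", str(niceness), *command]


# Hypothetical build command, for illustration only.
print(with_nice(["make", "-j8"], -5))
```

The returned list can be passed directly to `subprocess.run` to start the build with the chosen priority.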
I am glad to see that TT is performing well in your cases :+1:
Any feedback is welcome
Thank you so much @groobybugs
Hi @hamadmarri
I have been testing the TT scheduler in my daily work routine for one complete day (no gaming), and in general the experience was great: no lags, no hangs, and the system was responsive, with only one issue that I explain below. I tested 5.14.16-cacule and 5.14.16 with your TT patch. I will also test 5.15.1-tt and 5.15.1-cacule (I applied your 5.14 full patch with no problems and it is working so far).
My specs in general are:
- 8 x Intel Core i7-8665U @ 1.90 GHz
- Ubuntu 20.04.3 LTS
- 32 GB RAM
- KDE Plasma 5.18.7
- zram 8 GB, algorithm lzo-rle
I've worked with my laptop overclocked, and the temperature was always approx. 80 C.
Something I noticed using TT was that my local builds sometimes took double the normal time under high process demand. I had 2 Android emulators (QEMU), an IDE, Chrome, Slack, etc. open. Cacule gives me the best results when I'm doing multiple tasks (debugging, building, jumping into meetings, etc.): the compilation times are constant, e.g. 4-5 min per project, even while playing some video/music in the background. Using cacule and doing all of these tasks, I only sometimes notice a lag in the emulators or in latte-dock, but the system is responsive. Some KDE desktop animations become a little slow, but I repeat, there is only minimal lag when switching between windows.
With TT I've noticed longer lags in the same apps. Even in the Android emulator the app I am using stops, the background build goes up to 8-10 minutes, some windows freeze, and the lag in latte is more noticeable. When the background build stops, the system works like a charm again. If I have a low CPU demand, everything works normally.
I have some log files made with your TT script (every log was taken with a build in the background, 2 emulators open and a video playing in the background):
- I opened kate and it froze: kate.txt
- when the java builds lasted twice as long: java.txt
- and the emulator slowed down: emulator.txt
I also ran some stress tests.
TT:
- stress-ng: stress-ng-tt.txt
- sysbench (4 different runs, taking the one with the most events): sysbench_tt.txt
Cacule:
- stress-ng: stress-ng-cacule.txt
- sysbench: sysbench-cacule.txt
And finally your responsiveness python script:
responsive_cacule.txt responsive_tt.txt
To me cacule is the one with the best results in 5.14.16 for multitasking and high CPU demand, while TT and cacule give me the same results for single tasks and low CPU demand. Now I'm testing 5.15.1 with your 5.14 full patch applied. I know it is for 5.14, but I wanted to test it on 5.15.
responsive_cacule_15.txt
at the moment I'm doing the same "tests" in 5.15 and I see a better performance than in 5.14.
If you need a very specific test or log do not hesitate to ask me, as soon as I can I will share it with you and as soon as I have results and commentary for 5.15 tt and cacule I will post it here.
Thanks!