hamadmarri / TT-CPU-Scheduler

Task Type (TT) is an alternative CPU scheduler for Linux.
TT scheduler data usage #4

Open groobybugs opened 2 years ago

groobybugs commented 2 years ago

Hi @hamadmarri

I have been testing the TT scheduler in my daily work routine for one full day (nothing gaming related), and in general the experience was great: no lags, no hangs, and the system was responsive, with one exception that I will explain below. I tested on 5.14.16-cacule and on 5.14.16 with your TT patch. I will also test 5.15.1-tt and 5.15.1-cacule (I applied your 5.14 full patch with no problems and it is working so far).

My specs in general are:

  • CPU: 8 x Intel Core i7-8665U @ 1.90 GHz
  • OS: Ubuntu 20.04.3 LTS
  • RAM: 32 GB
  • Desktop: KDE Plasma 5.18.7
  • zram: 8 GB, algorithm lzo-rle

I ran my laptop overclocked the whole time, and the temperature was always around 80 °C.

Something I noticed with TT was that my local builds sometimes took double the normal time under high process demand. I had 2 Android emulators (QEMU), an IDE, Chrome, Slack, etc. open. CacULE gives me the best results when I'm doing multiple tasks (debugging, building, jumping into meetings, etc.): compilation times are constant, e.g. 4-5 min per project, even while playing some video/music in the background. With CacULE under all of these tasks I only sometimes notice a lag in the emulators or in latte-dock, and some KDE desktop animations become a little slow, but the system stays responsive, with only minimal lag when switching between windows.

With TT I've noticed longer lags in the same apps: the app I am using in the Android emulator even stops, the background build goes up to 8-10 minutes, some windows freeze, and the lag in latte-dock is more noticeable. When the background build stops, the system works like a charm again. If I have low CPU demand, everything works normally.

I have some log files collected with your TT script (every log was taken with a build running in the background, 2 emulators open and a video playing in the background):

  • I opened Kate and it froze: kate.txt
  • The Java builds lasted twice as long: java.txt
  • The emulator slowed down: emulator.txt

I also ran some stress tests.

TT:

  • stress-ng: stress-ng-tt.txt
  • sysbench (4 different runs; take the one with the most events): sysbench_tt.txt

CacULE:

  • stress-ng: stress-ng-cacule.txt
  • sysbench: sysbench-cacule.txt

And finally, your responsiveness Python script:

responsive_cacule.txt responsive_tt.txt

To me, CacULE is the one with the best results on 5.14.16 for multitasking and high CPU demand, while TT and CacULE give me the same results for single tasks and low CPU demand. Now I'm testing 5.15.1 with your 5.14 full patch applied; I know it is for 5.14, but I wanted to test it on 5.15.

responsive_cacule_15.txt

At the moment I'm doing the same "tests" on 5.15 and I see better performance than on 5.14.

If you need a very specific test or log, do not hesitate to ask. I will share it as soon as I can, and as soon as I have results and comments for 5.15 TT and CacULE I will post them here.

Thanks!

hamadmarri commented 2 years ago

Hi @groobybugs

Just to confirm, was the 5.14 TT r2 or the normal one? Note that r2 has some fixes over the older TT.

I am looking at the results, thank you so much for sharing.

groobybugs commented 2 years ago

Hi @hamadmarri it was tt-xanmod-5.14-r2.patch

groobybugs commented 2 years ago

CacULE on 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?

hamadmarri commented 2 years ago

Hi @groobybugs

From your feedback, I suspect that RT tasks have such a high priority that they starve other tasks (hence the freezing). I have a proposed solution included in the TT future plan: https://github.com/hamadmarri/TT-CPU-Scheduler#future-plan

Regarding throughput tests: I just want to stress that both tested kernels should have the same HZ value. Please make sure that CacULE and TT have the same HZ value and almost the same .config (most importantly the NO_HZ configuration).
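A minimal sketch of how the two .config files could be checked for tick-related differences before benchmarking. The helper names (`parse_kconfig`, `timer_options`, `diff_timer_options`) and the sample config excerpts are hypothetical and only illustrate the idea; real input would be the full .config of each kernel.

```python
import re

# Matches "CONFIG_FOO=value" as well as "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def timer_options(options):
    """Keep only the tick-related options: CONFIG_HZ* and CONFIG_NO_HZ*."""
    return {k: v for k, v in options.items()
            if k.startswith(("CONFIG_HZ", "CONFIG_NO_HZ"))}


def diff_timer_options(config_a, config_b):
    """Return {option: (value_a, value_b)} for tick options that differ."""
    a = timer_options(parse_kconfig(config_a))
    b = timer_options(parse_kconfig(config_b))
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Hypothetical excerpts, not taken from the attached config files.
    cacule = "CONFIG_HZ_500=y\nCONFIG_HZ=500\nCONFIG_NO_HZ_IDLE=y\n"
    tt = ("# CONFIG_HZ_500 is not set\nCONFIG_HZ_1000=y\n"
          "CONFIG_HZ=1000\nCONFIG_NO_HZ_IDLE=y\n")
    for name, (left, right) in diff_timer_options(cacule, tt).items():
        print(f"{name}: cacule={left} tt={right}")
```

An empty result means the two kernels agree on every CONFIG_HZ* and CONFIG_NO_HZ* option, so throughput numbers are at least comparable on that axis.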

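Freezing issue could be related to:

  • RT taking over other tasks
  • Lack of UCLAMP_TASK feature
  • Lack of proper tasks accounting and stats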

I will let you know when I update TT so you can test if the freezing issue is solved.

Thank you for your valuable feedback

hamadmarri commented 2 years ago

> CacULE on 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?

Since TT failed the multitasking test against CacULE in your case, I would like to see TT vs CacULE in intensive single task performance like gaming or video/audio encoding tasks. Or anything that is latency bound.

Thank you

groobybugs commented 2 years ago

> Hi @groobybugs
>
> From your feedback, I suspect that RT tasks have such a high priority that they starve other tasks (hence the freezing). I have a proposed solution included in the TT future plan: https://github.com/hamadmarri/TT-CPU-Scheduler#future-plan
>
> Regarding throughput tests: I just want to stress that both tested kernels should have the same HZ value. Please make sure that CacULE and TT have the same HZ value and almost the same .config (most importantly the NO_HZ configuration).
>
> Freezing issue could be related to:
>
>   • RT taking over other tasks
>   • Lack of UCLAMP_TASK feature
>   • Lack of proper tasks accounting and stats
>
> I will let you know when I update TT so you can test if the freezing issue is solved.
>
> Thank you for your valuable feedback

Hi @hamadmarri

For CacULE I used the default xanmod config, and when I applied your TT r2 patch to 5.14 I also used the default config file from the xanmod repo, so TT and CacULE have the same config: CONFIG_NO_HZ_IDLE=y, CONFIG_HZ=500 and autogroup enabled.

groobybugs commented 2 years ago

> CacULE on 5.15 gives 630 events on average in sysbench. I hope to share the same information for 5.15 soon. Are there any specific tests that I can do?
>
> Since TT failed the multitasking test against CacULE in your case, I would like to see TT vs CacULE in intensive single task performance like gaming or video/audio encoding tasks. Or anything that is latency bound.
>
> Thank you

Sure, let me see what I can do.

Thanks!

hamadmarri commented 2 years ago

Hi @groobybugs

Could you please try this fix https://github.com/hamadmarri/TT-CPU-Scheduler/issues/5#issuecomment-968105261

It fixes the task accounting and stats, in case the issue is somehow related to CPU frequency.

groobybugs commented 2 years ago

Hi @hamadmarri

I applied your patch on the cacule 15-tt branch and set TT_ACCOUNTING_STATS to n because I use the performance governor. I will test with this value enabled later.

I tested the scheduler with the patch and it works great: the system was very responsive all the time, even under heavy load and multitasking. But, as in the previous test, my local build times while multitasking were:

  • CacULE: 7 minutes
  • TT: 9 minutes
  • TT with patch: 11 minutes

In my experience, though, the most responsive schedulers are, in this order:

  1. TT with patch
  2. CacULE
  3. Normal TT

As you asked, I ran some benchmarks using phoronix-test-suite for Blender and Xonotic.

These are the results. Specs:

  • Processor: Intel Core i7-8665U @ 4.80GHz (4 Cores / 8 Threads)
  • Motherboard: Dell 07WDVW (1.14.0 BIOS)
  • Chipset: Intel Cannon Point-LP
  • Memory: 32GB
  • Disk: SK hynix PC601 NVMe 512GB
  • Graphics: Intel UHD 620 WHL GT2 3GB (1150MHz)
  • Audio: Realtek ALC3254
  • Network: Intel Cannon Point-LP CNVi
  • cpu-scaling-governor: intel_pstate performance
  • cpu-microcode: 0xea

xonotic

12 runs

image

xonotic tt 12 sched
OS: Ubuntu 20.04, Kernel: 5.15.2-xanmod1-tt-tt-fix (x86_64), Desktop: KDE Plasma 5.18.7, Display Server: X Server 1.20.11, OpenGL: 4.6 Mesa 21.0.3, Vulkan: 1.2.145, Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1920x1080

FPS: 79.1633005, 78.3352968, 74.0823467, 72.2815212, 71.8177972, 71.6806862, 70.3664129, 70.7046619, 70.3465052, 70.1676165, 70.7141228, 70.286941

xonotic 12 runs tt scheduler
Kernel: 5.15.2-xanmod1-tt

FPS: 79.7947663, 79.208414, 75.4100481, 72.8958173, 72.649925, 72.0702237, 72.147732, 71.6166447, 71.6933495, 71.3670533, 71.1901418, 71.2663421

xonotic cacule
Kernel: 5.15.2-xanmod1-cacule-full (x86_64)

FPS: 79.9925776, 78.9416091, 75.7230745, 73.5371781, 73.2700679, 73.1542655, 73.2225431, 72.8344118, 72.7633519, 72.7872554, 72.9407657, 68.1741742

Blender

3 runs

image

Intel Core i7-8665U == Kernel: 5.15.2-xanmod1-tt-tt-fix

hamadmarri commented 2 years ago

Hi @groobybugs

Thank you for the tests. Could you please attach the two .configs for CacULE and TT?

I have updated the TT patch since the last one I sent here. Which one have you tried? The last commit was yesterday, and it brings a significant improvement because it considers cache-hot tasks (ported from CFS).

groobybugs commented 2 years ago

Sure, I will attach the config files and the patch I tried was this one https://github.com/hamadmarri/TT-CPU-Scheduler/issues/5#issuecomment-968117884

hamadmarri commented 2 years ago

> Sure, I will attach the config files and the patch I tried was this one #5 (comment)

Yes, this patch brings no performance improvements; it only fixes the frequency scaling issues.

You might try the latest commit: https://github.com/hamadmarri/TT-CPU-Scheduler/blob/4fd4a9a29c8cb7c05e22df49514de304ea66afeb/patches/5.15/tt-5.15-r2.patch

For compile-time measurements: if a realtime task such as a YouTube video or audio is running, TT gives more preference to realtime tasks than to CPU/IO-bound tasks like compiling. So it is normal to see a higher build time; more importantly, the FPS loss or frame drops in the realtime task stay at almost 0%.

groobybugs commented 2 years ago

OK @hamadmarri, tomorrow I will test with the latest commit. These are my config files. And yep, I did not notice any slowdowns in the emulators or the system at any time.

config (cacule).txt config (tt).txt

and thanks!!!

hamadmarri commented 2 years ago

> OK @hamadmarri, tomorrow I will test with the latest commit. These are my config files. And yep, I did not notice any slowdowns in the emulators or the system at any time.
>
> config (cacule).txt config (tt).txt
>
> and thanks!!!

Hi @groobybugs, thank you so much for your efforts. Here are some notes related to the benchmarks:

The rest of the configs are identical :+1:

Side note: you have NUMA_BALANCE=y in both CacULE and TT, but you may not need NUMA at all. Check with the numactl command whether you have only 1 node or more. With only 1 node, you don't need any NUMA configs enabled, and disabling NUMA can save some overhead.
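Besides `numactl --hardware`, the node count can also be read from sysfs. The following is a minimal sketch, assuming a Linux system that exposes one `nodeN` directory per NUMA node under `/sys/devices/system/node`; the function name `numa_node_count` is hypothetical.

```python
import re
from pathlib import Path


def numa_node_count(sysfs_root="/sys/devices/system/node"):
    """Count nodeN directories under sysfs_root; 0 if the directory is absent."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # Typically the case when the running kernel has no NUMA support.
        return 0
    return sum(1 for entry in root.iterdir()
               if entry.is_dir() and re.fullmatch(r"node\d+", entry.name))


if __name__ == "__main__":
    count = numa_node_count()
    if count <= 1:
        print(f"{count} NUMA node(s) found: NUMA options are probably not needed")
    else:
        print(f"{count} NUMA nodes found: keep NUMA support enabled")
```

The root path is a parameter only so the function can be pointed at a test directory; on a real machine the default should be what you want.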

Thank you

groobybugs commented 2 years ago

Hi @hamadmarri! I'm really sorry, I did not notice the differences; I thought I was using 500 Hz for both. My bad. I will run the tests again and enable CONFIG_TT_ACCOUNTING_STATS. Thanks!

groobybugs commented 2 years ago

Hi @hamadmarri

Now I'm completely sure that I used the same configuration. I also disabled NUMA, enabled CONFIG_TT_ACCOUNTING_STATS and used 500 Hz for all my tests.

image

CacULE FPS: 78.889229, 79.5157125, 79.6588391, 79.0820449, 78.2657684, 77.5217652, 76.4722857, 76.6737493, 75.5381214, 74.9619431

TT FPS: 80.0675823, 79.7179842, 79.9550305, 79.3849837, 79.6759446, 79.0674221, 79.0211555, 78.9002892, 78.4843618, 78.1941778

As we can see, TT performs better across all the runs.
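To put a number on that comparison, here is a small sketch that summarizes the two FPS series posted in this comment. The values are copied from the result lines; the helper name `summarize` is hypothetical, and treating run i of one kernel as paired with run i of the other is an assumption.

```python
from statistics import mean

# FPS values copied from the CacULE and TT result lines in this comment (10 runs each).
cacule_fps = [78.889229, 79.5157125, 79.6588391, 79.0820449, 78.2657684,
              77.5217652, 76.4722857, 76.6737493, 75.5381214, 74.9619431]
tt_fps = [80.0675823, 79.7179842, 79.9550305, 79.3849837, 79.6759446,
          79.0674221, 79.0211555, 78.9002892, 78.4843618, 78.1941778]


def summarize(name, samples):
    """Format mean, minimum and maximum of one FPS series."""
    return (f"{name}: mean={mean(samples):.2f} "
            f"min={min(samples):.2f} max={max(samples):.2f}")


cacule_mean = mean(cacule_fps)
tt_mean = mean(tt_fps)
# Relative difference of the means, in percent of the CacULE mean.
gain_pct = (tt_mean - cacule_mean) / cacule_mean * 100
# Assumption: run i of TT is compared with run i of CacULE.
tt_wins = sum(t > c for t, c in zip(tt_fps, cacule_fps))

print(summarize("CacULE", cacule_fps))
print(summarize("TT", tt_fps))
print(f"TT vs CacULE mean FPS: {gain_pct:+.2f}%")
print(f"TT faster in {tt_wins} of {len(tt_fps)} runs")
```

With only 10 runs per kernel this is a rough indication, not a statistical test.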

image

And the difference in Blender is actually small.

As I said before, TT is very responsive when multitasking and gives great results for single tasks. One thing I do to improve my compilation time is to increase the niceness of the task I'm most interested in: for example, if the build took 11 minutes, increasing the niceness reduces the time to 5 minutes, and the system is still responsive.

In general I'm going to say that TT is a great scheduler. Thanks man! This speeds up my daily work.
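For anyone wanting to try the same thing from a script, here is a minimal sketch of launching a command with an adjusted nice value. It assumes a POSIX system; `run_with_niceness` is a hypothetical helper, and the child command is only a stand-in for a real build. Note that under the usual Linux convention a lower nice value means a higher priority, so giving a build more CPU would normally mean a negative increment, which typically requires root or CAP_SYS_NICE. How TT weighs nice values is not covered in this thread.

```python
import os
import subprocess
import sys


def run_with_niceness(command, increment):
    """Run command with its nice value changed by `increment`.

    A positive increment lowers the priority; a negative one raises it
    and normally needs elevated privileges.
    """
    return subprocess.run(
        command,
        # os.nice() runs in the child just before exec,
        # so the parent's own priority is left unchanged.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for a build command: the child prints its own nice value.
    child = [sys.executable, "-c", "import os; print(os.nice(0))"]
    result = run_with_niceness(child, 5)
    print("parent nice:", os.nice(0), "child nice:", result.stdout.strip())
```

From a shell, the standard `nice` and `renice` utilities cover the same ground without any code.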

groobybugs commented 2 years ago

I hope this information is useful to you, and if there is anything else I can help you with, just ask man

hamadmarri commented 2 years ago

I am glad to see that TT is performing well in your cases :+1:

Any feedback is welcome

Thank you so much @groobybugs