warpme / miniarch

ArchLinux ARM SD card images for quick start with ArchLinux on ARM SBC & TV Boxes
GNU General Public License v2.0

Allwinner H313/H616: thermal zones report (fixed); cpu freq scaling (fixed) & reporting (stuck to max; not caused by miniarch) #11

Closed mercs759 closed 1 year ago

mercs759 commented 1 year ago

My tvbox specs and miniarch installation procedure are in issue #9

I'm reporting two different issues here, since they could be connected.

1) Thermal sensors need calibration: before installing miniarch I did some HW tests under Android 10 (the default firmware for my box) using CPU-Z and another program:

Under Android, with CPU load around 30%, the CPU temperature reported by the system is about 60°C and the box feels warm, which is consistent with 60°C.

After installing miniarch 202302, with CPU load at 5%, the CPU heatsink is almost cold, but the system reports very high temperatures:

cat /sys/devices/virtual/thermal/thermal_zone0/temp
84582 (84.58°C)
cat /sys/devices/virtual/thermal/thermal_zone1/temp
85307
cat /sys/devices/virtual/thermal/thermal_zone2/temp
84743
cat /sys/devices/virtual/thermal/thermal_zone3/temp
85791

(the four "zones" are 0=CPU, 1=GPU, 2=RAM, 3=something else (XD))
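The four readings above can be collected in one go. This is only a rough sketch: the helper names millideg_to_c and print_zones are made up for illustration, and it assumes each zone directory has the usual temp and type files (temp in millidegrees Celsius).

```shell
#!/bin/sh
# millideg_to_c: convert a sysfs millidegree reading (e.g. 84582) to degrees Celsius.
# Assumes a non-negative integer input.
millideg_to_c() {
    printf '%d.%02d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 10 ))"
}

# print_zones [BASE]: list every thermal zone under BASE (default /sys/class/thermal)
# together with its "type" label, so there is no need to guess which zone is which.
print_zones() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        label=$(cat "$zone/type" 2>/dev/null || echo unknown)
        printf '%s %s %s\n' "${zone##*/}" "$label" "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}

# Example: print_zones
```

Printing the type file next to each value should also settle what zone 3 actually is.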

2) CPU scaling/governor seems broken

I did this: set the governor to "performance".

Installed s-tui. In s-tui's "monitoring mode" (about 5% CPU used), the CPU frequency moves between 480 MHz and about 1000 MHz, which seems normal.

BUT in s-tui's "test mode" (over 95% CPU used) the CPU is stuck at 480 MHz... and when the high load stops, the CPU frequency moves up again. This makes no sense.

This strange behaviour is also confirmed by running an intensive CPU task in a terminal window (try a command like "7z b") while monitoring with s-tui. Installing Endeavour (headless) doesn't fix this.

warpme commented 1 year ago

Hi, I recently added support for speed bins (the dependency of cpu dvfs vcc ranges on chip fabrication quality; 359e048d4e1c760e6df986802de57a70fd60273f 2a7f33bb262b17ddd3d8716c96e5f190c8e1fbce). Could you update the kernel (run start, then press 4)?

btw: the kernel does not load the cpu dvfs module automatically, so you need to run modprobe sun50i_cpufreq_nvmem
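A small sketch for checking whether the module is already loaded before calling modprobe. The helper names module_loaded and ensure_module are hypothetical, and the check only covers loadable modules listed in /proc/modules (a driver built into the kernel would not show up there).

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed in FILE (default /proc/modules).
# /proc/modules uses underscores, so hyphens in NAME are normalised first.
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    grep -q "^$name " "${2:-/proc/modules}"
}

# ensure_module NAME: load NAME with modprobe only if it is not loaded yet (needs root).
ensure_module() {
    module_loaded "$1" || modprobe "$1"
}

# Example (as root): ensure_module sun50i_cpufreq_nvmem
```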

mercs759 commented 1 year ago

going from 6.2 to 6.3

login, su root, "start", "option 4" (installs 6.3), reboot, then modprobe sun50i_cpufreq_nvmem

rechecking values

cpupower frequency-info

available governors: performance and schedutil (as under 6.2)
scaling governor: performance (as under 6.2)
scaling_driver: cpufreq_dt (as under 6.2)

available frequency steps under 6.3: 480 MHz, 600 MHz, 792 MHz, 1.01 GHz, 1.20 GHz, 1.51 GHz
available frequency steps under 6.2: 480 MHz, 600 MHz, 792 MHz, 1.01 GHz, 1.20 GHz, 1.34 GHz, 1.51 GHz

scaling max freq: 480 MHz (?, I don't remember what it was before upgrading. I will change this to 1512000 later)
scaling min freq: 480 MHz (as under 6.2)

testing with s-tui

1) The reported CPU temperature still needs calibration, BUT now it seems to respond to changes in CPU load: about 2° up if load goes to 100%, 2° down when CPU load goes down. Maybe it's just an impression.

2) The CPU frequency still has the same problem: when CPU load is high (avg 100%, 4 cores running at 100% load, s-tui test mode) cpufreq is still (as in 6.2) reported stuck at 480 MHz. But if CPU load is low (avg <10%, just 1 core working, working core load about 30%, others 0%) cpufreq moves freely (about once a second) between 480000, 600000, 792000 and 1008000 (maybe it sometimes goes up to 1512, but I didn't see that).

Now I tried
echo 1152000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

It didn't work :/ The value remained at 480000:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
480000
I did it again, nothing.
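One thing worth ruling out: the 6.3 step list above shows 1.20 GHz but no 1.15 GHz entry, so 1152000 may not be an exact step. The sketch below checks a requested value against scaling_available_frequencies before writing it and then prints what the kernel reports back. freq_available and set_max_freq are made-up helper names, and it assumes the driver exposes scaling_available_frequencies in kHz.

```shell
#!/bin/sh
# freq_available KHZ [POLICY_DIR]: succeed if KHZ is one of the steps listed in
# scaling_available_frequencies.
freq_available() {
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    for step in $(cat "$dir/scaling_available_frequencies"); do
        if [ "$step" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# set_max_freq KHZ [POLICY_DIR]: write scaling_max_freq (needs root) and print the
# value read back afterwards, which may differ from the request.
set_max_freq() {
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    freq_available "$1" "$dir" || echo "note: $1 is not a listed step" >&2
    echo "$1" > "$dir/scaling_max_freq"
    cat "$dir/scaling_max_freq"
}

# Example (as root): set_max_freq 1512000
```

If the value read back is still lower than the one written, something other than the request itself is limiting the policy.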

so I uncommented /etc/default/cpupower min_freq and max_freq values, put correct values in them (480MHz and 1512MHz)

reboot

login, again "modprobe sun50i_cpufreq_nvmem"

i re-checked min_freq and max_freq values in /etc/default/cpupower. Ok, they have been set.

trying again echo 1152000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

nothing:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 480000

Didn't know what else to do so I started a series of repeated (once a second, or so) cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

and I got lots of 480000 answers, BUT also some other values, like 600000, and then again lots of 480000

I don't understand... system continuously changes scaling_max_freq???

then I started a series of repeated (once a second, or so)

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

and, yep, cur_freq behaves the same as max_freq: lots of 480000 and some other value sometimes.
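Instead of repeating cat by hand, the two files can be sampled in a loop and tallied. A minimal sketch, with watch_freqs as a hypothetical helper name:

```shell
#!/bin/sh
# watch_freqs COUNT INTERVAL [POLICY_DIR]: print scaling_max_freq and scaling_cur_freq
# COUNT times, INTERVAL seconds apart, one "max cur" pair per line.
watch_freqs() {
    dir=${3:-/sys/devices/system/cpu/cpu0/cpufreq}
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s %s\n' "$(cat "$dir/scaling_max_freq")" "$(cat "$dir/scaling_cur_freq")"
        i=$((i + 1))
        if [ "$i" -lt "$1" ]; then
            sleep "$2"
        fi
    done
}

# Example: 60 samples, one per second, counted per distinct pair:
# watch_freqs 60 1 | sort | uniq -c
```

The tally makes it easy to see how often the limit sits at 480000 compared with the other values.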

warpme commented 1 year ago

I think I found the root cause of the issue. I prepared an updated 6.3.1 kernel and uploaded it to the miniarch repo. Could you try updating the kernel (option 4 in 'start')? After installing the new kernel, run modprobe sun50i_cpufreq_nvmem, then please report cpu temps + freqs.

Also: what do you get from dmesg | grep axp

mercs759 commented 1 year ago

resolving dependencies...
looking for conflicting packages...

Packages (2) linux-aarch64-6.3.1-2 linux-aarch64-api-headers-6.3.1-2

Total Download Size: 36.92 MiB
Total Installed Size: 104.02 MiB
Net Upgrade Size: 0.00 MiB

:: Proceed with installation? [Y/n]

:: Retrieving packages...
linux-aarch64-6.3.1-2-any 35.4 MiB 2.88 MiB/s 00:12 [########################################################] 100%
linux-aarch64-api-headers-6.3.1-2-any 1530.3 KiB 2.10 MiB/s 00:01 [########################################################] 100%
Total (2/2) 36.9 MiB 2.81 MiB/s 00:13 [########################################################] 100%
(2/2) checking keys in keyring [########################################################] 100%
(2/2) checking package integrity [########################################################] 100%
(2/2) loading package files [########################################################] 100%
(2/2) checking for file conflicts [########################################################] 100%
(2/2) checking available disk space [########################################################] 100%
:: Processing package changes...
(1/2) upgrading linux-aarch64 [########################################################] 100%
(2/2) upgrading linux-aarch64-api-headers [########################################################] 100%
:: Running post-transaction hooks...
(1/4) Arming ConditionNeedsUpdate...
(2/4) Updating module dependencies...
(3/4) Updating linux-aarch64 module dependencies...
(4/4) Updating linux-aarch64 initcpios...
==> Building image from preset: /etc/mkinitcpio.d/linux-aarch64.preset: 'default'
  -> -k 6.3.1 -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: '6.3.1'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [kms]
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [block]
  -> Running build hook: [filesystems]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-linux.img'
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux-aarch64.preset: 'fallback'
  -> -k 6.3.1 -c /etc/mkinitcpio.conf -g /boot/initramfs-linux-fallback.img -S autodetect
==> Starting build: '6.3.1'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [modconf]
  -> Running build hook: [kms]
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: 'ums_eneub6250'
  -> Running build hook: [filesystems]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-linux-fallback.img'
==> Image generation successful

System reboot is required!

pressed "any other key", and did a manual "reboot"

login ; modprobe sun50i_cpufreq_nvmem

[root@alarmhost ~]# cpupower frequency-info
analyzing CPU 1:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency: 244 us
  hardware limits: 480 MHz - 1.51 GHz
  available frequency steps: 480 MHz, 600 MHz, 792 MHz, 1.01 GHz, 1.20 GHz, 1.51 GHz
  available cpufreq governors: performance schedutil
  current policy: frequency should be within 480 MHz and 1.51 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 1.51 GHz (asserted by call to hardware)

thermal zones started at about 47°C (ambient in my house now is 21°, h313 has a small aluminium heatsink, kept in place with a thermal pad, in a closed box)

after 5 minutes with cpu average load at 5% thermal zones are like this:

[root@alarmhost ~]# sensors
cpu_thermal-virtual-0
Adapter: Virtual device
temp1:        +49.4°C

gpu_thermal-virtual-0
Adapter: Virtual device
temp1:        +49.9°C

ddr_thermal-virtual-0
Adapter: Virtual device
temp1:        +49.3°C

ve_thermal-virtual-0
Adapter: Virtual device
temp1:        +49.7°C

everything seems consistent and good.

test
[root@alarmhost ~]# 7z b

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: 64000000 - - - - - - - -

RAM size:    1982 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2099   306    667   2042  |      60728   380   1362   5181
23:       1981   308    656   2018  |      58175   373   1348   5034
24:       1945   319    655   2091  |      57623   379   1333   5058
25:       1805   311    662   2062  |      54475   370   1310   4848
----------------------------------  | ------------------------------
Avr:             311    660   2053  |              376   1338   5030
Tot:             343    999   3542

"7z b" test completed in 125 seconds. During the test the CPU temperature went up to a maximum of 65°; 30 seconds after the test it was back to about 54°. The box is cold, which is a bit strange, but ok.

Before, during and after the test, s-tui always indicated 1512 MHz.

I would say that the reported cpufreq never moves from 1512. Maybe frequency reporting is still bugged, but the system seems to work as it should.

++++++++++++++++++++++
+++ MY COMPLIMENTS +++
++++++++++++++++++++++
It seems you did it. ;D

mercs759 commented 1 year ago

[root@alarmhost ~]# dmesg | grep axp
[    2.221285] axp20x-rsb sunxi-rsb-745: AXP20x variant AXP806 found
[    2.229419] axp20x-rsb sunxi-rsb-745: AXP20X driver loaded

(chip is an AXP305, but axp20x is the same driver used by android stock rom. It should be ok)

adyatan commented 1 year ago

Wanted to share some info. I have a T95 Mini with H313. I used the Tanix TX6S image and it works perfectly for my use. I saw here that the CPU frequency wasn't being displayed correctly. For me it shows 1.608 GHz by default. After I do modprobe sun50i_cpufreq_nvmem it falls to 1.51 GHz.

mercs759 commented 1 year ago

@adyatan If the reported CPU speed is 1608, please do a test: open 2 ssh or local sessions. In one session install p7zip and, after installing, run 7z b. Wait about 50 seconds, then, in the other window and while 7z b is still running, run cat /sys/class/thermal/thermal_zone*/temp

Then report the 7z b results, so we'll see if it's a "real" 1608 (it could be possible).
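The same test can be run from a single session by putting the benchmark in the background. This is only a sketch: run_and_sample and the BENCH_LOG variable are invented for this example, and the 50 second delay is simply the one suggested above.

```shell
#!/bin/sh
# run_and_sample DELAY BASE CMD...: start CMD in the background, wait DELAY seconds,
# print every thermal zone reading found under BASE, then wait for CMD to finish.
# CMD's own output goes to the file named by BENCH_LOG (default /tmp/7z-bench.log).
run_and_sample() {
    delay=$1
    base=$2
    shift 2
    "$@" > "${BENCH_LOG:-/tmp/7z-bench.log}" 2>&1 &
    pid=$!
    sleep "$delay"
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        echo "$f: $(cat "$f")"
    done
    wait "$pid"
}

# Example: run_and_sample 50 /sys/class/thermal 7z b
# The benchmark table is then in /tmp/7z-bench.log.
```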

adyatan commented 1 year ago

@mercs759 seems like 1.608 GHz was not real. Here is the benchmark at 1.608

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: 64000000 - - - - 256000000 - - -

RAM size:    1982 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1070   221    471   1041  |      35617   393    773   3039
23:       1408   311    462   1435  |      34944   393    770   3024
24:       1395   320    470   1501  |      34348   393    767   3015
25:       1353   324    477   1545  |      33307   391    758   2964
----------------------------------  | ------------------------------
Avr:             294    470   1381  |              392    767   3011
Tot:             343    619   2196

The Temperature never rose above 60-62°.

Here is the benchmark after the frequency is set to 1.51GHz.

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:    1982 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2240   324    673   2179  |      62979   393   1369   5373
23:       2225   343    661   2268  |      61192   391   1353   5295
24:       1992   321    666   2142  |      59968   393   1341   5264
25:       1895   323    670   2164  |      57859   391   1316   5149
----------------------------------  | ------------------------------
Avr:             328    668   2188  |              392   1344   5270
Tot:             360   1006   3729

This seems much better. The temperature also rose to 67-68° at max. Also pretty weird it doesn't show the CPU Freq: this time.
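To put a number on "much better", the totals of the two runs above can be divided directly. A quick sketch (rating_ratio is just an illustrative helper; the figures are taken from the Tot: and Avr: lines of the two tables):

```shell
#!/bin/sh
# rating_ratio A B: print A/B to two decimals (awk does the floating-point division).
rating_ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

rating_ratio 3729 2196   # total rating, after modprobe vs. before: prints 1.70
rating_ratio 668 470     # compression R/U MIPS, after vs. before: prints 1.42
```

So the run reported as 1.51 GHz scores roughly 70% higher overall than the run reported as 1.608 GHz, which fits the conclusion that the 1.608 GHz figure was not real.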

mercs759 commented 1 year ago

The last results seem very good! 7zip is not accurate in reporting cpufreq. To know the current CPU frequency, just use cat on the sysfs path to the value (for example /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq).
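A short sketch of reading the current frequency for every core from sysfs. show_cur_freq and khz_to_mhz are made-up names; cpuinfo_cur_freq is only printed when the driver exposes it and the file is readable (normally root only), so treat that part as an assumption about the setup.

```shell
#!/bin/sh
# khz_to_mhz: sysfs reports frequencies in kHz; print whole MHz.
khz_to_mhz() {
    echo "$(( $1 / 1000 ))"
}

# show_cur_freq [CPU_BASE]: for each CPU print scaling_cur_freq, plus
# cpuinfo_cur_freq (the value read back from hardware) when it is available.
show_cur_freq() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu=${dir%/cpufreq}
        cpu=${cpu##*/}
        line="$cpu scaling=$(khz_to_mhz "$(cat "$dir/scaling_cur_freq")")MHz"
        if [ -r "$dir/cpuinfo_cur_freq" ]; then
            line="$line hw=$(khz_to_mhz "$(cat "$dir/cpuinfo_cur_freq")")MHz"
        fi
        echo "$line"
    done
}

# Example (as root, to include the hw= column): show_cur_freq
```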

mercs759 commented 1 year ago

So, it seems that only cpu_current_speed reporting remains to be fixed.

One other interesting thing: my stock Android 10 ROM makes my box warm just 2 minutes after boot, doing nothing. By contrast, miniarch doesn't seem to generate that much heat even during stress testing.

The max frequency should be 1512 MHz for both Android 10 and miniarch 6.3.1. Maybe Android uses a higher CPU voltage? Or maybe the GPU makes lots of heat in Android while doing very little? I'll do some testing.

I'm looking for a benchmark that runs both on Android and on Arch (possibly CLI). If you have something, please let me know.

Update: first tests seem to say that stock Android 10 makes the CPU run at a fixed 12xx MHz; this could explain the temperature (always >55°C).

7z b (running under termux0188) seems to confirm that the CPU frequency is below 1512 MHz

mercs759 commented 1 year ago

Ok, it seems that "current CPU frequency stuck to max" is not only a miniarch / Allwinner / 6.2 / 6.3 problem.

I did a test on another arm64 tvbox:

S905X SoC (classic "p212-like" board), Armbian bullseye (latest stable), governor: performance, 100-1512 MHz

In s-tui the reported current CPU freq is always = max CPU freq (1512).

MONITORING (CPU load < 5%): cpu freq "1512", temp 39°

TESTING (CPU load 100%): cpufreq still "1512", BUT temp +10° within 20 seconds. So... during testing cpufreq obviously goes up from X to 1512, but the system doesn't report it correctly.

I will do some other tests.

takase1121 commented 1 year ago

A side note on manually modprobing sun50i-cpufreq-nvmem: maybe it can be added to the MODULES section of mkinitcpio.conf so it'll be loaded on startup.

warpme commented 1 year ago

"maybe it can be added to the MODULES section of mkinitcpio.conf": well, I wish I could do that... Unfortunately there is an issue with kernel dependencies for this module, which shows up as a kernel trap when the module is loaded via /etc/modprobe.d. This is the reason why this module needs to be loaded manually, or automatically with some delay (this is what I'm doing in minimyth2). Any help with investigating the root cause of the issue (with a pull request) is very welcome!
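For anyone who wants the "automatically with some delay" variant on miniarch in the meantime, a rough sketch follows. It is not how minimyth2 does it (that is not shown in this thread); delayed_modprobe, the MODPROBE override and the 10 second delay are all assumptions made for the example.

```shell
#!/bin/sh
# delayed_modprobe DELAY MODULE: wait DELAY seconds, then load MODULE.
# MODPROBE can be overridden (e.g. MODPROBE=echo) for a dry run.
delayed_modprobe() {
    sleep "$1"
    ${MODPROBE:-modprobe} "$2"
}

# Example, from a late boot script run as root (the 10 s delay is an arbitrary guess):
# delayed_modprobe 10 sun50i_cpufreq_nvmem
```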