maemo-leste / bugtracker

Issue tracking repository

N900: Try to hit OFF mode (low power consumption) #545

Open MerlijnWajer opened 3 years ago

MerlijnWajer commented 3 years ago

Creating this ticket by popular demand, also with some description from IRC

12:51 < Wizzup> ^-^hi: so the answer is that it's not just a question of 'merging in some code', it's a decent amount of research work, let me explain
12:52 < Wizzup> Our current Linux on the N900 is 5.1 with some patches, some of our own, some powervr patches, for omapfb and powervr. The tweet was plain mainline (5.8 or so?) with no maemo leste loaded - minimal userspace, showing that it is possible to enter those idle/off modes
12:52 < Wizzup> What needs to be done to get it all in place and merged is: (1) move from kernel 5.1 to a newer/latest kernel - this is being blocked by some of us not knowing how to port to 5.2 and forward due to some kernel change
12:53 < Wizzup> (2) is then loading up minimal userspace, hitting off mode with the powervr driver loaded but not used, and then booting maemo bits one at a time, and see what can prevent OFF mode from being hit
12:53 < Wizzup> (3) making n900 on maemo play nice with off mode once (1) and (2) are done
12:54 < Wizzup> We're not doing (1) at the moment because we want to switch to another, newer, powervr userspace + kernel driver so that we're using the same one on all devices, but that driver needs some more work to get X11 going with hildon-desktop
12:54 < Wizzup> If we can get the new driver working, we can skip (1) outright, and since we want to do that anyway, it's better to aim for the newer driver rather than having to redo the work we just did to begin with
12:55 < Wizzup> A shortcut to start on (2) and (3) is to load the latest kernel without powervr on the n900 and start working on OFF mode, without 3d acceleration, this can be useful, but it won't result in a useful phone until it's once again equipped with the 3d driver
12:56 < Wizzup> One other shortcut is that maybe it is possible to try to hit off mode on kernel 5.1, but if we can't, we cannot report bugs since everyone (rightfully so) will tell us: does it work with latest kernel? as first question
12:57 < Wizzup> Hope this clarifies it a bit, please let me know if you have more questions
12:58 < ^-^hi> any way to run new software on fremantle without compiling everything and its dependencies?
13:00  * ^-^hi 's English is degrading
13:01 < Wizzup> If by fremantle you mean the ancient kernel and ancient userspace, then likely only chroot will be easy, but even that will likely have trouble with a really old kernel
13:01 < sicelo> Not really enough useful ways for that.
13:02 < ^-^hi> didn't linux have a policy of "don't break userspace"?
13:02 < Wizzup> Maybe if you also go for arm eabi, and use the space x11 driver ported to latest xorg server and using eabi in the chroot, you could get something going, but that by itself is probably more effort than the work required to get decent PM on leste
13:04 < sicelo> ^-^hi: if you want to help with the off mode issue (and get some use out of the device in the meantime), maybe look at running plain devuan with i3 or similar light UI (no 3d required). Should make it easier to play with kernel, and still use device to some extent
13:05 < ^-^hi> i know C but don't know anything about kernel and such stuff
13:05 < sicelo> If you know C, you're already halfway :-)
13:07 < sicelo> I don't know C, but already 'fixed' something for the LED driver ... so you should do even better. Also, the PM stuff doesn't necessarily need you to do actual dev. Testing, reporting, testing some more, etc. is still a good start
13:07 < Wizzup> sicelo: I'm not sure if that helps that much (with i3), it's mostly kernel stuff really
13:08 < Wizzup> Don't mean to discourage, but I think if folks want to contribute without the powervr stuff, then picking the last path I outlined is helpful: latest mainline linux without powervr, and making things play nice then
13:08 < Wizzup> you could do that with i3, but might as well do it with maemo tbh
MerlijnWajer commented 2 years ago

Relevant links:

MerlijnWajer commented 2 years ago

I can currently hit RET with our 5.15 kernel and this script:

# cat setup-idle.sh
#!/bin/bash
mount -t proc none /proc
mount -t sysfs none /sys
mount -t debugfs none /sys/kernel/debug
mount -o rw,remount /

consoles=$(find /sys/bus/platform/devices/4*.serial/ -name console)
for console in ${consoles}; do
    echo N > ${console}
done

# Enable autosuspend
uarts=$(find /sys/bus/platform/devices/4*.serial/power/ -type d)
for uart in ${uarts}; do
    echo 2000 > ${uart}/autosuspend_delay_ms
    echo enabled > ${uart}/wakeup
    echo auto > ${uart}/control
done

# Configure wake-up from suspend
uarts=$(find /sys/class/tty/tty[SO]*/power/ -type d)
for uart in ${uarts}; do
    echo enabled > ${uart}/wakeup
done

echo 1 > /sys/kernel/debug/pm_debug/enable_off_mode

And current instructions (idle.sh output here is wrong, but the script here is fixed):

./setup-idle.sh

Wait a bit, and run sleep 5 ; ./idle.sh

And check if OFF or RET are increasing

# sleep 5; ./idle.sh
ST_MCSPI1
OFF:0,RET:458
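
Whether OFF or RET is increasing can be checked by comparing two samples of that counter line. A minimal Python sketch, assuming the `OFF:n,RET:n` format shown above; `parse_counts` and `delta` are hypothetical helper names, and the second sample value is made up for illustration:

```python
def parse_counts(line):
    """Parse a line such as 'OFF:0,RET:458' into {'OFF': 0, 'RET': 458}."""
    counts = {}
    for field in line.strip().split(','):
        name, _, value = field.partition(':')
        counts[name] = int(value)
    return counts


def delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name] for name in before}


# First sample is the output quoted above; the second is illustrative only.
first = parse_counts('OFF:0,RET:458')
second = parse_counts('OFF:2,RET:470')
print(delta(first, second))
```

A growing RET count with a flat OFF count would suggest the core domain reaches retention but something still keeps it out of OFF.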

Then, to at least turn off the backlight:

# modprobe panel-sony-acx565akm
# echo 0 > /sys/class/backlight/acx565akm/brightness

Which adds UART2 as a blocker, but pm is better

# sleep 5; ./idle.sh
ST_UART2,ST_MCSPI1
OFF:0,RET:806

To get power measurements:

# modprobe bq27xxx_battery
# modprobe bq27xxx_battery_i2c
# modprobe bq2415x_charger

# cat /sys/class/power_supply/bq27200-0/power_avg
89060
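
The power_supply sysfs class normally reports power in microwatts, so the reading above would be roughly 89 mW. A small conversion sketch, assuming that unit also holds for this driver; `read_power_mw` and `microwatts_to_milliwatts` are hypothetical helpers:

```python
def microwatts_to_milliwatts(raw):
    """Convert a raw sysfs power reading (assumed to be in microwatts) to mW."""
    return int(raw) / 1000.0


def read_power_mw(path='/sys/class/power_supply/bq27200-0/power_avg'):
    """Read power_avg from sysfs and return it in milliwatts."""
    with open(path) as f:
        return microwatts_to_milliwatts(f.read().strip())


# The value quoted above, converted without touching sysfs.
print(microwatts_to_milliwatts('89060'))
```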
MerlijnWajer commented 2 years ago

idle.sh:

# Blocker bits from the pm_debug counters (hex value in column 7)
blocker_bits=$(grep idlest1 /sys/kernel/debug/pm_debug/count | awk '{print $7}')
#blocker_bits=$(grep idlest /sys/kernel/debug/pm_debug/count | awk '{print $7}')

# Decode the hex bitmask into the names of the bits that are set
blockers=$(python3 - $blocker_bits << 'EOF'
import sys

# Bit names listed from bit 31 down to bit 0
cm_idlest1_core_bits = [ 'RESERVED', 'ST_MMC3', 'ST_ICR',
'RESERVED', 'RESERVED', 'RESERVED', 'ST_MMC2', 'ST_MMC1',
'RESERVED', 'ST_HDQ', 'ST_MCSPI4', 'ST_MCSPI3', 'ST_MCSPI2',
'ST_MCSPI1', 'ST_I2C3', 'ST_I2C2', 'ST_I2C1', 'ST_UART2',
'ST_UART1', 'ST_GPT11', 'ST_GPT10', 'ST_MCBSP5', 'ST_MCBSP1',
'RESERVED', 'ST_MAILBOXES', 'ST_OMAPCTRL', 'ST_HSOTGUSB_IDLE',
'ST_HSOTGUSB_STDBY', 'RESERVED', 'ST_SDMA', 'ST_SDRC', 'RESERVED',
]

# Reverse so that the list index equals the bit number
cm_idlest1_core_bits = list(reversed(cm_idlest1_core_bits))
v = int(sys.argv[1], 16)
blockers = []
for i in range(32):
    if v & (1 << i):
        blockers.append(cm_idlest1_core_bits[i])
print(','.join(blockers), end='')
EOF
)
echo "$blockers"

idle=$(grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3)
echo $idle;
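
As a worked example of the decoding step in idle.sh, here is the same logic as a standalone Python function. The function name `decode_blockers` is hypothetical, and the input `0x44000` is not taken from a real log: it is derived from the bit table so that bits 14 and 18 are set.

```python
# Bit names for bits 31 down to 0, copied from idle.sh.
CM_IDLEST1_CORE_BITS = [
    'RESERVED', 'ST_MMC3', 'ST_ICR', 'RESERVED', 'RESERVED', 'RESERVED',
    'ST_MMC2', 'ST_MMC1', 'RESERVED', 'ST_HDQ', 'ST_MCSPI4', 'ST_MCSPI3',
    'ST_MCSPI2', 'ST_MCSPI1', 'ST_I2C3', 'ST_I2C2', 'ST_I2C1', 'ST_UART2',
    'ST_UART1', 'ST_GPT11', 'ST_GPT10', 'ST_MCBSP5', 'ST_MCBSP1',
    'RESERVED', 'ST_MAILBOXES', 'ST_OMAPCTRL', 'ST_HSOTGUSB_IDLE',
    'ST_HSOTGUSB_STDBY', 'RESERVED', 'ST_SDMA', 'ST_SDRC', 'RESERVED',
]


def decode_blockers(value):
    """Return the names of all bits set in value, lowest bit first."""
    names = list(reversed(CM_IDLEST1_CORE_BITS))  # index == bit number
    return [names[bit] for bit in range(32) if value & (1 << bit)]


# Bits 14 and 18 set: (1 << 14) | (1 << 18) == 0x44000
print(','.join(decode_blockers(0x44000)))
```

With that input the function names ST_UART2 and ST_MCSPI1, the same pair that appears in the blocker output earlier in this thread.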
MerlijnWajer commented 2 years ago

I cannot get drm to disable the display currently, using https://github.com/IMbackK/drm_blankscreen - it reports success it seems, but it doesn't work. This is with panel driver and omapdrm loaded.

MerlijnWajer commented 2 years ago

I will try lowering the target_residency values now as Tony suggests, see if OFF mode is hit. After that, I guess we'll need some way to load each and all of the modules that we normally use, and then see which ones block idle.

EDIT: Lowering the values helped, but it does not make a big difference since the kernel wakes up quite often

MerlijnWajer commented 2 years ago

Blocks idle (most of the time?):

Does not block idle (but can still increase power usage) with no process using it / keeping it open:

MerlijnWajer commented 2 years ago

For the record I can hit OFF mode without any patches in Linux 5.7, at 0.011A @ 3.8V. So there's also a regression since then that prevents it from entering OFF mode without changing the timings.
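
For comparison with figures quoted in milliwatts, the current and voltage above convert with P = I × V. A trivial sketch; `power_mw` is a hypothetical helper name:

```python
def power_mw(current_a, voltage_v):
    """Return power in milliwatts for a current in amps and a voltage in volts."""
    return current_a * voltage_v * 1000.0


# 0.011 A at 3.8 V, as measured above
print(round(power_mw(0.011, 3.8), 1))
```

That works out to roughly 42 mW.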

MerlijnWajer commented 2 years ago

5.9 seems to no longer hit OFF mode for me, I'd have to re-check 5.8 to see if it still works there.

MerlijnWajer commented 2 years ago

With commit fb2c599f056640d289b2147fbe6d9eaee689f1b2 reverted on 5.15.y at least the instability problems are gone.

Another thing, when n900-powermanagement is started, gpio_keys starts acting up and reports spurious events for all gpio keys (1 and then 0).

MerlijnWajer commented 2 years ago

With 5.8.y (stable patches) I can still hit off mode, although it's a bit less stable and I have to do it this way:

mount -t proc none /proc
mount -t sysfs none /sys
mount -t debugfs none /sys/kernel/debug
mount -o rw,remount /

echo 1 > /sys/kernel/debug/pm_debug/enable_off_mode

modprobe panel-sony-acx565akm
echo 0 > /sys/class/backlight/acx565akm/brightness

consoles=$(find /sys/bus/platform/devices/4*.serial/ -name console)
for console in ${consoles}; do
    echo N > ${console}
done

# Enable autosuspend
uarts=$(find /sys/bus/platform/devices/4*.serial/power/ -type d)
for uart in ${uarts}; do
    echo 2000 > ${uart}/autosuspend_delay_ms
    echo enabled > ${uart}/wakeup
    echo auto > ${uart}/control
done

# Configure wake-up from suspend
uarts=$(find /sys/class/tty/tty[SO]*/power/ -type d)
for uart in ${uarts}; do
    echo enabled > ${uart}/wakeup
done
MerlijnWajer commented 2 years ago

During bisect to find out when off mode stopped working between v5.8 and v5.9 I found that the first commit I hit (g47ec5303d73e) works extremely well when it comes to idle behaviour -- in the sense that it can stay in OFF mode for basically a minute or longer.

root@(none):/# grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3
OFF:8,RET:3
root@(none):/# grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3
OFF:10,RET:3
MerlijnWajer commented 2 years ago

Sent to the lists:

Hi,

I've spent the day bisecting what exact commit prevented the Nokia N900
from entering the OFF sleep state (between v5.8 and v5.9), and it is this
commit:

> # first bad commit: [facdaa917c4d5a376d09d25865f5a863f906234a] mm: proactive compaction

The git tree prior to that commit can idle at about ~27mW in OFF mode,
and it will often remain in that mode for prolonged amounts of time
(easily 30 seconds, depending on running userspace). With the above
commit applied, the Nokia N900 almost never hits OFF mode any more. This
would suggest at least to disable CONFIG_COMPACTION, perhaps in
omap2plus_defconfig? I suspect this might cause idle problems beyond the
Nokia N900, too.

Maybe nothing needs to be done here other than disable the config option
-- but I wanted to share this in case others are trying to figure out
what happened to their battery life. 

There seem to be more power regressions since then (at least on 5.15 there
is more blocking proper idle), so I'll try to find those as well, but if
this commit is reverted (or CONFIG_COMPACTION=n is in .config - probably
easier) on top of v5.9 the system seems to idle fine.

> # grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3
> OFF:16,RET:2

Hope this helps someone...

Regards,
Merlijn

PS: v5.10 seems to use another 19mW if panel_sony_acx565akm is loaded
even when display is not active (maybe it doesn't suspend or something?
- could be fixed later, just noticed it for v5.10). I load it initially
to idle the display, but until I rmmod the modules, the module uses
quite a bit more power. This problem is not present in v5.9, so that is
another thing to chase down I guess... And then v5.15 uses another 12mW
more, for not yet uncovered reasons.
MerlijnWajer commented 2 years ago
Hi Sebastian,

I don't know if this is something that requires any action currently,
but I wanted to report that I'm seeing some increased power draw on a
Nokia N900 with minimal userspace on Linux 5.10 (and the same happens on
5.15 it seems, so it doesn't seem to be resolved since). I tried to
bisect the problem but my initial attempt failed, because the problem
seems a bit racy or unpredictable.

Basically I boot a system to init=/bin/bash and run the following:

> modprobe panel-sony-acx565akm
>
> mount -t proc none /proc
> mount -t sysfs none /sys
> mount -t debugfs none /sys/kernel/debug
> mount -o rw,remount /
>
> echo 1 > /sys/kernel/debug/pm_debug/enable_off_mode
> echo 0 > /sys/class/backlight/acx565akm/brightness
>
>
> consoles=$(find /sys/bus/platform/devices/4*.serial/ -name console)
> for console in ${consoles}; do
>     echo N > ${console}
> done
>
> # Enable autosuspend
> uarts=$(find /sys/bus/platform/devices/4*.serial/power/ -type d)
> for uart in ${uarts}; do
>     echo 2000 > ${uart}/autosuspend_delay_ms
>     echo enabled > ${uart}/wakeup
>     echo auto > ${uart}/control
> done
>
> # Configure wake-up from suspend
> uarts=$(find /sys/class/tty/tty[SO]*/power/ -type d)
> for uart in ${uarts}; do
>     echo enabled > ${uart}/wakeup
> done

This loads the panel and then sets the brightness to zero, enables off
mode and idles the kernel console/serial.

Then run the following to check idle states:

    grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3

And also check the power usage on lab power supply that I have here.

With the above, Linux v5.9 (no patches applied) idles at around 42mW
(15mW goes to the serial device, so it's more like 27mW, anyway...).

Linux v5.10 with the following two commits reverted (otherwise the
system is not stable):

* fb2c599f056640d289b2147fbe6d9eaee689f1b2 (ARM: omap3: enable off mode
automatically)
* 21b2cec61c04bd175f0860d9411a472d5a0e7ba1 (mmc: Set
PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4)

And the following config change on top of omap2plus_defconfig (to make
OFF mode work on v5.10 as detailed in "Nokia N900 not hitting OFF mode
since 5.9 is caused by proactive memory compaction"):

> sed -i 's/CONFIG_COMPACTION=y/CONFIG_COMPACTION=n/' .config

Idles at much more -- 60mW (compared to 42mW). Executing "rmmod
panel-sony-acx565akm" makes the power draw return to v5.9 levels.

I don't really understand why this would happen, and as stated before
wasn't able to really bisect the problem. However, some simple guesswork
led me to find that reverting 7c4bada12d320d8648ba3ede6f9b6f9e10f1126a
("drm/panel: sony-acx565akm: Fix race condition in probe") makes v5.10
idle at 42mW again. I don't know if this is because v5.9 never properly
initialised the panel, or because the race condition fix introduced
another problem that leaves the hardware in an abnormal state.

Any hints on what could cause this extra power draw? Maybe the panel is
waiting for something? I suppose it's potentially feasible that with
more modules and userspace loaded the panel idles properly, but I
currently don't have a way to measure that.

Regards,
Merlijn

PS: For both v5.9 and v5.10 kernels the only other change to
omap2plus_defconfig is to make the watchdog(s) built-in.
MerlijnWajer commented 2 years ago

For the record, for my bisect tests I ran this at every step for v5.9..v5.10:

v5.10:
git merge-base --is-ancestor 21b2cec61c04bd175f0860d9411a472d5a0e7ba1 HEAD && git cherry-pick f1e1be898042aff9be3e17c6c1e77513b52e4c4d --no-commit
git merge-base --is-ancestor fb2c599f056640d289b2147fbe6d9eaee689f1b2 HEAD && git cherry-pick 3992aa31bffa73683089d86b5fad3315e3c17fcd --no-commit

The commits that get cherry-picked are simply reverts.

For v5.10..v5.11:

git merge-base --is-ancestor 7c4bada12d320d8648ba3ede6f9b6f9e10f1126a HEAD && git cherry-pick 56a6732102e847a3c3b6f40f8594c69a226fd709 --no-commit
git merge-base --is-ancestor fb2c599f056640d289b2147fbe6d9eaee689f1b2 HEAD && git cherry-pick 3992aa31bffa73683089d86b5fad3315e3c17fcd --no-commit
git merge-base --is-ancestor 21b2cec61c04bd175f0860d9411a472d5a0e7ba1 HEAD && git revert 21b2cec61c04bd175f0860d9411a472d5a0e7ba1 --no-commit

All three are reverts.

MerlijnWajer commented 2 years ago

Hi Tony, Adam,

I noticed that, after I fixed the OFF mode regression between v5.9 and
v5.10, there is another one between v5.10 and v5.11. Fortunately,
much like the other change it can be worked around with a config change,
and in fact it looks like the commit identified by git bisect is indeed
just a commit to change omap2plus_defconfig.

a82820fcd079e38309403f595f005a8cc318a13c ("ARM: omap2plus_defconfig:
Enable OMAP3_THERMAL") prevents the N900 from entering OFF mode pretty
much all the time (I've seen scenarios with OFF:2,RET:500), but with the
config change reverted, stuff like this is more common: OFF:13,RET:2

We will probably want to keep the thermal features enabled, but maybe we can
figure out why it causes the SoC to not enter sleep modes?

The good news is that this seems to be one of the last regressions with
regards to OFF mode (there might be smaller ones that cause slightly
more wakeups, but those will be harder to find). With this
(CONFIG_OMAP3_THERMAL) config option disabled, as well as the fixes from
my other recent emails, I can get my 5.15 branch to enter OFF mode again:

> # uname -a
> Linux (none) 5.15.2-00597-g68be8fac7cbd #48 SMP PREEMPT Sat Dec 11 00:14:05 CET 2021 armv7l GNU/Linux
> # grep ^core_pwrdm /sys/kernel/debug/pm_debug/count | cut -d',' -f2,3
> OFF:13,RET:10

Regards,
Merlijn
MerlijnWajer commented 2 years ago

So we can hit OFF mode now on 5.15 with some patches. Not in full GUI mode, yet. There are also some stability problems when idling. I think once the stability problems are fixed we can start looking at idling individual subsystems?

MerlijnWajer commented 2 years ago

Something that would be interesting to do regarding OFF mode tests is to parse lsmod on a running GUI system and then boot to init=/bin/bash and insert the modules, one at a time, sleeping in between, and figuring out which ones block off/ret.
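
A rough Python sketch of that idea, not a tested tool. It assumes `lsmod` lists the most recently loaded module first (so the list is reversed to approximate the original load order), and that the counters live at the debugfs path used elsewhere in this thread. The function names and the 30 second settle time are hypothetical.

```python
import subprocess
import time


def modules_from_lsmod(lsmod_text):
    """Return module names from saved `lsmod` output, oldest-loaded first."""
    lines = lsmod_text.strip().splitlines()[1:]  # skip the header line
    names = [line.split()[0] for line in lines if line.strip()]
    return list(reversed(names))


def read_core_counts(path='/sys/kernel/debug/pm_debug/count'):
    """Return the 'OFF:n,RET:n' part of the core_pwrdm line, or None."""
    with open(path) as f:
        for line in f:
            if line.startswith('core_pwrdm'):
                return ','.join(line.strip().split(',')[1:3])
    return None


def try_modules(names, settle_seconds=30):
    """Load modules one at a time and print the counters after each one."""
    for name in names:
        subprocess.run(['modprobe', name], check=False)
        time.sleep(settle_seconds)
        print(name, read_core_counts())
```

The `lsmod` output would be saved on the running GUI system, then fed to `modules_from_lsmod` and `try_modules` after booting with init=/bin/bash. A module after which OFF and RET stop increasing is a candidate blocker.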

gordon-quad commented 4 months ago

Any updates?