sonyxperiadev / bug_tracker

Empty repository that is used as a bugtracker for Open Devices project

[Tama][AOSP10][4.14] Using schedutil can lead to a crash #705

Closed rinigus closed 1 year ago

rinigus commented 3 years ago

Platform: Tama
Device: Akari
Kernel version: 4.14.232-gd88c66b3138a
Android version: android-10.0.0_r41 - Sailfish 4.1.0.24
Software binaries version: SW_binaries_for_Xperia_Android_10.0.7.1_r1_v12a_tama

Previously working on

No idea

Description

First observed on the SFOS port, but I have been able to reproduce it on AOSP10 as well. Looks like crash is reproducible for

When switching /sys/devices/system/cpu/cpufreq/policy0 to schedutil, the phone sometimes crashes. On SFOS it crashes almost every time; on AOSP10 I managed to hit it once. Whether it crashes appears to correlate with the following dmesg message:

cpufreq: CPU0: Fast frequency switching not enabled

When this message is logged, i.e. fast frequency switching is not enabled, the phone stays stable. Without the message, it crashes.
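To check which CPUs have logged that message, something along these lines could be used. It is only a sketch: the function name `fast_switch_disabled_cpus` is made up for illustration, and it assumes the message format is exactly the one quoted above.

```shell
# Print the CPU numbers for which the kernel log reports
# "Fast frequency switching not enabled". Reads log text from stdin.
# (Hypothetical helper; message format assumed to match the line quoted above.)
fast_switch_disabled_cpus() {
    sed -n 's/.*cpufreq: CPU\([0-9][0-9]*\): Fast frequency switching not enabled.*/\1/p' \
        | sort -n -u
}

# On the device, as superuser:
#   dmesg | fast_switch_disabled_cpus
```

A CPU that is missing from the output after its policy was switched to schedutil would, going by the observation above, be one where the crash can be expected.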

How to reproduce

# in adb shell as superuser
cd /sys/devices/system/cpu/cpufreq/policy0
echo performance > scaling_governor
echo schedutil > scaling_governor
# the phone will crash here; if it does not, check whether fast frequency switching is disabled
dmesg | tail

Crash logs

From AOSP pstore after crash:

[ 9521.243481] ------------[ cut here ]------------
[ 9521.243665] WARNING: CPU: 0 PID: 5320 at /awork/android/R_MR1/kernel/sony/msm-4.14/kernel/lib/debugobjects.c:330 __debug_object_init+0x32c/0x570
[ 9521.244605] ---[ end trace 640eea892c8c164c ]---
[ 9521.244859] BUG: scheduling while atomic: sh/5320/0x00010003
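
The last field of the "scheduling while atomic" line is the preempt count at the time of the bug. A small sketch for splitting it into its parts follows. The helper name `decode_preempt_count` is hypothetical, and the bit layout (8 bits preempt, 8 bits softirq, 4 bits hardirq, 1 bit NMI) is an assumption based on include/linux/preempt.h in 4.14-era kernels.

```shell
# Split a preempt_count value into its fields.
# Bit layout is an ASSUMPTION (include/linux/preempt.h, 4.14-era kernels):
#   bits 0-7 preempt depth, bits 8-15 softirq, bits 16-19 hardirq, bit 20 NMI.
decode_preempt_count() {
    val=$(( $1 ))
    echo "preempt=$(( val & 0xff )) softirq=$(( (val >> 8) & 0xff )) hardirq=$(( (val >> 16) & 0xf )) nmi=$(( (val >> 20) & 0x1 ))"
}

decode_preempt_count 0x00010003
# prints: preempt=3 softirq=0 hardirq=1 nmi=0
```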

A bit more from the SFOS journal:

Jul 03 14:58:46 XperiaXZ2Compact kernel: BUG: scheduling while atomic: gdbus/6407/0x00000005
Jul 03 14:58:46 XperiaXZ2Compact kernel: Modules linked in:
Jul 03 14:58:46 XperiaXZ2Compact kernel: CPU: 2 PID: 6407 Comm: gdbus Tainted: G        W       4.14.232-gd88c66b3138a-dirty #2
Jul 03 14:58:46 XperiaXZ2Compact kernel: Hardware name: Sony Mobile Communications. Apollo(SDM845 v2.1) (DT)
Jul 03 14:58:46 XperiaXZ2Compact kernel: Call trace:
Jul 03 14:58:46 XperiaXZ2Compact kernel:  dump_backtrace+0x0/0x188
Jul 03 14:58:46 XperiaXZ2Compact kernel:  show_stack+0x14/0x1c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  dump_stack+0xcc/0x10c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __schedule_bug+0x50/0x70
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __schedule+0x9b8/0xca0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  schedule+0x70/0x90
Jul 03 14:58:46 XperiaXZ2Compact kernel:  schedule_timeout+0x388/0x44c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  wait_for_common+0xa4/0x114
Jul 03 14:58:46 XperiaXZ2Compact kernel:  wait_for_completion_timeout+0x10/0x18
Jul 03 14:58:46 XperiaXZ2Compact kernel:  rpmh_write+0x160/0x204
Jul 03 14:58:46 XperiaXZ2Compact kernel:  rpmh_regulator_send_aggregate_requests+0x3e8/0x4ec
Jul 03 14:58:46 XperiaXZ2Compact kernel:  rpmh_regulator_arc_set_voltage_sel+0x44/0x9c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  _regulator_do_set_voltage+0x2a4/0x5e4
Jul 03 14:58:46 XperiaXZ2Compact kernel:  regulator_set_voltage_unlocked+0x204/0x2c8
Jul 03 14:58:46 XperiaXZ2Compact kernel:  regulator_set_voltage+0x4c/0x80
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_update_vdd+0x94/0x1b0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_calc_subtree+0x1d0/0x2d0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_calc_new_rates+0x384/0x3f0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_calc_new_rates+0x3e0/0x3f0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_core_set_rate_nolock+0x98/0x410
Jul 03 14:58:46 XperiaXZ2Compact kernel:  clk_set_rate+0x84/0x110
Jul 03 14:58:46 XperiaXZ2Compact kernel:  osm_cpufreq_target_index+0xa4/0xd4
Jul 03 14:58:46 XperiaXZ2Compact kernel:  osm_cpufreq_fast_switch+0xe4/0x11c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  cpufreq_driver_fast_switch+0x34/0x58
Jul 03 14:58:46 XperiaXZ2Compact kernel:  sugov_update_commit+0x78/0x194
Jul 03 14:58:46 XperiaXZ2Compact kernel:  sugov_update_shared+0x3e0/0x434
Jul 03 14:58:46 XperiaXZ2Compact kernel:  attach_entity_load_avg+0xb8/0x180
Jul 03 14:58:46 XperiaXZ2Compact kernel:  enqueue_task_fair+0xf8c/0x266c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  ttwu_do_activate+0xc0/0x244
Jul 03 14:58:46 XperiaXZ2Compact kernel:  try_to_wake_up+0x430/0x610
Jul 03 14:58:46 XperiaXZ2Compact kernel:  default_wake_function+0x14/0x1c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  pollwake+0x4c/0x60
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __wake_up_locked_key+0x50/0x7c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  eventfd_write+0x1c0/0x210
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __vfs_write+0x38/0x110
Jul 03 14:58:46 XperiaXZ2Compact kernel:  vfs_write+0xc8/0x184
Jul 03 14:58:46 XperiaXZ2Compact kernel:  SyS_write+0x60/0xa8
Jul 03 14:58:46 XperiaXZ2Compact kernel:  el0_svc_naked+0x34/0x38
Jul 03 14:58:46 XperiaXZ2Compact kernel: BUG: scheduling while atomic: gdbus/6407/0xfffffffd
Jul 03 14:58:46 XperiaXZ2Compact kernel: Modules linked in:
Jul 03 14:58:46 XperiaXZ2Compact kernel: CPU: 2 PID: 6407 Comm: gdbus Tainted: G        W       4.14.232-gd88c66b3138a-dirty #2
Jul 03 14:58:46 XperiaXZ2Compact kernel: Hardware name: Sony Mobile Communications. Apollo(SDM845 v2.1) (DT)
Jul 03 14:58:46 XperiaXZ2Compact kernel: Call trace:
Jul 03 14:58:46 XperiaXZ2Compact kernel:  dump_backtrace+0x0/0x188
Jul 03 14:58:46 XperiaXZ2Compact kernel:  show_stack+0x14/0x1c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  dump_stack+0xcc/0x10c
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __schedule_bug+0x50/0x70
Jul 03 14:58:46 XperiaXZ2Compact kernel:  __schedule+0x9b8/0xca0
Jul 03 14:58:46 XperiaXZ2Compact kernel:  schedule+0x70/0x90
Jul 03 14:58:46 XperiaXZ2Compact kernel:  schedule_hrtimeout_range_clock+0x78/0x188
Jul 03 14:58:46 XperiaXZ2Compact kernel:  schedule_hrtimeout_range+0x10/0x18
Jul 03 14:58:46 XperiaXZ2Compact kernel:  do_sys_poll+0x250/0x5b8
Jul 03 14:58:46 XperiaXZ2Compact kernel:  SyS_ppoll+0x168/0x240
Jul 03 14:58:46 XperiaXZ2Compact kernel:  el0_svc_naked+0x34/0x38

rinigus commented 3 years ago

I wonder whether it is possible to disable fast frequency switching for power-efficient CPUs.

rinigus commented 3 years ago

I was able to trigger crashes with 100% "success" rates on AOSP as follows:

Test 1

In this case, fast switching is disabled on policy0 CPUs, but not on policy4 (according to dmesg).

Example of pstore record after crash using AOSP10:

[  128.106507] water_detection soc:somc_water_detection: wdet_check_water_work:set powerrole fail
[  128.425742] dsi-ctrl:[dsi_ctrl_handle_error_status] tx timeout error: 0x40
[  130.226086] BUG: scheduling while atomic: kworker/3:2/856/0x00000004
[  130.226402] ------------[ cut here ]------------
[  130.226528] WARNING: CPU: 3 PID: 856 at /awork/android/R_MR1/kernel/sony/msm-4.14/kernel/kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x468/0x50c
[  130.226894] ---[ end trace 417e3f5e72d734d7 ]---

Test 2

The crash can be triggered by first switching to schedutil on policy4 and then on policy0. In this case, dmesg reports fast switching as disabled on policy4, but not on policy0.
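
The exact commands used for the tests are not shown here. The ordering described in Test 2 could be sketched as below, assuming the same sysfs path as in the reproduction steps. `CPUFREQ_ROOT`, `set_governor` and `switch_in_order` are illustrative names, not part of any existing tool.

```shell
# Root of the cpufreq policy directories. Overridable so the functions can be
# tried against a scratch directory instead of a live device (assumption).
CPUFREQ_ROOT=${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}

# set_governor POLICY GOVERNOR: write GOVERNOR to POLICY's scaling_governor.
set_governor() {
    echo "$2" > "$CPUFREQ_ROOT/$1/scaling_governor"
}

# switch_in_order POLICY...: switch each policy to schedutil, in the given order.
switch_in_order() {
    for policy in "$@"; do
        set_governor "$policy" schedutil || return 1
    done
}

# On the device, as superuser (this may crash the phone):
#   switch_in_order policy4 policy0
#   dmesg | tail
```

Reversing the arguments gives the opposite ordering, which makes it easy to compare which policy ends up with fast switching disabled.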

Conclusion

As is, schedutil cannot be used on AOSP10.

jerpelea commented 1 year ago

Discontinued Android version