sonyxperiadev / bug_tracker

Empty repository that is used as a bugtracker for Open Devices project
52 stars 13 forks source link

android.hardware.health@2.1 fails to catch an exception #850

Open rinigus opened 3 weeks ago

rinigus commented 3 weeks ago

Platform: nagara Device: pdx223 Kernel version: 5.15 Android version: android-14_r67 Software binaries version: Patched SW_binaries_for_Xperia_Android_13_5.15_v4a_nagara.zip

Previously working on

Same issue was for me on AOSP13 as well, recent builds.

Description

On boot, device is stuck on ANDROID logo. In logcat, there are failures with android.hardware.health@2.1:

01-01 00:00:06.314   478   478 I hwservicemanager: Since android.hardware.health@2.1::IHealth/default is not registered, trying to start it as a lazy HAL.
01-01 00:00:06.311   720   720 I android.hardwar: type=1400 audit(0.0:534): avc: denied { read } for name="qcom,qseecom" dev="tmpfs" ino=548 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.311   720   720 I android.hardwar: type=1400 audit(0.0:535): avc: denied { open } for path="/dev/dma_heap/qcom,qseecom" dev="tmpfs" ino=548 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.315   720   720 I android.hardwar: type=1400 audit(0.0:536): avc: denied { ioctl } for path="/dev/dma_heap/qcom,qseecom" dev="tmpfs" ino=548 ioctlcmd=0x4800 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.315   720   720 D QSEECOMAPI: QSEECom_get_handle sb_length = 0xa000
01-01 00:00:06.315   720   720 D QSEECOMAPI: App is already loaded QSEE and app id = 1
01-01 00:00:06.316   720   720 I DMABUFHEAPS: Using DMA-BUF heap named: qcom,qseecom
01-01 00:00:06.312     0     0 I         : wlan_hdd_state wlan major(474) initialized
01-01 00:00:06.317     0     0 I init    : processing action (boot) from (/vendor/etc/init/init.usb.rc:41)
01-01 00:00:06.317     0     0 I init    : Command 'mount configfs none /config' action=boot (/vendor/etc/init/init.usb.rc:42) took 0ms and failed: mount() failed: Device or resource busy
01-01 00:00:06.317   478   478 I hwservicemanager: Since android.hardware.health@2.1::IHealth/default is not registered, trying to start it as a lazy HAL.
01-01 00:00:06.317   721   788 I HidlServiceManagement: getService: Trying again for android.hardware.health@2.1::IHealth/default...
01-01 00:00:06.318   721   721 I HidlServiceManagement: Registered android.hardware.gnss@2.1::IGnss/default
01-01 00:00:06.318   478   790 W libc    : Unable to set property "ctl.interface_start" to "android.hardware.health@2.1::IHealth/default": error code: 0x20
01-01 00:00:06.318   721   721 I HidlServiceManagement: Removing namespace from process name android.hardware.gnss@2.1-service-qti to gnss@2.1-service-qti.
01-01 00:00:06.318   478   790 I hwservicemanager: Tried to start android.hardware.health@2.1::IHealth/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
01-01 00:00:06.318   721   721 I LegacySupport: Registration complete for android.hardware.gnss@2.1::IGnss/default.
01-01 00:00:06.319   721   721 I android.hardware.gnss@2.1-service-qti: LOC_HIDL_VERSION not defined.
01-01 00:00:06.319     0     0 E init    : Control message: Could not find 'android.hardware.health@2.1::IHealth/default' for ctl.interface_start from pid: 478 (/system/bin/hwservicemanager)
01-01 00:00:06.320   720   720 E KeymasterUtils: TA API Major Version: 4
01-01 00:00:06.320   720   720 E KeymasterUtils: TA API Minor Version: 1
01-01 00:00:06.320   720   720 E KeymasterUtils: TA Major Version: 4
01-01 00:00:06.320   720   720 E KeymasterUtils: TA Minor Version: 1012
01-01 00:00:06.321   709   709 I AndroidRuntime: Using default boot image
01-01 00:00:06.321   709   709 I AndroidRuntime: Leaving lock profiling enabled
01-01 00:00:06.322     0     0 E init    : Control message: Could not find 'android.hardware.health@2.1::IHealth/default' for ctl.interface_start from pid: 478 (/system/bin/hwservicemanager)
01-01 00:00:06.323     0     0 I logd    : logdr: UID=1000 GID=1000 PID=776 n tail=0 logMask=8 pid=722 start=0ns deadline=0ns
01-01 00:00:06.322   478   793 W libc    : Unable to set property "ctl.interface_start" to "android.hardware.health@2.1::IHealth/default": error code: 0x20
01-01 00:00:06.324     0     0 I logd    : logdr: UID=1000 GID=1000 PID=776 n tail=0 logMask=1 pid=722 start=0ns deadline=0ns
01-01 00:00:06.324   478   793 I hwservicemanager: Tried to start android.hardware.health@2.1::IHealth/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
01-01 00:00:06.325   776   776 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-01 00:00:06.325   776   776 F DEBUG   : Build fingerprint: 'Sony/aosp_xqct54/pdx223:13/TQ3A.230901.001/eng.root.20241031.073518:userdebug/test-keys'
01-01 00:00:06.325   776   776 F DEBUG   : Revision: '0'
01-01 00:00:06.325   776   776 F DEBUG   : ABI: 'arm64'
01-01 00:00:06.325   776   776 F DEBUG   : Timestamp: 1970-01-01 00:00:06.309160049+0000
01-01 00:00:06.325   776   776 F DEBUG   : Process uptime: 0s
01-01 00:00:06.325   776   776 F DEBUG   : Cmdline: /vendor/bin/hw/android.hardware.health@2.1-service
01-01 00:00:06.325   776   776 F DEBUG   : pid: 722, tid: 722, name: android.hardwar  >>> /vendor/bin/hw/android.hardware.health@2.1-service <<<
01-01 00:00:06.325   776   776 F DEBUG   : uid: 1000
01-01 00:00:06.325   776   776 F DEBUG   : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
01-01 00:00:06.325   776   776 F DEBUG   : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
01-01 00:00:06.325   776   776 F DEBUG   : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
01-01 00:00:06.325   776   776 F DEBUG   : Abort message: 'terminating with uncaught exception of type St16invalid_argument: stoi: no conversion'
01-01 00:00:06.325   776   776 F DEBUG   :     x0  0000000000000000  x1  00000000000002d2  x2  0000000000000006  x3  0000007fddfe9e70
01-01 00:00:06.325   776   776 F DEBUG   :     x4  736f646277641f73  x5  736f646277641f73  x6  736f646277641f73  x7  7f7f7f7f7f7f7f7f
01-01 00:00:06.325   776   776 F DEBUG   :     x8  00000000000000f0  x9  000000700b9d1a00  x10 0000000000000001  x11 000000700ba0fde4
01-01 00:00:06.325   776   776 F DEBUG   :     x12 0000000000000006  x13 000000007fffffff  x14 0000000000000baa  x15 0000000013b75690
01-01 00:00:06.325   776   776 F DEBUG   :     x16 000000700ba74d58  x17 000000700ba51c70  x18 000000700d8e8000  x19 00000000000002d2
01-01 00:00:06.325   776   776 F DEBUG   :     x20 00000000000002d2  x21 00000000ffffffff  x22 0000007fddfe9fa0  x23 ffffff80ffffffc8
01-01 00:00:06.325   776   776 F DEBUG   :     x24 000000700ba76f70  x25 0000000000000001  x26 b400006e1b5ec9a0  x27 b400006e4b5f98c8
01-01 00:00:06.325   776   776 F DEBUG   :     x28 b400006e1b5ec9a0  x29 0000007fddfe9ef0
01-01 00:00:06.325   776   776 F DEBUG   :     lr  000000700ba01968  sp  0000007fddfe9e50  pc  000000700ba01994  pst 0000000000001000
01-01 00:00:06.325   776   776 F DEBUG   : backtrace:
01-01 00:00:06.325   776   776 F DEBUG   :       #00 pc 0000000000051994  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
01-01 00:00:06.326   776   776 F DEBUG   :       #01 pc 000000000004910c  /apex/com.android.vndk.v33/lib64/libc++.so (abort_message+248) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #02 pc 00000000000492d4  /apex/com.android.vndk.v33/lib64/libc++.so (demangling_terminate_handler()+208) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #03 pc 0000000000049e84  /apex/com.android.vndk.v33/lib64/libc++.so (std::__terminate(void (*)())+12) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #04 pc 00000000000494ec  /apex/com.android.vndk.v33/lib64/libc++.so (__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)+28) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #05 pc 0000000000049450  /apex/com.android.vndk.v33/lib64/libc++.so (__cxa_throw+116) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #06 pc 000000000009ae94  /apex/com.android.vndk.v33/lib64/libc++.so (std::__1::stoi(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long*, int)+384) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #07 pc 000000000000ac38  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&)+148) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #08 pc 000000000000ab4c  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Restore()+36) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #09 pc 000000000000a1c0  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::SonyHealth::SonyHealth(std::__1::unique_ptr<healthd_config, std::__1::default_delete<healthd_config> >&&)+116) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #10 pc 000000000000a0e0  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (HIDL_FETCH_IHealth+204) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #11 pc 0000000000052554  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::get(android::hardware::hidl_string const&, android::hardware::hidl_string const&)::'lambda'(void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::operator()(void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const+92) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #12 pc 000000000004d70c  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::openLibs(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<bool (void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)> const&)+1372) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #13 pc 0000000000050970  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::get(android::hardware::hidl_string const&, android::hardware::hidl_string const&)+92) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #14 pc 000000000004e770  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::details::getRawServiceInternal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+1608) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #15 pc 0000000000018b9c  /vendor/lib64/android.hardware.health@2.0.so (android::sp<android::hardware::health::V2_0::IHealth> android::hardware::details::getServiceInternal<android::hardware::health::V2_0::BpHwHealth, android::hardware::health::V2_0::IHealth, void, void>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+208) (BuildId: 0bcba204447d37ac45f9cf9e48429db2)
01-01 00:00:06.326   776   776 F DEBUG   :       #16 pc 000000000000631c  /vendor/bin/hw/android.hardware.health@2.1-service (android::hardware::health::V2_0::IHealth::getService(char const*, bool)+188) (BuildId: 8b1c18b1dc0760c9d0c123f01bd46e1d)
01-01 00:00:06.326   776   776 F DEBUG   :       #17 pc 0000000000006080  /vendor/bin/hw/android.hardware.health@2.1-service (main+44) (BuildId: 8b1c18b1dc0760c9d0c123f01bd46e1d)
01-01 00:00:06.326   776   776 F DEBUG   :       #18 pc 000000000004a1f4  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+96) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
01-01 00:00:06.334     0     0 I         : gsid (779) used greatest stack depth: 9840 bytes left
01-01 00:00:06.334     0     0 I init    : Service 'exec 18 (/system/bin/gsid run-startup-tasks)' (pid 779) exited with status 0 oneshot service took 0.028000 seconds in background
01-01 00:00:06.334     0     0 I init    : Sending signal 9 to service 'exec 18 (/system/bin/gsid run-startup-tasks)' (pid 779) process group...
01-01 00:00:06.335     0     0 I libprocessgroup: Successfully killed process cgroup uid 0 pid 779 in 0ms
01-01 00:00:06.335     0     0 I init    : Service 'health-hal-2-1' (pid 722) received signal 6
01-01 00:00:06.335     0     0 I init    : Sending signal 9 to service 'health-hal-2-1' (pid 722) process group...

Symptoms

Stuck on Android logo while booting

How to reproduce

New build

Additional context

Maybe important: last stock was XQ-CT54_Customized EU-UK_64.2.A.2.185

While maybe it cannot reach battery data, the code seems to have corresponding exception handling in https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L92

I could make it work by adding

diff --git a/hardware/health/CycleCountBackupRestore.cpp b/hardware/health/CycleCountBackupRestore.cpp
index 2207f60..cd305c6 100644
--- a/hardware/health/CycleCountBackupRestore.cpp
+++ b/hardware/health/CycleCountBackupRestore.cpp
@@ -92,6 +92,9 @@ void CycleCountBackupRestore::Read(const std::string &path, int &cycles) {
     } catch (std::invalid_argument &e) {
         LOG(WARNING) << "Data format is wrong in persist storage file: " << path;
         return;
+    } catch (...) {
+        LOG(WARNING) << "Data format is wrong in persist storage file and exception sneaked through: " << path;
+        return;
     }
     LOG(VERBOSE) << "Read " << cycles << " battery cycles from " << path;
 }

But I cannot figure out why the original exception does not catch it.

cc: @MarijnS95

MarijnS95 commented 3 weeks ago

@rinigus that should have exactly nothing to do with what you call a "Related PR" unless the old code in question was broken such that .Restore() never got called before.

Scrolling the error message top-to-bottom and left-to-right with all unrelated lines is incredibly tedious and frustrating, so I'm cutting it down to the bare minimum that is important:

Abort message: 'terminating with uncaught exception of type St16invalid_argument: stoi: no conversion'
...
backtrace:
      #00 pc 0000000000051994  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
      #01 pc 000000000004910c  /apex/com.android.vndk.v33/lib64/libc++.so (abort_message+248) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #02 pc 00000000000492d4  /apex/com.android.vndk.v33/lib64/libc++.so (demangling_terminate_handler()+208) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #03 pc 0000000000049e84  /apex/com.android.vndk.v33/lib64/libc++.so (std::__terminate(void (*)())+12) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #04 pc 00000000000494ec  /apex/com.android.vndk.v33/lib64/libc++.so (__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)+28) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #05 pc 0000000000049450  /apex/com.android.vndk.v33/lib64/libc++.so (__cxa_throw+116) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #06 pc 000000000009ae94  /apex/com.android.vndk.v33/lib64/libc++.so (std::__1::stoi(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long*, int)+384) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #07 pc 000000000000ac38  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&)+148) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #08 pc 000000000000ab4c  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Restore()+36) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #09 pc 000000000000a1c0  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::SonyHealth::SonyHealth(std::__1::unique_ptr<healthd_config, std::__1::default_delete<healthd_config> >&&)+116) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #10 pc 000000000000a0e0  /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (HIDL_FETCH_IHealth+204) (BuildId: 47284e398f31910ce682e251b7e79c63)

What, strangely, seems to happen is that std::invalid_argument that the health HAL is compiled with might be different than the std::invalid_argument that libc++.so is raising here (IIRC as defined at https://cs.android.com/android-llvm/toolchain/llvm-project/+/main:libcxx/src/string.cpp;l=62-64;drc=9783f28cbb155e4a8d49c12e1c60ce14dcfaf0c7).

Alternatively AOSP could be compiling our HAL without RTTI (https://stackoverflow.com/questions/74441543/why-stdinvalid-argument-is-not-caught-with-no-rtti-in-macos-m1-environment)? Again nothing that the "Related PR" changed?

bartcubbins commented 3 weeks ago

I see it for the first time. On Nagara Health may not receive a value until ADSP is initialized, but then it restarts and everything goes smoothly. This is Nagara A14r67. I'll try one more time on a fresh build later

01-01 00:00:04.449   783   783 W android.hardware.health@2.1-service: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.449   783   783 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:04.449   783   783 W android.hardware.health@2.1-service: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:04.449   783   783 W android.hardware.health@2.1-service: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:04.450   783   783 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:04.450   783   783 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:04.450   783   783 I android.hardware.health@2.1-service: default: Hal init done
---------
01-01 00:00:09.437  1346  1346 W android.hardware.health@2.1-service: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:09.437  1346  1346 I android.hardware.health@2.1-service: Saved learned max battery capacity of 5056 mAh to persist storage
01-01 00:00:09.437  1346  1346 I android.hardware.health@2.1-service: default instance initializing with healthd_config...
01-01 00:00:09.438  1346  1346 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:09.438  1346  1346 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:09.438  1346  1346 I android.hardware.health@2.1-service: default: Hal init done
bartcubbins commented 3 weeks ago

@rinigus can you provide me your /vendor/bin/hw/android.hardware.health@2.1-service binary?

rinigus commented 3 weeks ago

@MarijnS95 : I will check PR closer and will trim logs more next time :) . Dropped "Related PR" from original message.

@bartcubbins: let me bake a new one and check that it fails the same way

MarijnS95 commented 3 weeks ago

@bartcubbins despite the broken log messages in CycleCountBackupRestore::Read(), this function is used to read from both persist and the actual sysfs file.

@rinigus said he came from a fresh stock, so it's quite possible that they have the file there in a different format; unfortunately the catch(...) workaround probably caused the HAL to already overwrite it. Or the file was simply corrupt?

Also note that LearnedCapcityBackupRestore strangely uses std::stoi() for ReadFromPersistSTorage(), and sscanf(..., "%d", ...) for ReadFromSRAM() (which is /sys/class/power_supply/bms/charge_full).

rinigus commented 3 weeks ago

Very strange, I don't have that issue anymore. Looks like after successful boot by AOSP and shutdown, it doesn't appear. It fails on first boot after you come from stock as in:

01-01 00:00:06.708   792   792 W android.hardware.health@2.1-service: Data format is wrong in persist storage file and exception sneaked through: /mnt/vendor/persist/battery/battery_cycle_count

That's the first successful boot after coming from stock 64.2.A.2.185 and after applying the patch from my original message.

Before that was the boot where I was stuck on on "android" and exception issue as in the original message.

After the first successful boot, I get logs similar to @bartcubbins. Although, have lots of errors related to selinux, but that's another general issue.

@MarijnS95 - cross-posting, was looking for binary and I agree with you.

So, it is hard to reproduce. We need to probably go back to stock and then return. However, issue with exception catching is still there. @MarijnS95 - I wonder how to check your idea regarding different compilation?

@bartcubbins: File attached

android.hardware.health@2.1-service.gz

MarijnS95 commented 3 weeks ago

@rinigus without fixing this, I'd love to learn more about the downstream format. The entire goal of this HAL is to "persist" these states, but that's all thrown out of the window if we restart the counter from stock, and stock restarts it when coming back from SODP.

Don't think I understand what you mean exactly with cross-posting :sweat_smile:

rinigus commented 3 weeks ago

Re stock / aosp restart. Agree, if we can figure out the format. But I am not sure it is important in practice for many. I would expect majority of users sticking to one of the camps. For SFOS users, they get a device, flash it, and use on SFOS without going to stock.

In my eyes, transition issues between stock and AOSP are minor / low priority. Right now, on nagara, basically nothing works and there are way bigger issues to fix :)

Re cross-posting: wrote my message part regarding coming from fresh stock, started looking for file, and then saw your message.

bartcubbins commented 3 weeks ago

despite the broken log messages in CycleCountBackupRestore::Read(), this function is used to read from both persist and the actual sysfs file.

Yes, I am familiar with the code

bartcubbins commented 2 weeks ago

I'd love to learn more about the downstream format

XQ-CQ54:/ # getprop ro.build.id                                                                                                                                
64.2.A.2.215B
XQ-CQ54:/ # cat /mnt/vendor/persist/battery/battery_charge_full                                                                                                
5056000XQ-CQ54:/ # 

and battery_cycle_count just don't exist

rinigus commented 2 weeks ago

What is weird that it manages to pass android::base::ReadFileToString(path, &buffer) above that in this case. But OK, should we just replace it with sscanf and avoid C++ exceptions?

MarijnS95 commented 2 weeks ago

Did/does the file exist for you, @rinigus (if you come directly from stock)? If the file didn't exist that function shouldn't have passed (unless it returned an empty string in this case?).

rinigus commented 2 weeks ago

In short, I cannot trigger it when testing today.

I flashed back stock, booted into it for setup and once more to confirm it was OK. As its not rooted, couldn't see any vendor/persist. Then flashed new AOSP14 images with the following CycleCountBackupRestore::Read:

void CycleCountBackupRestore::Read(const std::string &path, int &cycles) {
    std::string buffer;

    LOG(WARNING) << "Test Access: "  << access(path.c_str(), F_OK);

    if (!android::base::ReadFileToString(path, &buffer)) {
        LOG(WARNING) << "Failed to read battery cycles from " << path;
        return;
    }

    LOG(WARNING) << "Test Buffer: '" << buffer << "'";

    buffer = ::android::base::Trim(buffer);
    try {
        cycles = std::stoi(buffer);
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Battery cycle count in persist storage file is out of bounds: " << path;
        return;
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Data format is wrong in persist storage file: " << path;
        return;
    } catch (...) {
        LOG(WARNING) << "Data format is wrong in persist storage file and exception sneaked through: " << path;
        LOG(WARNING) << "Failed Buffer: '" << buffer << "'";
        LOG(WARNING) << "Failed Access: "  << access(path.c_str(), F_OK);
        return;
    }
    LOG(VERBOSE) << "Read " << cycles << " battery cycles from " << path;
}

Corresponding logcat grep:

01-01 00:00:12.508   786   786 W android.hardware.health@2.1-service: Test Access: 0
--
01-01 00:00:12.518   786   786 W android.hardware.health@2.1-service: Test Buffer: '4'
01-01 00:00:12.518   786   786 W android.hardware.health@2.1-service: Test Access: -1
01-01 00:00:12.519   786   786 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:12.519   786   786 W android.hardware.health@2.1-service: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: No such file or directory
01-01 00:00:12.527   786   786 W android.hardware.health@2.1-service: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:12.528   786   786 I android.hardware.health@2.1-service: default instance initializing with healthd_config...

So, here it is nicely caught before proceeding to stoi. Which doesn't help me in terms of reproducing it.

But coming back to the question: should we replace it with sscanf as C++ exceptions were failing? I can check whether they still fail, though...

rinigus commented 2 weeks ago

I have added the following snippet into the CycleCountBackupRestore::Read:

    int testi;
    try {
        testi = std::stoi("hello"); 
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Test std::out_of_range";
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Test std::invalid_argument";
    } catch (...) {
        LOG(WARNING) << "Test all catch";
    }

and have in the logs:

01-01 00:00:04.126   796   796 W android.hardware.health@2.1-service: Test all catch

i.e. the original issue is still there - we don't catch the corresponding exception.

rinigus commented 2 weeks ago

The bigger issue is that the cycles file is never made as we have permission denied errors:

01-01 00:00:09.259     0     0 W healthd : batteryCapacityLevelPath not found
--
01-01 00:00:09.259     0     0 W healthd : chargingPolicyPath not found
01-01 00:00:09.259  1406  1406 W android.hardware.health@2.1-service: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: Permission denied
01-01 00:00:09.260  1406  1406 W android.hardware.health@2.1-service: Write max battery capacity to sysfs error: Permission denied
01-01 00:00:09.261  1406  1406 I android.hardware.health@2.1-service: default instance initializing with healthd_config...
01-01 00:00:09.261  1406  1406 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:09.261  1406  1406 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:09.262  1406  1406 I android.hardware.health@2.1-service: default: Hal init done

And similar permission denied errors later in the log for "Write max battery capacity"

MarijnS95 commented 2 weeks ago

You shouldn't be able to create a file in this location in sysfs, it's an attribute of the battery driver/device.

bartcubbins commented 2 weeks ago

But OK, should we just replace it with sscanf and avoid C++ exceptions?

Please use android::base::ParseInt instead

rinigus commented 2 weeks ago

Sure. Will prepare PR with it.

rinigus commented 2 weeks ago

Weird, for some reason this method (https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L78) is not called anymore. Is it normal? Don't want to start flashing stock and getting back to AOSP from it just to test it.

I have added few log calls to test, but never got anything from it in logcat.

bartcubbins commented 2 weeks ago

The thing is... DSP charger firmware never increments cycle_count on Nagara, Yodo... and most likely on other devices with QTI Glink battery charger

rinigus commented 2 weeks ago

Does it mean that this method is not even called on a regular boot? Since that what it looks like for me right now and I cannot figure out why.

bartcubbins commented 2 weeks ago

I can clearly see that the method is called and prints a warning which is acceptable. I checked this literally just now on the fresh build

01-01 00:00:04.481   794   794 W android.hardware.health@2.1-service: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.481   794   794 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count

Please provide me with the full logcat. I'm interested to understand what's going on there

rinigus commented 2 weeks ago

Thank you very much! Will be able to provide it tonight.

rinigus commented 2 weeks ago

Please find attached full logcat. That's using current CycleCountBackupRestore::Read as it is in common repo (no patches).

Few changes in my tree:

logs-14-nov4-2.gz

rinigus commented 2 weeks ago

OK, on the last boot I can see the debug messages again from CycleCountBackupRestore::Read. Will work on PR for this.

bartcubbins commented 1 week ago

I have added the following snippet into the CycleCountBackupRestore::Read:

    int testi;
    try {
        testi = std::stoi("hello"); 
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Test std::out_of_range";
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Test std::invalid_argument";
    } catch (...) {
        LOG(WARNING) << "Test all catch";
    }

and have in the logs:

01-01 00:00:04.126   796   796 W android.hardware.health@2.1-service: Test all catch

i.e. the original issue is still there - we don't catch the corresponding exception.

I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception 03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument

rinigus commented 1 week ago

Excellent! I think it is way better than my submitted PR, thank you very much. Is this conversion to AIDL already merged?

bartcubbins commented 1 week ago

Not yet, it's in my private branch. I have a lot of different changes here besides this one that need to be sorted out

MarijnS95 commented 1 week ago

I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception 03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument

It is hard to believe that converting from a HIDL to AIDL HAL has an effect on how exceptions are handled in C++ which should remain mostly unchanged. Unless compilation flags for our own .cpp files are changed, a different STL is linked, or...?

@bartcubbins just as a sanity-check, have you made sure that the exception isn't caught for you in the previous HIDL variant, and it isn't some configuration/environment "issue" / mismatch on Rinigus' side?

bartcubbins commented 1 week ago

I'm 1000% sure. Just checked again. I have the same behavior as @rinigus 03-23 18:27:34.075 797 797 W android.hardware.health@2.1-service: Test all catch

MarijnS95 commented 1 week ago

Thanks for confirming, it is still very confusing and unexpected :sweat_smile:. Looking forward to seeing your changes to i.e. Android.bp that might have an effect.