Open rinigus opened 3 weeks ago
@rinigus that should have exactly nothing to do with what you call a "Related PR" unless the old code in question was broken such that .Restore()
never got called before.
Scrolling the error message top-to-bottom and left-to-right with all unrelated lines is incredibly tedious and frustrating, so I'm cutting it down to the bare minimum that is important:
Abort message: 'terminating with uncaught exception of type St16invalid_argument: stoi: no conversion'
...
backtrace:
#00 pc 0000000000051994 /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
#01 pc 000000000004910c /apex/com.android.vndk.v33/lib64/libc++.so (abort_message+248) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#02 pc 00000000000492d4 /apex/com.android.vndk.v33/lib64/libc++.so (demangling_terminate_handler()+208) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#03 pc 0000000000049e84 /apex/com.android.vndk.v33/lib64/libc++.so (std::__terminate(void (*)())+12) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#04 pc 00000000000494ec /apex/com.android.vndk.v33/lib64/libc++.so (__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)+28) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#05 pc 0000000000049450 /apex/com.android.vndk.v33/lib64/libc++.so (__cxa_throw+116) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#06 pc 000000000009ae94 /apex/com.android.vndk.v33/lib64/libc++.so (std::__1::stoi(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long*, int)+384) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
#07 pc 000000000000ac38 /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&)+148) (BuildId: 47284e398f31910ce682e251b7e79c63)
#08 pc 000000000000ab4c /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::CycleCountBackupRestore::Restore()+36) (BuildId: 47284e398f31910ce682e251b7e79c63)
#09 pc 000000000000a1c0 /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (device::sony::health::SonyHealth::SonyHealth(std::__1::unique_ptr<healthd_config, std::__1::default_delete<healthd_config> >&&)+116) (BuildId: 47284e398f31910ce682e251b7e79c63)
#10 pc 000000000000a0e0 /vendor/lib64/hw/android.hardware.health@2.0-impl-2.1-sony.so (HIDL_FETCH_IHealth+204) (BuildId: 47284e398f31910ce682e251b7e79c63)
What, strangely, seems to happen is that std::invalid_argument
that the health HAL is compiled with might be different than the std::invalid_argument
that libc++.so
is raising here (IIRC as defined at https://cs.android.com/android-llvm/toolchain/llvm-project/+/main:libcxx/src/string.cpp;l=62-64;drc=9783f28cbb155e4a8d49c12e1c60ce14dcfaf0c7).
Alternatively AOSP could be compiling our HAL without RTTI (https://stackoverflow.com/questions/74441543/why-stdinvalid-argument-is-not-caught-with-no-rtti-in-macos-m1-environment)? Again nothing that the "Related PR" changed?
I see it for the first time. On Nagara Health may not receive a value until ADSP is initialized, but then it restarts and everything goes smoothly. This is Nagara A14r67. I'll try one more time on a fresh build later
01-01 00:00:04.449 783 783 W android.hardware.health@2.1-service: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.449 783 783 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:04.449 783 783 W android.hardware.health@2.1-service: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:04.449 783 783 W android.hardware.health@2.1-service: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:04.450 783 783 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:04.450 783 783 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:04.450 783 783 I android.hardware.health@2.1-service: default: Hal init done
---------
01-01 00:00:09.437 1346 1346 W android.hardware.health@2.1-service: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:09.437 1346 1346 I android.hardware.health@2.1-service: Saved learned max battery capacity of 5056 mAh to persist storage
01-01 00:00:09.437 1346 1346 I android.hardware.health@2.1-service: default instance initializing with healthd_config...
01-01 00:00:09.438 1346 1346 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:09.438 1346 1346 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:09.438 1346 1346 I android.hardware.health@2.1-service: default: Hal init done
@rinigus can you provide me your /vendor/bin/hw/android.hardware.health@2.1-service
binary?
@MarijnS95 : I will check PR closer and will trim logs more next time :) . Dropped "Related PR" from original message.
@bartcubbins: let me bake a new one and check that it fails the same way
@bartcubbins despite the broken log messages in CycleCountBackupRestore::Read()
, this function is used to read from both persist and the actual sysfs file.
@rinigus said he came from a fresh stock, so it's quite possible that they have the file there in a different format; unfortunately the catch(...)
workaround probably caused the HAL to already overwrite it. Or the file was simply corrupt?
Also note that LearnedCapcityBackupRestore
strangely uses std::stoi()
for ReadFromPersistSTorage()
, and sscanf(..., "%d", ...)
for ReadFromSRAM()
(which is /sys/class/power_supply/bms/charge_full
).
Very strange, I don't have that issue anymore. Looks like after successful boot by AOSP and shutdown, it doesn't appear. It fails on first boot after you come from stock as in:
01-01 00:00:06.708 792 792 W android.hardware.health@2.1-service: Data format is wrong in persist storage file and exception sneaked through: /mnt/vendor/persist/battery/battery_cycle_count
That's the first successful boot after coming from stock 64.2.A.2.185 and after applying the patch from my original message.
Before that was the boot where I was stuck on on "android" and exception issue as in the original message.
After the first successful boot, I get logs similar to @bartcubbins. Although, have lots of errors related to selinux, but that's another general issue.
@MarijnS95 - cross-posting, was looking for binary and I agree with you.
So, it is hard to reproduce. We need to probably go back to stock and then return. However, issue with exception catching is still there. @MarijnS95 - I wonder how to check your idea regarding different compilation?
@bartcubbins: File attached
@rinigus without fixing this, I'd love to learn more about the downstream format. The entire goal of this HAL is to "persist" these states, but that's all thrown out of the window if we restart the counter from stock, and stock restarts it when coming back from SODP.
Don't think I understand what you mean exactly with cross-posting
:sweat_smile:
Re stock / aosp restart. Agree, if we can figure out the format. But I am not sure it is important in practice for many. I would expect majority of users sticking to one of the camps. For SFOS users, they get a device, flash it, and use on SFOS without going to stock.
In my eyes, transition issues between stock and AOSP are minor / low priority. Right now, on nagara, basically nothing works and there are way bigger issues to fix :)
Re cross-posting: wrote my message part regarding coming from fresh stock, started looking for file, and then saw your message.
despite the broken log messages in
CycleCountBackupRestore::Read()
, this function is used to read from both persist and the actual sysfs file.
Yes, I am familiar with the code
I'd love to learn more about the downstream format
XQ-CQ54:/ # getprop ro.build.id 64.2.A.2.215B XQ-CQ54:/ # cat /mnt/vendor/persist/battery/battery_charge_full 5056000XQ-CQ54:/ #
and battery_cycle_count
just don't exist
What is weird that it manages to pass android::base::ReadFileToString(path, &buffer)
above that in this case. But OK, should we just replace it with sscanf and avoid C++ exceptions?
Did/does the file exist for you, @rinigus (if you come directly from stock)? If the file didn't exist that function shouldn't have passed (unless it returned an empty string in this case?).
In short, I cannot trigger it when testing today.
I flashed back stock, booted into it for setup and once more to confirm it was OK. As its not rooted, couldn't see any vendor/persist. Then flashed new AOSP14 images with the following CycleCountBackupRestore::Read
:
void CycleCountBackupRestore::Read(const std::string &path, int &cycles) {
std::string buffer;
LOG(WARNING) << "Test Access: " << access(path.c_str(), F_OK);
if (!android::base::ReadFileToString(path, &buffer)) {
LOG(WARNING) << "Failed to read battery cycles from " << path;
return;
}
LOG(WARNING) << "Test Buffer: '" << buffer << "'";
buffer = ::android::base::Trim(buffer);
try {
cycles = std::stoi(buffer);
} catch (std::out_of_range &e) {
LOG(WARNING) << "Battery cycle count in persist storage file is out of bounds: " << path;
return;
} catch (std::invalid_argument &e) {
LOG(WARNING) << "Data format is wrong in persist storage file: " << path;
return;
} catch (...) {
LOG(WARNING) << "Data format is wrong in persist storage file and exception sneaked through: " << path;
LOG(WARNING) << "Failed Buffer: '" << buffer << "'";
LOG(WARNING) << "Failed Access: " << access(path.c_str(), F_OK);
return;
}
LOG(VERBOSE) << "Read " << cycles << " battery cycles from " << path;
}
Corresponding logcat grep:
01-01 00:00:12.508 786 786 W android.hardware.health@2.1-service: Test Access: 0
--
01-01 00:00:12.518 786 786 W android.hardware.health@2.1-service: Test Buffer: '4'
01-01 00:00:12.518 786 786 W android.hardware.health@2.1-service: Test Access: -1
01-01 00:00:12.519 786 786 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:12.519 786 786 W android.hardware.health@2.1-service: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: No such file or directory
01-01 00:00:12.527 786 786 W android.hardware.health@2.1-service: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:12.528 786 786 I android.hardware.health@2.1-service: default instance initializing with healthd_config...
So, here it is nicely caught before proceeding to stoi
. Which doesn't help me in terms of reproducing it.
But coming back to the question: should we replace it with sscanf
as C++ exceptions were failing? I can check whether they still fail, though...
I have added the following snippet into the CycleCountBackupRestore::Read
:
int testi;
try {
testi = std::stoi("hello");
} catch (std::out_of_range &e) {
LOG(WARNING) << "Test std::out_of_range";
} catch (std::invalid_argument &e) {
LOG(WARNING) << "Test std::invalid_argument";
} catch (...) {
LOG(WARNING) << "Test all catch";
}
and have in the logs:
01-01 00:00:04.126 796 796 W android.hardware.health@2.1-service: Test all catch
i.e. the original issue is still there - we don't catch the corresponding exception.
The bigger issue is that the cycles file is never made as we have permission denied errors:
01-01 00:00:09.259 0 0 W healthd : batteryCapacityLevelPath not found
--
01-01 00:00:09.259 0 0 W healthd : chargingPolicyPath not found
01-01 00:00:09.259 1406 1406 W android.hardware.health@2.1-service: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: Permission denied
01-01 00:00:09.260 1406 1406 W android.hardware.health@2.1-service: Write max battery capacity to sysfs error: Permission denied
01-01 00:00:09.261 1406 1406 I android.hardware.health@2.1-service: default instance initializing with healthd_config...
01-01 00:00:09.261 1406 1406 I HidlServiceManagement: Registered android.hardware.health@2.1::IHealth/default
01-01 00:00:09.261 1406 1406 I HidlServiceManagement: Removing namespace from process name android.hardware.health@2.1-service to health@2.1-service.
01-01 00:00:09.262 1406 1406 I android.hardware.health@2.1-service: default: Hal init done
And similar permission denied errors later in the log for "Write max battery capacity"
You shouldn't be able to create a file in this location in sysfs, it's an attribute of the battery driver/device.
But OK, should we just replace it with sscanf and avoid C++ exceptions?
Please use android::base::ParseInt
instead
Sure. Will prepare PR with it.
Weird, for some reason this method (https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L78) is not called anymore. Is it normal? Don't want to start flashing stock and getting back to AOSP from it just to test it.
I have added few log calls to test, but never got anything from it in logcat.
The thing is... DSP charger firmware never increments cycle_count on Nagara, Yodo... and most likely on other devices with QTI Glink battery charger
Does it mean that this method is not even called on a regular boot? Since that what it looks like for me right now and I cannot figure out why.
I can clearly see that the method is called and prints a warning which is acceptable. I checked this literally just now on the fresh build
01-01 00:00:04.481 794 794 W android.hardware.health@2.1-service: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.481 794 794 W android.hardware.health@2.1-service: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
Please provide me with the full logcat. I'm interested to understand what's going on there
Thank you very much! Will be able to provide it tonight.
Please find attached full logcat. That's using current CycleCountBackupRestore::Read as it is in common repo (no patches).
Few changes in my tree:
OK, on the last boot I can see the debug messages again from CycleCountBackupRestore::Read. Will work on PR for this.
I have added the following snippet into the
CycleCountBackupRestore::Read
:int testi; try { testi = std::stoi("hello"); } catch (std::out_of_range &e) { LOG(WARNING) << "Test std::out_of_range"; } catch (std::invalid_argument &e) { LOG(WARNING) << "Test std::invalid_argument"; } catch (...) { LOG(WARNING) << "Test all catch"; }
and have in the logs:
01-01 00:00:04.126 796 796 W android.hardware.health@2.1-service: Test all catch
i.e. the original issue is still there - we don't catch the corresponding exception.
I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception
03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument
Excellent! I think it is way better than my submitted PR, thank you very much. Is this conversion to AIDL already merged?
Not yet, it's in my private branch. I have a lot of different changes here besides this one that need to be sorted out
I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception
03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument
It is hard to believe that converting from a HIDL to AIDL HAL has an effect on how exceptions are handled in C++ which should remain mostly unchanged. Unless compilation flags for our own .cpp
files are changed, a different STL is linked, or...?
@bartcubbins just as a sanity-check, have you made sure that the exception isn't caught for you in the previous HIDL variant, and it isn't some configuration/environment "issue" / mismatch on Rinigus' side?
I'm 1000% sure. Just checked again. I have the same behavior as @rinigus
03-23 18:27:34.075 797 797 W android.hardware.health@2.1-service: Test all catch
Thanks for confirming, it is still very confusing and unexpected :sweat_smile:. Looking forward to seeing your changes to i.e. Android.bp
that might have an effect.
Platform: nagara Device: pdx223 Kernel version: 5.15 Android version: android-14_r67 Software binaries version: Patched SW_binaries_for_Xperia_Android_13_5.15_v4a_nagara.zip
Previously working on
Same issue was for me on AOSP13 as well, recent builds.
Description
On boot, device is stuck on ANDROID logo. In logcat, there are failures with android.hardware.health@2.1:
Symptoms
Stuck on Android logo while booting
How to reproduce
New build
Additional context
Maybe important: last stock was
XQ-CT54_Customized EU-UK_64.2.A.2.185
While maybe it cannot reach battery data, the code seems to have corresponding exception handling in https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L92
I could make it work by adding
But I cannot figure out why the original exception does not catch it.
cc: @MarijnS95