Closed stefanhh0 closed 4 years ago
the logs don't explicitly reference a crash unfortunately. however, there are some weird messages from somc_panel that shouldn't be there i think. @kholk @MarijnS95 looks like something is up with color calibration?
Had another two different crashes.
The first one is like the one above and was a double crash where the vibrator was activated twice. Again, I guess no crash is referenced directly in the files, and it is quite similar to the file I posted above. By the way, the vibrator is activated the second time (presumably the second crash of the double crash) before the white screen with the black Sony logo appears, so it happens really early in the start-up phase: pstore-1.tar.gz
It also contains the color calibration messages:
[ 3599.213047] somc_panel_color_manager: somc_panel_inject_crtc_overrides (751): Override: Already have original funcs! Is setup called twice??
[ 3599.213193] somc_panel_color_manager: somc_panel_pcc_setup (855): u,v is flashed 0.
[ 3599.213297] somc_panel_color_manager: somc_panel_colormgr_apply_calibrations: Couldn't apply PCC calibration
[ 3599.213428] somc_panel_color_manager: somc_panel_colormgr_apply_calibrations: Cannot send HSIC calibration
[ 3637.044419] somc_panel_color_manager: somc_panel_inject_crtc_overrides (751): Override: Already have original funcs! Is setup called twice??
[ 3637.044757] somc_panel_color_manager: somc_panel_pcc_setup (855): u,v is flashed 0.
[ 3637.045119] somc_panel_color_manager: somc_panel_colormgr_apply_calibrations: Couldn't apply PCC calibration
[ 3637.045584] somc_panel_color_manager: somc_panel_colormgr_apply_calibrations: Cannot send HSIC calibration
I have also extracted other messages that may or may not indicate a problem:
[ 3853.426166] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU4: 0x00000000101122
This message appears often, but only for CPUs 4 to 7, not for 0 to 3.
[ 3771.593884] kgsl kgsl-3d0: |counter_delta| Abnormal value:0x101b85b (0x1026c0d) from perf counter : 0x3b0
[ 3636.506097] cache: parent cpu3 should not be sleeping
This one appears for CPUs 1 to 6, not for 0 and 7.
[ 3739.948268] CHRDEV "qcwlanstate" major number 220 goes below the dynamic allocation range
[ 3739.951842] ipa ipa3_uc_reg_rdyCB:1774 bad parm. inout= (null)
[ 3739.980534] ipa ipa3_uc_reg_rdyCB:1774 bad parm. inout= (null)
[ 3739.982445] send_filled_buffers_to_user: Send Failed -22 drop_count = 1
[ 3739.987369] ipa ipa3_uc_reg_rdyCB:1774 bad parm. inout= (null)
[ 3740.044557] IPC_RTR: process_new_server_msg: Server 00001003 create rejected, version = 0
[ 3739.938030] wlan: Loading driver v5.1.1.69T ()
[ 3740.200959] cnss_utils: WLAN MAC address is not set, type 0
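For anyone who wants an overview of which messages repeat in a console log like the excerpts above, here is a minimal sketch that groups lines by the text before the first colon. It assumes every kernel line starts with a `[ seconds.micros]` timestamp (optionally preceded by a `<N>` log level) as in the excerpts; `tally_messages` is a hypothetical helper name, not something from the kernel or Android tooling.

```python
import re
from collections import Counter

# Matches a leading "[ 3599.213047] " timestamp, optionally prefixed by a
# "<6>"-style log level. This layout is assumed from the excerpts above.
TIMESTAMP = re.compile(r"^\s*(?:<\d+>)?\[\s*\d+\.\d+\]\s*")


def tally_messages(lines):
    """Count timestamped kernel log lines by the text before the first ':'."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines that do not look like kernel log lines
        message = line[match.end():].strip()
        tag = message.split(":", 1)[0]
        counts[tag] += 1
    return counts
```

Calling `tally_messages(open("console-ramoops-0", errors="replace"))` and then `most_common()` on the result would list the noisiest sources first. It only shows what is frequent, not what caused a crash.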
After the above double crash the phone booted, and shortly after having booted it crashed and rebooted again, this time with the vibrator being activated only once. The pstore files contain more information: a kernel call stack and also buffer underflow errors. However, I think those two kinds of crashes are something completely different: pstore-2.tar.gz
On a fresh build I got just two more double crashes (vibrator was activated twice):
Kernel version: 4.9.174-gd7ab313501f1
Android version: android-9.0.0_r37
OEM Image: SW_binaries_for_Xperia_Android_9.0_2.3.2_v8_yoshino.img
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190514.014237
Version baseband: 1308-8921_47.1.A.16.20
pstore-1.tar.gz pstore-2.tar.gz
This time the files contain kernel exceptions. I hope those files are more helpful for finding the root cause of the crashes.
Just got two more of those double crashes. At the time of writing this comment I am on the latest commits:
Kernel version: 4.9.174-g6f8c28697397
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190514.185903
Kernel version: 4.9.174-gfc821b0441e9
Android version: android-9.0.0_r37
OEM Image: SW_binaries_for_Xperia_Android_9.0_2.3.2_v8_yoshino.img
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190516.231020
Version baseband: 1308-8921_47.1.A.16.20
With a fresh, up-to-date build from late yesterday evening it is still occurring; it has already happened several times shortly after I flashed and booted this morning. The overall stability of yoshino is currently poor: over the last two days the double crashes/reboots occurred several times a day, I guess around 8 to 10 times, though I haven't counted exactly. Can someone confirm my observations? Are the pstore contents useful, or is there something additional I could provide that would help you find the root cause(s) of the problem(s)?
Here are two pstores from crashes on that build; their contents differ. pstore.tar.gz pstore-2.tar.gz
Just in case, I have also saved a dmesg file from a freshly booted system when it managed to start up normally: dmesg.log
@oshmoun @stefanhh0 The color calibrations are nothing to worry about, though annoying (spammy) and wasting CPU cycles. As far as I understand, retrieving a 0, 0 calibration from the display indicates an error has occurred (following the code), but after some discussion it seems this is a valid case where the display shows "ground truth" without needing any extra adjustment.
I propose to check every device that exhibits this behaviour, and decide:
- somc,mdss-dsi-uv-command, conv_uv_data, or another piece of code (specific to Yoshino);
- somc,mdss-dsi-pcc-force-cal for Yoshino;
- 0, 0 is a valid response for every display and platform.
I do have a Mermaid here that prints the same result, but didn't manage to check whether this is normal.
In the end, even just making the code continue without doing the setup every time the PCC "changes" (happens when the display turns on) will save on noise and cycles.
With the build:
Kernel version: 4.9.174-ga85976871290
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190522.213614
AOSP is a lot more stable than it used to be. Not a single crash in the last 14 hours (uptime: 13:55). That is really a huge improvement for the overall stability.
With the older builds, including the build before the current one:
Kernel version: 4.9.174-g7f4c5dfbbd84
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190521.193309
AOSP used to crash several times a day, so the changes between those two builds have improved the stability. The timestamp in Build: reflects the time when the repos were synced, since I always build shortly after syncing.
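Since the sync time is encoded in the Build: string, the gap between two builds can be worked out from the strings alone. A minimal sketch, assuming the trailing part is always `YYYYMMDD.HHMMSS` as in the build strings quoted in this thread; `build_timestamp` is a hypothetical helper name.

```python
import re
from datetime import datetime


def build_timestamp(build):
    """Extract the trailing YYYYMMDD.HHMMSS stamp from a Build: string."""
    # Assumption: the stamp is the last two dot-separated fields of the string.
    match = re.search(r"\.(\d{8})\.(\d{6})\s*$", build)
    if match is None:
        raise ValueError("no build timestamp found in %r" % build)
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


older = build_timestamp(
    "aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190521.193309")
newer = build_timestamp(
    "aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190522.213614")
print(newer - older)  # 1 day, 2:03:05
```

So the stable and the unstable build were synced roughly 26 hours apart, which narrows down the range of commits that could contain the improvement.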
I would like to keep the ticket open, since I don't know whether the double crash issue is fixed as well, and to see whether the coming builds confirm the improvement in overall stability. I will get back during the next week with my findings.
Good job and thank you all for bringing back some stability to AOSP!
Platform: Yoshino
Device: Lilac
Kernel version: 4.9.174-ga85976871290
Android version: android-9.0.0_r37
OEM Image: SW_binaries_for_Xperia_Android_9.0_2.3.2_v8_yoshino.img
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190522.213614
Version baseband: 1308-8921_47.1.A.16.20
That said, the phone had another double crash. This time a dmesg file was also written: pstore.tar.gz
To my surprise the dmesg file references the previous kernel:
<6>[27876.006251] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S W 4.9.174-g7f4c5dfbbd84 #1
And not the current one, 4.9.174-ga85976871290 (which then again shows up when I boot the phone and type uname -a). Is that due to the A/B mechanics? How can I force A and B to hold the same image? I think it would make sense to no longer have the unstable version with kernel 4.9.174-g7f4c5dfbbd84 on my phone at all, to get a glimpse of the original logs that were written when the phone crashed the first time.
No, totally wrong: I realized that the timestamp of the dmesg file is from the early morning, so it is an old file that has nothing to do with the other files in the pstore.tar.gz and the latest double crash.
console-ramoops-0 should be the file containing the dmesg of the directly preceding boot. dmesg-ramoops-x are from previous boots, so that explains the old kernel version.
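To avoid mixing up files from different boots, it can help to check which kernel release each pstore file mentions before reading it. A rough sketch, assuming the files were extracted from pstore.tar.gz into a directory on the host and that releases look like `4.9.174-g7f4c5dfbbd84` as in this thread; `kernel_release` and `summarize_pstore` are hypothetical helper names.

```python
import re
from pathlib import Path

# Kernel release as it appears in this thread, e.g. "4.9.174-g7f4c5dfbbd84"
# or "4.9.182-gfa7fb2c467d4-dirty". Other naming schemes are not covered.
RELEASE = re.compile(r"\b(\d+\.\d+\.\d+-g[0-9a-f]+(?:-dirty)?)\b")


def kernel_release(text):
    """Return the first kernel release string found in a log, or None."""
    match = RELEASE.search(text)
    return match.group(1) if match else None


def summarize_pstore(directory):
    """Map each file in an extracted pstore directory to the release it mentions."""
    return {
        path.name: kernel_release(path.read_text(errors="replace"))
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

A file whose release differs from what `uname -a` reports on the running system is most likely a leftover from an earlier boot, as was the case with the dmesg file here.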
Kernel version: 4.9.177-g69cc32601555
Android version: android-9.0.0_r37
OEM Image: SW_binaries_for_Xperia_Android_9.0_2.3.2_v8_yoshino.img
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190523.192026
Version baseband: 1308-8921_47.1.A.16.20
The double crash/reboot also occurred on 4.9.177, when I booted the device for the very first time directly after flashing the new build. After the double crash/reboot the device started successfully.
This time several exceptions are logged: pstore.tar.gz
Device: Lilac
Platform: Yoshino
Kernel version: 4.9.182-gfa7fb2c467d4-dirty
Android version: android-9.0.0_r37
Software binaries version: SW_binaries_for_Xperia_Android_9.0_2.3.2_v9_yoshino.zip
Version baseband: 1308-8921_47.1.A.16.20
Build: aosp_g8441-userdebug 9 PQ3A.190505.002 eng.stefan.20190619.192046
Description
Just an update: it is still happening on a fresh clean build. The phone was connected to USB and the screen was off. I tried to activate the phone via the fingerprint sensor. The phone did not activate the screen; instead, after some seconds, the already reported double crash and reboot occurred.
Symptoms
In various situations: hang -> double crash -> reboot.
How to reproduce
No reliable recipe found yet; however, it happens from time to time when using the phone (every several hours).
Additional context
Again, exceptions and other failure and error messages can be found in console-ramoops-0; however, I can't say whether those messages are helpful in identifying the root cause of the problem. pstore.tar.gz
Kernel version: 4.9.182-g6593c13acef8
Android version: android-9.0.0_r44
Software binaries version: SW_binaries_for_Xperia_Android_9.0_2.3.2_v9_yoshino.zip
Version baseband: 1308-8921_47.1.A.16.20
Build: aosp_g8441-userdebug 9 PQ3A.190705.003 eng.stefan.20190705.182120
Just another update, it is still happening on latest android/kernel: pstore.tar.gz
I know about the crashes during charging, that's due to the charger thermal zone going nuts after multiple suspend-resume cycles. It's safe, because we are monitoring lots of zones and the one that goes nuts is a duplicate of what we already check.... But we cannot remove it, otherwise the charger stops working.....
This kind of crash cannot be resolved on kernel 4.9 due to the fact that the entire RPM framework is royally f*****. Or at least I have never found a way to.
Regarding the other kind of crashes, these may be due to a clock being stuck and failing gracefully, producing the apparently-all-ok crash behavior.
I have been able to solve some issues on other platforms on kernel 4.9 during the 4.14 porting (because I've had to examine the entire thing again).... I've reached Yoshino 2 days ago, let's see if I can spot anything on there!
P.S.: I think you deserve this info. On kernel 4.14 the RPM was finally migrated to the upstream RPMSG API, which is solving most of the big issues that the old crapped one on 4.9 currently has. It's not an excuse or something but, in case we can't do anything good here, there's a good hope for the future, I think.
Thanks a lot for the info. I am happy to get some feedback and am looking forward to kernel 4.14. Despite the occasional crashes, the phone is all in all usable with AOSP.
Platform: Yoshino
Device: Lilac
Kernel version: 4.14.176-gf0356fa3bcac
Android version: android-10.0.0_r36
Software binaries version: SW_binaries_for_Xperia_Android_10.0.7.1_r1_v6_yoshino.img
Version baseband: 1307-7511_47.2.A.11.228
Build: aosp_g8441-userdebug 10 QQ2A.200501.001.B3 eng.stefan.20200510.170705
Retested; the phone was connected via USB. Before the reboot I removed all files in /sys/fs/pstore.
After the system was up again I found the following files in /sys/fs/pstore:
console-ramoops-0.log dmesg-ramoops-0.log dmesg-ramoops-1.log pmsg-ramoops-0.log
Well, it is currently just a reboot issue and not the original problem. Just let me know if I should open a new clean issue, but the basic info is anyway in my previous comment.
I am closing this issue in favor of #580. This one is very old, and now that I have cleared up my misinterpretation of experiencing a double crash, there is no need to keep it open.
Platform: Yoshino
Device: Lilac
Kernel version: 4.9.170-g0227b68deb75
Android version: android-9.0.0_r35
OEM Image: SW_binaries_for_Xperia_Android_9.0_2.3.2_v8_yoshino.img
Build: aosp_g8441-userdebug 9 PQ2A.190405.003 eng.stefan.20190502.183239
Version baseband: 1308-8921_47.1.A.16.20
Description
With the latest kernel 0227b68deb75, AOSP got significantly more unstable. Unfortunately I don't know exactly which commit I was on when it was more stable; I just remember that it must have been a commit from May 7th or 8th.
The crash and reboot happens relatively often when waking up the phone (I don't know if the phone is sleeping or in deep sleep; I can only say that the screen is off). It then fails to activate the screen, and the screen remains dark. After some time, the vibrator is activated twice, with a short pause between the first and second activation.
I am not sure if that means a double crash; it would be nice if someone could say whether my assumption of a double crash is right. If so, the original crash is then not available, since the information in /sys/fs/pstore is overwritten with the information from the second crash, correct?
It is not a double crash but just normal behaviour, see also the comment here: https://github.com/sonyxperiadev/bug_tracker/issues/580#issuecomment-631097204
After the phone is back /sys/fs/pstore contains some files: pstore.tar.gz
Symptoms
Phone screen remains black when trying to activate -> double crash / reboot.
How to reproduce
Happens intermittently; no solid recipe available. However, it happens several times a day in normal usage.