topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0

Honor View 10 boot crash and eRecovery with 20303+ #2645

Status: Closed by HansGeiz 4 years ago

HansGeiz commented 4 years ago

Device: berkeley (Honor View 10)
Android version: 9 (BKL-L09C432 9.0.0.233)
LineageOS version: 16.0.20200213
Kernel version: 4.9.97
Magisk / Manager: 20.3 / 7.5.1 (force encryption, AVB 2.0/dm-verity, Recovery mode)

Installed Magisk v20.4:

Every installation method works, but the device crashes on reboot (to recovery) and boots into eRecovery instead. I unpacked the patched recovery ramdisks with AIK and compared the ramdisk directories. The only difference in v20.4 is the file "init", which is identical to arm/magiskinit64 from Magisk-v20.4.zip. There are no logs in /cache; I already tried the canary debug build.
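The ramdisk comparison described above can be scripted. The following is a minimal Python sketch, assuming both patched ramdisks have already been unpacked with AIK into two local directories. It hashes every regular file and reports what was removed, added, or changed; symlinks are skipped, so a change that only affects a symlink would not show up.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 64 KiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file (path relative to root) to its content digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()  # symlinks are skipped
    }


def diff_trees(old: Path, new: Path) -> dict[str, list[str]]:
    """List files only in old ("removed"), only in new ("added"),
    or present in both with different contents ("changed")."""
    a, b = tree_digests(old), tree_digests(new)
    return {
        "removed": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

For example, `diff_trees(Path("ramdisk-20.3"), Path("ramdisk-20.4"))` (hypothetical directory names) lists under `"changed"` every file whose contents differ between the two builds.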

osm0sis commented 4 years ago

Any way to get a last_kmsg / console-ramoops after the reboot to see if it captured the issue?

HansGeiz commented 4 years ago

Got some logs from /sys/fs/pstore:

console-ramoops-0.zip dmesg-ramoops-0.zip
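For reference, grabbing the pstore records boils down to copying every file in that directory. A minimal Python sketch, assuming pstore is mounted at /sys/fs/pstore as above and that a Python interpreter with root read access is available (stock Android ships none, so treat this as an illustration of which files to grab). The function name is hypothetical. console-ramoops holds the console output of the previous boot, so the files should be copied on the first boot after the crash, before another boot replaces them.

```python
from pathlib import Path

# Assumption: pstore is mounted here (the path used in this thread); reading it needs root.
PSTORE_DIR = Path("/sys/fs/pstore")


def collect_pstore(dest: Path, src: Path = PSTORE_DIR) -> list[str]:
    """Copy each file in src (e.g. console-ramoops-0, dmesg-ramoops-0) into dest.

    Returns the names of the copied files in sorted order.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if entry.is_file():
            # Read the whole record at once; pstore records are small.
            (dest / entry.name).write_bytes(entry.read_bytes())
            copied.append(entry.name)
    return copied
```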

osm0sis commented 4 years ago

Unfortunately, it looks like eRecovery also generates a console-ramoops, so this one doesn't appear to be from the same boot as the crash.

The best I can suggest is to work through the canary builds leading up to 20.4 to figure out when the issue started.

https://github.com/topjohnwu/magisk_files/commits/canary
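Working through the canaries is a bisection problem: with one known-good and one known-bad build, testing the midpoint halves the remaining range each time. A minimal Python sketch, assuming a single regression (every build after the first bad one is also bad). `is_bad` is a hypothetical callback standing for the manual step of flashing a build and checking whether the device ends up in eRecovery.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of builds for the first bad one.

    Assumes builds[0] is known good, builds[-1] is known bad, and that once
    a build is bad every later build is bad too (a single regression).
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at or before mid
        else:
            lo = mid  # regression is after mid
    return builds[hi]
```

With N canaries between the two ends, this needs roughly log2(N) test flashes instead of N; e.g. `first_bad(["20300", "20301", "20302", "20303", "20304"], flash_and_check)`, where the version list is illustrative and `flash_and_check` is hypothetical.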

HansGeiz commented 4 years ago

The problem starts with version "0dc9f5c3" (20303). I tried another approach to capture logs, and magiskinit now shows up in the dmesg log. Maybe this time the log is more useful:

console-ramoops-0.zip dmesg-ramoops-0.zip

osm0sis commented 4 years ago

From a quick look at the commits between b39f4075 (20302) on Jan 10, 2020 and 0dc9f5c3 (20303) on Jan 22, 2020, there are very few commits to magiskinit that could be causing the regression.

Mainly these 2 are suspect:
https://github.com/topjohnwu/Magisk/commit/836bfbdd028c19d22e3dbb11a478ea397b7a7ca4
https://github.com/topjohnwu/Magisk/commit/ba55e2bc3288963d093b5fbd88f18c4622e3a43c

CC: @topjohnwu

osm0sis commented 4 years ago

That dmesg looks like it contains the crash, a kernel panic after sepolicy patching, so the SELinux updates on Jan 20, 2020 could be suspect as well. :+1:
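To check whether a captured dmesg actually covers the crashing boot, one can look for magiskinit lines followed by a kernel panic. A minimal Python sketch, assuming the log is plain text with one message per line. The markers are substring matches on `magiskinit` (which, per the report, now appears in dmesg) and on `Kernel panic`, the prefix the Linux kernel prints when it panics; other failure formats would need extra patterns.

```python
import re

# "magiskinit" is how the init stage shows up in this thread's dmesg;
# "Kernel panic" is the prefix of the Linux kernel's panic message.
MARKERS = re.compile(r"magiskinit|Kernel panic")


def interesting_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for every line that matches a marker."""
    return [
        (n, line)
        for n, line in enumerate(log_text.splitlines(), start=1)
        if MARKERS.search(line)
    ]


def panic_after_magiskinit(log_text: str) -> bool:
    """True if a kernel panic line appears after the first magiskinit line."""
    hits = interesting_lines(log_text)
    init_lines = [n for n, line in hits if "magiskinit" in line]
    if not init_lines:
        return False  # no magiskinit output, so the log may be from another boot
    return any(n > init_lines[0] and "Kernel panic" in line for n, line in hits)
```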

CC: @topjohnwu

osm0sis commented 4 years ago

@HansGeiz 20405 has a rewrite of all the init logic so might be worth trying again to confirm this is still an issue and give us some fresh logs. :+1:

HansGeiz commented 4 years ago

No change, it keeps crashing.

console-ramoops-0.zip dmesg-ramoops-0.zip

KreAch3R commented 4 years ago

I can confirm the same behavior with stable 20.4 on an Honor View 10 running Pixel Experience.

cawidtu commented 4 years ago

I can confirm the same issue on a Huawei Honor 10 (COL-L29, similar to the Honor View 10) with the EMUI 9 vendor and the AEX Beta 1 P1 custom ROM from openkirin.net. Magisk 20.3 works, while Magisk 20.4 leads to a bootloop (the same with the latest canary). Inspecting the commits between Magisk 20.3 and 20.4, I found that b2ddba4cbfd5f589c3b4e74332154845fade40a0 is the last working commit. So commit fb60bea6597700cfd322ad2c7a4cef96822090fd, which updates SELinux, seems to be the culprit.

cawidtu commented 4 years ago

I went through the SELinux commits relevant for the above-mentioned update (2.9 -> 3.0) and found that commit dc4e54126bf25dea4d51820922ccd1959be68fbc "libsepol: Make an unknown permission an error in CIL" causes the kernel panic and reboot. Hence, it seems that the Huawei system contains some buggy policies. After reverting this commit I could build a working version of Magisk 20.4 (375ab93ee304a4aeb1e7d906e1272f57b2cf2b44, March 23). It is still possible to revert the harmful SELinux commit in version 3.1 so that even the latest GitHub commit of Magisk (fc67c0195f3d07f7be7ec62107f341b55f7dda3a) can be built with this fix. For some reason I have some trouble with this most recent Magisk version since the phone boots to the original recovery instead of Magisk, but that's most likely a different issue.

I am not sure how to proceed: should the harmful SELinux commit be reverted permanently or should the error thrown by SELinux be handled at a higher level?
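To make the two options in this question concrete, here is a toy model in Python. It is not libsepol's API or CIL's actual resolution code; `ToyPolicy`, `resolve` and `UnknownPermissionError` are hypothetical names. The commit title only says an unknown permission is now an error, so the tolerant branch assumes the earlier behaviour was to skip it. The point is where the decision is made: strict mode fails the whole rule (and, on the affected devices, ends in the reported panic), while tolerant mode keeps the known permissions and records a warning.

```python
from dataclasses import dataclass, field


class UnknownPermissionError(ValueError):
    """Raised in strict mode when a rule names a permission its class lacks."""


@dataclass
class ToyPolicy:
    # Hypothetical stand-in for a policy: class name -> set of known permissions.
    classes: dict[str, set[str]]
    warnings: list[str] = field(default_factory=list)

    def resolve(self, cls: str, perms: list[str], strict: bool) -> set[str]:
        """Resolve the permissions named in one allow rule.

        strict=True: any unknown permission raises (the behaviour the commit
        title describes). strict=False: unknown permissions are dropped and a
        warning is recorded, i.e. the error is handled at a higher level.
        """
        known = self.classes.get(cls, set())
        unknown = [p for p in perms if p not in known]
        if unknown and strict:
            raise UnknownPermissionError(f"{cls}: unknown permission(s) {unknown}")
        for p in unknown:
            self.warnings.append(f"ignoring unknown permission {cls}:{p}")
        return {p for p in perms if p in known}
```

With `ToyPolicy({"file": {"read", "write"}})`, resolving `["read", "frobnicate"]` returns `{"read"}` plus one warning in tolerant mode and raises in strict mode. Tolerating keeps the device booting but silently narrows the rule; failing loudly catches typos but turns buggy vendor policies like Huawei's into a boot failure.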

osm0sis commented 4 years ago

@topjohnwu, some definite progress here, and a proposed revert which could resolve this issue. Thoughts?

zixing131 commented 4 years ago

https://cn.ui.vmall.com/thread-21969811-1-1-7119.html With the fixed img file from this thread, the device boots normally.