topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0

Patched boot.img will not proceed past initial magiskinit steps #5806

Closed finnzz closed 2 years ago

finnzz commented 2 years ago

Device: FireTV 2nd gen Cube
Android version: Android 9 (FireOS 7)
Magisk version name: Canary 42e5f515
Magisk version code: 24308
Amlogic s922x SOC, slot-A only, SAR, no super partition device

I'm trying to create a magisk-patched boot image, and load it with fastboot boot. The patched boot image will only proceed a few steps after magiskinit. There is no kernel panic or freeze, the system just sits there. The device boot.img does not include a ramdisk. I'm using magisk manager's recovery mode option to patch the boot.img and I'm assuming that since I see magiskinit in the boot logs that the system is accepting and loading the ramdisk. I also disabled dm-verity for now to reduce complications.

In the regular stock bootup, the next lines to follow after this holding point are:

[    2.340618@3] prepare_namespace() wait 78
[    2.340657@3] md: Waiting for all devices to be available before autodetect
[    2.341285@3] md: If you don't use raid, use raid=noautodetect
[    2.342293@3] md: Autodetecting RAID arrays.
[    2.342542@3] md: Scanned 0 and added 0 devices.
[    2.343115@3] md: autorun ...
[    2.343484@3] md: ... autorun DONE.
[    2.346165@3] EXT4-fs (mmcblk0p10): mounted filesystem without journal. Opts: (null)
[    2.346448@3] VFS: Mounted root (ext4 filesystem) readonly on device 179:10.
[    2.348352@3] devtmpfs: mounted
[    2.349840@3] Freeing unused kernel memory: 4992K
[    2.360881@4] init: init first stage started!

magisk-boot-log.zip magisk_patched-24308.zip boot.zip

canyie commented 2 years ago

androidboot.slot_suffix=normal
magiskinit: slot=[no]

Send a kmsg grabbed on stock bootup? And send /proc/cmdline if possible?

pndwal commented 2 years ago

I'm trying to create a magisk-patched boot image, and load it with fastboot boot.... The patched boot image will only proceed a few steps after magiskinit... The device boot.img does not include a ramdisk. I'm using magisk manager's recovery mode option to patch the boot.img and I'm assuming that since I see magiskinit in the boot logs that the system is accepting and loading the ramdisk...

With recovery mode surely you should be patching recovery.img! ...

magiskinit must be in a ramdisk that functions... the init for A-only Legacy SAR devices is in /system, not in ramdisk...

When patching boot.img in these devices Magisk will create a basic ramdisk in boot with magiskinit included, but this should be done with recovery mode disabled... This will only work for devices where the bootloader accepts a ramdisk in boot... Generally Xiaomi Legacy SAR A-only devices do, but I assume most other brands don't... Magisk can't detect if the bootloader is compatible; you can only try to boot this way...

The general assumption is that A-only Legacy SAR devices need magisk in recovery, ie patch recovery.img (or AP for Samsung) with Recovery mode selected.

finnzz commented 2 years ago

Thank you Canyie and Pndwal,

I have tried using the recovery.img as well, but the boot process always stops at that same point whether I'm using recovery or boot images. If the kernel log is showing references to magiskinit, doesn't that mean that the ramdisk created by magisk was accepted and used?

I did make a mistake in my initial post. While troubleshooting this issue, I have tried Magisk Manager v21, 22, 23, 24, 24.1, 24.2 and 24.3. But when I tried installing the canary manager version on the FireTV device, it would not proceed past the Magisk splash screen. So I installed it on my phone (Android 9, slot A-only but not SAR) and used that to patch the FireTV boot.img.

Canyie, is having androidboot.slot_suffix=normal in the cmdline problematic? Is there a command I can call from fastboot to skip it? Otherwise I can edit the bootloader if needed. I'm attaching a boot log from the stock boot.img bootup. The kernel cmdline is on line 39. boot-log.zip

EDIT: I changed the cmdline to androidboot.slot_suffix=1, which returns magiskinit: slot=[1], but that didn't get the kernel to load any further.
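For anyone wanting to check the same parameter on their own device, pulling the slot suffix out of a kernel command line can be sketched as follows. This is a minimal Python illustration, not Magisk's parser: the function name parse_cmdline and the sample command line are made up for the example (only the androidboot.slot_suffix=normal entry comes from this thread), and it splits on whitespace only, so quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into key/value pairs.

    Bare flags without '=' (such as 'ro') map to an empty string.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


# Hypothetical command line; only the slot_suffix entry is taken from the thread.
sample = "console=ttyS0,115200 ro androidboot.slot_suffix=normal"
params = parse_cmdline(sample)
print(params.get("androidboot.slot_suffix"))
```

On a device, the input would be the contents of /proc/cmdline instead of the hard-coded sample string.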

canyie commented 2 years ago

A-only devices should have no slot_suffix afaik, so you should remove it to get Magisk to work. I don't know if Magisk should support this behavior, but anyway I did it. Try this build without modifying the cmdline? https://github.com/canyie/Magisk/actions/runs/2290127347

finnzz commented 2 years ago

Great call Canyie, thank you! This seems to have been the problem. With your build, Magisk ignores the illegal androidboot.slot_suffix, proceeds to do its thing and continues loading the OS. I now see Magisk as installed in Magisk Manager (your build also loads on this FireTV device, whereas the canary build I downloaded yesterday will not).

One question, Magisk manager says that it requires a reboot for magisk to fully work. I'm dealing with a non-persistent root at the moment. What function am I missing by not rebooting? Zygisk?

pndwal commented 2 years ago

To be clear, this is A-only SAR, ramdisk=yes?... What was Android launch version?

... when i tried installing the canary manager version on the FireTV device it would not proceed past the magisk splash screen.

This: https://github.com/topjohnwu/Magisk/issues/5787

TJW already merged a fix for that. If your / Canyie's fix is merged in time, next Canary should be good for you... 👍

canyie commented 2 years ago

Magisk manager says that it requires a reboot for magisk to fully work. What function am I missing by not rebooting?

Didn't check the source code but I think they're modules and Zygisk

One question, just in case... are you using stock ROM?

finnzz commented 2 years ago

Didn't check the source code but I think they're modules and Zygisk

One question, just in case... are you using stock ROM?

Ah, that's what it is, I installed the debug version of magisk manager, which probably had some module that needed a reboot. The regular app version doesn't give me that notification. Yes, I am patching the stock boot.img

To be clear, this is A-only SAR, ramdisk=yes?... What was Android launch version?

The 2nd gen Cube is an A-only SAR, ramdisk=no. It was released with Android 9 (FireOS7). The Cube is a little unique because it's one of the few Fire devices using an Amlogic SOC. Amlogic may be similar to Xiaomi in that it accepts a boot.img ramdisk even though stock boot.img does not have one. Or perhaps it's a u-boot thing?

canyie commented 2 years ago

Not sure if Magisk should support it... and I don't know if the current workaround is correct. Maybe we should directly check for "normal" instead, in case some devices' slot prefix is not '_'...
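The two options being weighed here can be compared with a small sketch. It is written in Python purely for illustration and is not taken from the Magisk source; both function names are hypothetical, and the first variant is only one reading of what "the current workaround" does (accepting a value only if it starts with '_').

```python
def slot_by_prefix(value: str) -> str:
    """Variant 1 (assumed): keep the value only if it starts with '_'."""
    return value if value.startswith("_") else ""


def slot_by_literal(value: str) -> str:
    """Variant 2: drop only the literal 'normal', keep anything else."""
    return "" if value == "normal" else value


# '_a' is a conventional A/B suffix; 'normal' and '1' both appear in this thread.
for candidate in ("_a", "normal", "1"):
    print(candidate, repr(slot_by_prefix(candidate)), repr(slot_by_literal(candidate)))
```

The difference shows up for values such as "1": the prefix check discards it, while the literal check passes it through unchanged, which matters if some device uses a slot suffix that does not begin with '_'.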

pndwal commented 2 years ago

The 2nd gen Cube is an A-only SAR, ramdisk=no. It was released with Android 9 (FireOS7). The Cube is a little unique because it's one of the few Fire devices using an Amlogic SOC. Amlogic may be similar to Xiaomi in that it accepts a boot.img ramdisk even though stock boot.img does not have one...

I'd say this... It's the way the OEM builds the bootloader however; Xiaomi uses both MediaTek & Qualcomm and these A-only Legacy SAR devices generally accommodate a manually added ramdisk in boot which Magisk adds when absent, whereas other OEMs' implementations using the same SoCs (Samsung etc) don't...

I'm guessing you're properly disabling Recovery mode before patching boot image (or taking Direct install) now.

canyie commented 2 years ago

Hmmm, is that androidboot.slot_suffix=normal cmdline argument present when using fastboot flash rather than fastboot boot to root the device?

finnzz commented 2 years ago

I'm guessing you're properly disabling Recovery mode before patching boot image (or taking Direct install) now.

At the moment, I'm only working with fastboot boot, and only magisk-patching with magisk manager. I am leaving recovery mode checked in magisk manager when patching the stock boot.img, which does not have a ramdisk. Since fastboot boot succeeds in booting and loading the patched image into FireOS, and magisk is granting apps root access, I'm not sure there is a reason to change things. If you like, I can upload my revised patched boot.img for you to verify against the stock boot.img in the OP.

Hmmm, is that androidboot.slot_suffix=normal cmdline argument present when using fastboot flash rather than fastboot boot to root the device?

I haven't tried fastboot flash. androidboot.slot_suffix=normal is in the stock kernel cmdline booted from the eMMC, and appears to be part of the normal boot in the board header file raven.zip

Edit: when googling androidboot.slot_suffix=normal, I mainly come across amlogic devices. Generally there are also a lot of Amlogic devices that have problems getting magisk to work. It may be that this affects a number of devices.

pndwal commented 2 years ago

I'm guessing you're properly disabling Recovery mode before patching boot image (or taking Direct install) now.

At the moment, I'm only working with fastboot boot, and only magisk-patching with magisk manager. I am leaving recovery mode checked in magisk manager when patching the stock boot.img, which does not have a ramdisk.

No, you are using boot mode, not recovery mode!

And you DO actually now have ramdisk in boot ('manually' added by Magisk)... App is just correctly stating you have no OEM ramdisk, since magisk will ALWAYS add one where absent when patching boot (just in case bootloader supports). - Lack of OEM ramdisk is an indication that boot mode Magisk probably won't work (does for Xiaomi, you) while having added ramdisk indicates nothing...

Patching boot w/ recovery mode selected may work ok for your device (it is critical to set this correctly for some devices like Samsung however, and may affect whether other partitions like vbmeta are patched or not on many devices), but app functions like reboot will probably misbehave (eg try to reboot system via recovery), and Direct Install (when Update arrives for Magisk) will try to patch / flash a recovery image...

Since fastboot boot succeeds in booting and loading the patched image into FireOS, and magisk is granting apps root access, I'm not sure there is a reason to change things.

...unless you want to avoid unexpected behaviour...

canyie commented 2 years ago

Just in case, please confirm this build works: https://github.com/canyie/Magisk/actions/runs/2298295488 😆

finnzz commented 2 years ago

Just in case, please confirm this build works: https://github.com/canyie/Magisk/actions/runs/2298295488 😆

Yes, I just tested it, it's still working :)

...unless you want to avoid unexpected behaviour...

I agree with you on the ramdisk. There is no stock ramdisk, and after Magisk patches the boot.img there is a small ramdisk, which this Amlogic device accepts. That is what I had initially inferred from the fact that magiskinit was appearing in the kernel log at all.

I'm more inclined to believe that these quirks of the suffix name and ramdisk handling are general to Amlogic devices. The Cube source appears to be heavily influenced by Amlogic's source for the W400, its S922X reference board. And I was looking at the Google Chromecast (Sabrina, with Amlogic SOC) device header file today and noticed that even though it's an A/B device the 'normal' suffix value had been commented out.

Consequently, I also made a second magisk-patched boot.img with Canyie's latest build. Unchecking recovery mode resulted in a mount/unmount loop.

[   22.122204@2] magiskinit: Skip invalid androidboot.slot_suffix=[normal]
[   22.122352@2] magiskinit: open: /proc/bootconfig failed with 2: No such file or directory
[   22.123358@2] magiskinit: fopen: /.backup/.magisk failed with 2: No such file or directory
[   22.124504@2] magiskinit: open: /.backup/init failed with 2: No such file or directory
[   22.125383@2] magiskinit: open: /.backup/.magisk failed with 2: No such file or directory
[   22.126422@2] magiskinit: linkat magisk->magisk failed with 17: File exists
[   22.199763@4] magiskinit: open: /sbin/magisk32.xz failed with 2: No such file or directory
[   22.215770@5] magiskinit: mount .magisk/selinux/load->/sys/fs/selinux/load failed with 2:y
[   22.216394@5] magiskinit: mount .magisk/selinux/enforce->/sys/fs/selinux/enforce failed wy
[   22.218313@5] magiskinit: opendir: /magisk-tmp/.magisk/mirror/cache/magisk failed with 2:y
[   22.234154@5] magiskinit: Skip invalid androidboot.slot_suffix=[normal]
[   22.234303@5] magiskinit: open: /proc/bootconfig failed with 2: No such file or directory
[   22.235309@5] magiskinit: fopen: /.backup/.magisk failed with 2: No such file or directory
[   22.236430@5] magiskinit: open: /.backup/init failed with 2: No such file or directory
[   22.237334@5] magiskinit: open: /.backup/.magisk failed with 2: No such file or directory
[   22.238371@5] magiskinit: linkat magisk->magisk failed with 17: File exists

canyie commented 2 years ago

And I was looking at the Google Chromecast (Sabrina, with Amlogic SOC) device header file today and noticed that even though it's an A/B device the 'normal' suffix value had been commented out.

You mean the androidboot.slot_suffix=normal does not exist on A/B devices? Could you upload the "device header file" for A/B devices?

pndwal commented 2 years ago

...unless you want to avoid unexpected behaviour...

I agree with you on the ramdisk. There is no stock ramdisk, and after magisk patches the boot.img there is a small ramdisk, which this Amlogic device accepts as I had initially alluded to just due to the fact that magisk init was appearing in the kernel log at all.

I'm more inclined to believe that these quirks of the suffix name and ramdisk handling are general to Amlogic devices.

May well be!... and this fix would have been well beyond me. 😝... It was just the patching of boot.img in recovery mode that caught my attention; wanted you to be aware of possible complications...

The Cube source appears to be heavily influenced by the source for the S922x reference board from Amlogic w400. And I was looking at the Google Chromecast (Sabrina, with Amlogic SOC) device header file today and noticed that even though it's an A/B device the 'normal' suffix value had been commented out.

Consequently, I also made a second magisk-patched boot.img with Canyie's latest build. Unchecking recovery mode, resulted in a mount/unmount loop...

Interesting... And great that you can boot system w/ Magisk using boot.img patched in recovery mode...

It does seem to me that there is still an issue extant preventing (proper) use of Magisk-patched boot patched in boot (normal) mode however... (NB. A-only legacy SAR Xiaomi users always patch / update w/ recovery mode deselected despite ramdisk = no.)

finnzz commented 2 years ago

You mean the androidboot.slot_suffix=normal does not exist on A/B devices? Could you upload the "device header file" for A/B devices?

Sorry, it wasn't commented out. It looks like the default value for the g12a/b/sm1 family of Amlogic SoCs is 'normal'. Sabrina checks whether the slot is 'normal' and, if it is not, updates the bootargs:

"if test ${active_slot} != normal; then "\
                    "setenv bootargs ${bootargs} androidboot.slot_suffix=${active_slot};"\

I only saw part of Sabrina, but it has that statement, and I see it in all the other g12a/b/sm1 SoC boards that I checked.
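Read as plain logic, the quoted U-Boot fragment behaves roughly like the sketch below. This is a Python paraphrase for illustration only: the function name append_slot_suffix and the sample bootargs are hypothetical, and it covers only the branch that is quoted. What the board script does otherwise, including where the Cube's androidboot.slot_suffix=normal entry is added, is not shown in the fragment.

```python
def append_slot_suffix(bootargs: str, active_slot: str) -> str:
    """Mirror the quoted test: append androidboot.slot_suffix only when
    active_slot differs from the default value 'normal'."""
    if active_slot != "normal":
        return f"{bootargs} androidboot.slot_suffix={active_slot}"
    return bootargs


# Hypothetical starting bootargs, just to show both outcomes.
print(append_slot_suffix("ro rootwait", "_a"))
print(append_slot_suffix("ro rootwait", "normal"))
```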

W400 reference board for s922x https://github.com/Amlogic-Lineage/u-boot/blob/khadas-vims-pie_lpddr/board/amlogic/configs/g12b_w400_v1.h

A few dozen other Amlogic devices: https://github.com/Amlogic-Lineage/u-boot/tree/khadas-vims-pie_lpddr/board/amlogic/configs

canyie commented 2 years ago

So as far as we currently know, on all A/B devices the value of androidboot.slot_suffix is the real slot? So it's safe to assume the value "normal" means A-only?

finnzz commented 2 years ago

Yes, sorry again, didn't mean to say that the slot_suffix changes. On the Amlogic g12a/b/sm1 boards (and most of the other Amlogic boards in the link above), the slot value appears to be normal, which as you said, I also think means A-only.

Normal may be the value on all the reference boards, and then vendors can change that to A/B in their config like Google did with Sabrina (Chromecast for Google TV).

I think most of the Amlogic devices running android are TV boxes, maybe that's why this hasn't come up earlier? Fewer TV boxes, and they are less commonly rooted.

canyie commented 2 years ago

Hmmm, I'm trying to find a better workaround than hardcoding "normal", but I can't :( The official documentation says: A/B devices don't need a recovery partition; the slot should have the prefix "_"; there should be system_a and system_b, and there should not be a real partition named "system"… but none of these are a "must". The A/B-ed partitions must have the slotselect argument in fstab, but we don't want to parse fstab again :( Anyway, whitelisting "normal" can guarantee no regressions will occur because the length of config->slot is only 3. 😀
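One way to read the regression argument: a 3-byte field can hold at most two characters plus a terminator, so "normal" could never have been stored intact, which is consistent with the earlier log line magiskinit: slot=[no]. The Python sketch below models that under stated assumptions; it is not Magisk code, the helper names are hypothetical, and it assumes the copy into config->slot truncates rather than overflows.

```python
SLOT_FIELD_SIZE = 3  # per the comment above: config->slot is 3 bytes long


def store_in_fixed_field(value: str, size: int = SLOT_FIELD_SIZE) -> str:
    """Model a bounded C string copy: keep size-1 characters, reserve one for NUL."""
    return value[: size - 1]


def read_slot(value: str) -> str:
    """Skip the literal 'normal' before it is ever stored; store anything else."""
    if value == "normal":
        return ""
    return store_in_fixed_field(value)


print(store_in_fixed_field("normal"))  # truncated value, as in slot=[no]
print(repr(read_slot("normal")))       # treated as "no slot"
print(read_slot("_a"))                 # a two-character suffix fits unchanged
```

Under these assumptions, skipping "normal" only changes behaviour for a value that was already being mangled, while two-character suffixes such as "_a" and "_b" pass through exactly as before.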