topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0

Android cannot boot if a patched boot.img is flashed with a full factory image #2214

Closed ubergeek77 closed 4 years ago

ubergeek77 commented 4 years ago

Device: Google Pixel 3 XL (crosshatch)
Android version: 10
Magisk Version: v20.2 (20200)

To keep things brief, I would like to flash Magisk as part of a full factory image flash. I have (what I believe is) a very good reason for doing so, but there seems to be an issue with how the patched boot.img behaves during a typical flash.

Basically, when you send a Magisk-patched boot.img as part of a factory image flash (via the flash-all.sh present in a full factory image), the flash will succeed, but the device will not be able to boot afterwards. It will reach the "G" logo on a stock Google image (or the "android" logo on an AOSP build), then abruptly reboot, try the next boot slot, then ultimately fail and go back to the bootloader. The bootloader won't report any problem with "boot.img"; it will just say "no valid slot to boot." Since you can actually reach the "G" logo, I don't think anything is necessarily "wrong" with the patched boot.img, but it seems the rest of the install doesn't complete successfully when it is in place.

If I were to flash an unpatched factory image, boot it once, and then boot the exact same boot.img I attempted to embed in the factory image, it will work just fine, and Magisk will be active.

Steps to reproduce (using Google factory image as an example):

# Download a full factory image
wget https://dl.google.com/dl/android/aosp/crosshatch-qq1a.191205.008-factory-ff62c022.zip

# Extract the image
unzip crosshatch-qq1a.191205.008-factory-ff62c022.zip
cd crosshatch-qq1a.191205.008

# Extract boot.img
unzip image-crosshatch-qq1a.191205.008.zip boot.img

# Patch the boot.img
# (Do this manually, save to patched_boot.img)

# Overwrite the boot.img in the factory image
mv boot.img boot.img.orig
cp patched_boot.img boot.img
zip -rv image-crosshatch-qq1a.191205.008.zip boot.img

# Flash the image
./flash-all.sh

# Result: The image will successfully flash, but fail to boot

# Verify that rezipping the image did not cause the issue:
mv boot.img.orig boot.img
zip -rv image-crosshatch-qq1a.191205.008.zip boot.img
./flash-all.sh

# Result: The image will flash, and the device will boot without Magisk

# Verify that patched_boot.img is sane:
# (Reboot the device to the bootloader)

fastboot boot patched_boot.img

# Result: The same patched boot image as before, that caused a boot failure
# when installed with a full factory image, will boot, and Magisk will be active
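The repacking steps above can be wrapped in a small helper. This is only a minimal sketch, assuming `zip`, `unzip`, and `sha256sum` are installed on the host; the function names `inner_image_zip` and `replace_boot_img` are hypothetical and are not part of Google's factory image scripts or of Magisk.

```shell
#!/bin/sh
# Hypothetical helpers for the repack steps above; none of these names come
# from Google's factory image scripts or from Magisk.

# Print the inner image zip name for a factory zip, following the naming seen
# above: <device>-<build>-factory-<hash>.zip contains image-<device>-<build>.zip
inner_image_zip() {
    name="${1##*/}"              # strip any leading directory
    base="${name%-factory-*}"    # drop "-factory-<hash>.zip"
    printf 'image-%s.zip\n' "$base"
}

# Replace boot.img inside the inner image zip with a patched copy, then
# confirm that the archived bytes match the patched file.
# Usage: replace_boot_img image-<device>-<build>.zip patched_boot.img
replace_boot_img() {
    image_zip="$1"
    patched="$2"
    workdir="$(mktemp -d)" || return 1
    cp "$patched" "$workdir/boot.img" || return 1
    # -j stores the file without its directory path, so it lands as "boot.img"
    zip -j "$image_zip" "$workdir/boot.img" >/dev/null || return 1
    rm -rf "$workdir"
    want="$(sha256sum < "$patched")"
    got="$(unzip -p "$image_zip" boot.img | sha256sum)"
    [ "$want" = "$got" ]
}
```

The checksum comparison only confirms that the zip now contains the patched image; it says nothing about whether the device will boot it.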

What's interesting to me is that, even with a Magisk-patched boot.img, the flash process is still able to boot into userspace fastboot (new with Android 10), and the rest of the flash seems to go by totally fine.

I know that Magisk has a very strict development process, but being able to do this makes many security benefits possible (including signing a patched boot.img with a custom AVB key, patching recovery so that a factory reset in the event of theft is not possible, and locking the bootloader while retaining Magisk). I really would appreciate any solution to this flashing issue (for my sanity at least, I've spent a few weeks to get to this point, and I could use a win; it would really suck to be shut down at this point).

I should note that this was all possible on Android 9. There must be something about the flashing process for an Android 10 ROM that doesn't seem to like a patched boot.img, but I can't figure out what that is.

I will happily provide any and all feedback, be it gathering logs, testing workarounds, providing further explanations, or anything else.

Thank you for your time.

topjohnwu commented 4 years ago

@ubergeek77 this is caused by the fact that the boot control HAL will only notify boot success when everything is properly verified. The same issue happens with Magisk's OTA feature; in that case Magisk Manager downloads a prebuilt bootctl binary and forcefully flags the slot as valid, because if we don't, boot control will NOT validate the slot, and the device will eventually boot to the other slot. Unfortunately, this is not possible when you wipe your data (as when you flash a factory image). The reason you are slammed into the bootloader instead of booting into the other slot is that, by default, factory images flash pre-optimized oat files to the other slot, making the other slot unbootable.

ubergeek77 commented 4 years ago

Thanks for the explanation @topjohnwu, I appreciate it. I'd be lying if I said I understand the full extent of what's going on here, especially since this still happens even if I have a signed boot.img that matches what's in vbmeta.img, but it is helpful nonetheless. It may be worth mentioning that I haven't tested setting my own custom AVB key via fastboot (which is possible on Pixel devices). That key would match what boot.img is signed with, but since my bootloader is in a locked state, that probably doesn't matter in this situation.

Another thing that confuses me is that, even if I were to manually flash boot.img after a failed flash, and try to go into recovery mode or rescue mode, it will actually fail to boot boot.img, meaning I run the risk of bricking my phone if I'm not careful and there's no other way to flash a good OTA. I can (probably) prevent this if I remove the option to factory reset from recovery, but this could still put me in hot water.

If I'm building AOSP from source (which is the end goal here) and want to work around the issue of the boot slots not being verified on a data wipe, is there something I can do to force the slot to be flagged as valid? If at all possible, even if I'm limited to only flashing OTA images in a setup like this, I'd really like to prevent something as innocuous as a factory reset from putting my phone out of commission.

topjohnwu commented 4 years ago

I'm not that familiar with the boot control HAL, which is why I simply force this command to run on boot after an OTA (bootctl mark-boot-successful).

If the device ever refuses to boot, you can simply reflash the factory image (a data wipe is not required); that seems to reset the boot slot status, and the bootloader will no longer refuse to boot.
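For readers following along, the bootctl call mentioned above could be wrapped roughly as follows. This is a sketch under assumptions, not Magisk's actual implementation: it assumes a `bootctl` binary is available on the device and is run as root, and the function name `ensure_slot_marked_successful` and the `BOOTCTL` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes a bootctl binary on the device, run from a root shell.
# BOOTCTL is a hypothetical override so a different binary path can be used.
BOOTCTL="${BOOTCTL:-bootctl}"

# Mark the currently running slot as successfully booted, unless the
# boot control HAL already reports it as such.
ensure_slot_marked_successful() {
    # get-current-slot prints the slot that is currently running
    slot="$("$BOOTCTL" get-current-slot)" || return 1
    # is-slot-marked-successful exits with 0 only if the slot is marked good
    if "$BOOTCTL" is-slot-marked-successful "$slot"; then
        echo "slot $slot already marked successful"
    else
        "$BOOTCTL" mark-boot-successful && echo "slot $slot marked successful"
    fi
}
```

Because this has to run on a booted system, it does not help in the data-wipe case described earlier, where the device never gets far enough to run it.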

ubergeek77 commented 4 years ago

I've just tried this with the Google image. I first flashed an unmodified Google image, rebooted, flashed the modified Google image, failed to boot, then tried flashing the modified Google image again, omitting the -w wipe flag. Unfortunately, that didn't work. fastboot set_active a (or b) didn't seem to work, either.

I think I'll give up on trying to bake this into factory images, and try achieving a locked bootloader some other way.

This is somewhat off-topic, but since it's tangentially related, I'd like to ask you one final thing:

All of my efforts with this have been to ultimately run a RattlesnakeOS ROM with Magisk included. RattlesnakeOS supports installing OTA images in the background, exactly like a normal manufacturer might do OTAs, only the resulting OTAs have the modifications I've defined in my RattlesnakeOS-stack config. That is to say, I'll theoretically have an OTA installed in the background, and that OTA will already have Magisk included in the boot.img, and that boot.img will be signed.

If yes:

Apologies for the onslaught of questions here. I'm just really interested in achieving Magisk with a locked bootloader, and there's near-zero discussion about that on the Internet that I've been able to find, so I was hoping to figure it out one way or the other.

enovella commented 4 years ago

> If you ever got refused to boot, you can simply just reflash the factory image (data wipe is not required), that seems to reset the boot slot status and the bootloader will not refuse to boot anymore.

This did work on my end <3. The testing device was a Pixel 3a running Android 10 June 2020 security patches.

...
fastboot update -w image-sargo-qq3a.200605.002.zip --skip-reboot
fastboot reboot-bootloader
sleep 5
fastboot update image-sargo-qq3a.200605.002.zip --skip-reboot
fastboot getvar all

Right after installation and before rebooting:

[00:38 edu@xps bootpatcher]  >  fastboot flash boot new-boot.img --slot all 
target reported max download size of 536870912 bytes
sending 'boot_a' (30396 KB)...
OKAY [  1.004s]
writing 'boot_a'...
OKAY [  0.231s]
sending 'boot_b' (30396 KB)...
OKAY [  0.976s]
writing 'boot_b'...
OKAY [  0.177s]
finished. total time: 2.389s

UPDATE: The boot image is flashed but the OS doesn't boot and after 3 reboots the device enters bootloader mode with reason: no valid slot to boot
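When the bootloader reports "no valid slot to boot", the per-slot state can be inspected from the `fastboot getvar all` output used above. The filter below is a minimal sketch: `slot_status` is a hypothetical name, and since the exact line formatting varies between bootloaders it only selects the slot-related lines rather than parsing their values.

```shell
#!/bin/sh
# Hypothetical helper: keep only the A/B slot variables from
# `fastboot getvar all` output read on stdin. Line formatting differs
# between bootloaders, so the values are shown as-is, not parsed.
slot_status() {
    grep -E 'current-slot|slot-(successful|unbootable|retry-count)'
}

# Usage (fastboot writes getvar output to stderr, hence the 2>&1):
#   fastboot getvar all 2>&1 | slot_status
```

A slot reported as unbootable, or with its retry count at zero, would be consistent with the bootloader refusing to boot it.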

philipfong commented 3 years ago

I know this issue was closed but I was interested in this as well. @ubergeek77 do you happen to know if flashing a factory image, followed immediately by going back into the bootloader, and then flashing the modified boot.img works? To be clear I am not using the -w flag.

I ask because, upon flashing a stock factory image and booting up, Google Pay invalidates my payment methods, which requires me to wipe some app data and re-add cards.

It would make things more seamless if I managed to flash a Magisk-patched image without needing to boot using the stock boot.img.