topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0

Nothing Phone (1) OS 1.5.3 freezes and ramdumps during boot after boot.img is patched #6780

Closed LukeSkyD closed 1 year ago

LukeSkyD commented 1 year ago

With an unlocked bootloader and the stock boot image the phone works fine; with a patched boot image, freezes and/or ramdumps are common during use. Sometimes they happen right after boot, other times they take a while. Users have reported this with and without modules installed, and with or without Zygisk. The attached crashLog file covers a freeze, not a ramdump.

Device: Nothing Phone (1)
Android version: 13
Nothing OS version: 1.5.3 and 1.5.3 HOTFIX are affected, on both the EEA and Global variants of the phone; versions <= 1.5.2 work fine
Magisk version name: 25.2, but also d251c288
Magisk version code: 25205, 25210

crashLog.txt

Photo of a user reported ramdump

EDIT: Here are some boot images:

1.5.2: stock, magisk (both working)

1.5.3: stock, magisk

1.5.3.hotfix: stock, magisk (the magisk-patched boot freezes)

EDIT 2: In my naivety I thought it was a problem with the ramdisk, so I packed a custom boot image with the 1.5.2 Magisk-patched ramdisk and the 1.5.3.hotfix boot and kernel; it still crashed. I've noticed the phone crashes a lot more with apps like Discord open in the background. Discord also disconnects my client easily if I multitask and open other apps.

Using the phone slowly seems to mitigate the problem, as if the phone starts things before they have actually loaded correctly. Example: the phone crashes much sooner if, right after boot, I unlock it, open Discord, join a voice server with a Bluetooth headset, leave the app while still connected, and open something like a magazine app.

The crashes don't happen only at app startup but also during normal use: with YouTube playing music in the background, at some point the music stops as if buffering, but without the loading circle; the phone stays usable for about 2 seconds and then freezes.

I checked the RAM usage and it never, or only very rarely, goes over 3.5 GB, so a memory leak doesn't seem to be the cause.

LukeSkyD commented 1 year ago

how are you adding script? personally made magisk module (maybe that's the issue)

I created a .sh script on my PC, pushed it to the phone, copied it to /data/adb/post-fs-data.d with Termux, ran chmod +x scriptname, and rebooted.

Then I tried making a Magisk module, which didn't set the persist.sys.mglru_enable property to false, and now even with the older script resetprop does not set anything.

I also tried it with the verbose option, but it doesn't output anything. Magisk shows it executing, though.

UPDATE: I tried rebooting into the bootloader and then into the system again; now, with only the script in post-fs-data.d, I have the property set to false again.
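
For readers following along, the boot-script approach described above could look roughly like this sketch. The file name disable_mglru.sh is hypothetical, and this is not necessarily the exact script used here; resetprop is the Magisk tool named in this thread, and the guard only skips the call where that tool is not available.

```shell
#!/system/bin/sh
# Hypothetical boot script: /data/adb/post-fs-data.d/disable_mglru.sh
# It has to be executable (chmod +x), as described above.

PROP="persist.sys.mglru_enable"

disable_mglru_prop() {
    # resetprop is provided by Magisk; bail out if it cannot be found.
    command -v resetprop >/dev/null 2>&1 || return 1
    resetprop "$PROP" false
}

disable_mglru_prop || echo "resetprop unavailable, $PROP left unchanged" >&2
```

After a reboot, getprop persist.sys.mglru_enable can be used to check whether the value stuck.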

Niecks90 commented 1 year ago

You don't have freezes anymore with this workaround? Can you share your script, please?

LukeSkyD commented 1 year ago

Here's my Magisk module; it's working (at least for me). Follow the guide in the README.

https://github.com/LukeSkyD/NP1-MGLRU-FIX

gwolf2u commented 1 year ago

Your module acts exactly like mine. After rebooting, cat /sys/kernel/mm/lru_gen/enabled returns 0x0000; after entering the PIN / fingerprint, it returns 0x0001 for a split second and then 0x0003.

So for me it doesn't work. However, entering the command resetprop persist.sys.mglru_enable false seems to make the system run fine without issues, so as far as I can tell @aviraxp was right.

Now the rest is up to the Magisk "experts": finding a good way to fix this.
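
The check described above can be wrapped in a small helper. This is only a sketch: the function name describe_mglru and the LRU_GEN_NODE override are made up for illustration. In the upstream kernel documentation the value is a bitmask whose lowest bit (0x0001) is the main switch; that the Nothing 5.4 backport uses the same layout is an assumption.

```shell
#!/system/bin/sh
# Node path taken from this thread; overridable so the helper can be tried on a copy.
LRU_GEN_NODE="${LRU_GEN_NODE:-/sys/kernel/mm/lru_gen/enabled}"

describe_mglru() {
    node="${1:-$LRU_GEN_NODE}"
    [ -r "$node" ] || { echo "unavailable"; return 1; }
    val="$(cat "$node")"
    val="${val:-0}"
    # Assumption: bit 0 (0x0001) is the main switch, as in upstream docs.
    if [ $(( $val & 1 )) -eq 1 ]; then
        echo "enabled ($val)"
    else
        echo "disabled ($val)"
    fi
}

describe_mglru || true
```

With the values reported above, 0x0000 is reported as disabled, while 0x0001 and 0x0003 are both reported as enabled.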

LukeSkyD commented 1 year ago

We are getting off topic with this, but mine stays at zero; try checking for other modules or settings on your phone.

Anyway, if I encounter freezes or crashes I will update this thread.

If 24h pass without issues I'll also let you know.

Nixola commented 1 year ago

Hi, I just wanted to mention that disabling MGLRU seems to work on my phone, though the Magisk module doesn't (same issue as gwolf2u). It's only been a few hours without crashes, but since the longest I've gone with Magisk on the latest firmware was about 5 minutes before having to force a reboot, I'd at the very least call it an improvement.

LukeSkyD commented 1 year ago

Try this new version; if the value keeps changing, maybe Magisk is logging something about it.

https://github.com/LukeSkyD/NP1-MGLRU-FIX/releases/tag/V2

Remember to perform a cold boot.

Still no crashes for now; I've rebooted the phone once for the module's update. It seems MGLRU is the culprit.

aviraxp commented 1 year ago

Multiple devices and kernels have ported MGLRU, but unlike the NP1, most of them are 5.10 kernel devices. I'm not sure whether Nothing ported it correctly or whether it is a real compatibility issue.

I would suspect it is actually related to some weird cgroup issue, as magiskd switched cgroup to prevent itself from being killed.

LukeSkyD commented 1 year ago

uname -r returns 5.4.197-qgki-g2efe4411886f

/sys/kernel/debug does not exist, though. Is there another way to find any useful log about MGLRU or cgroup?
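
One generic starting point that does not need debugfs is the standard /proc/<pid>/cgroup file, which lists the cgroups a process belongs to. The sketch below only shows membership, not MGLRU logs. The helper name cgroups_of and the PROC_ROOT override are made up for illustration, and it assumes the daemon's process is named magiskd and that pidof is available.

```shell
#!/system/bin/sh
# Overridable so the helper can be exercised against a fake tree.
PROC_ROOT="${PROC_ROOT:-/proc}"

cgroups_of() {
    # Print the cgroup membership lines for the given PID.
    cat "$PROC_ROOT/$1/cgroup" 2>/dev/null
}

# Assumption: the daemon's process name is magiskd, as used in this thread.
if command -v pidof >/dev/null 2>&1; then
    for pid in $(pidof magiskd); do
        echo "magiskd pid $pid:"
        cgroups_of "$pid"
    done
fi
```

Comparing the output on stock and patched boot images might show whether the cgroup switch mentioned above actually takes place on this device.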

yujincheng08 commented 1 year ago

Based on the current investigation, I wonder if there's something Magisk should do about it. For NP users, maybe a custom module for this is more helpful. 🤔

LukeSkyD commented 1 year ago

So far only two people out of around 50 have reported freezes with MGLRU disabled. One was running an emulator app and got multiple freezes, but I don't expect a system with a key feature disabled to be 100% stable in all environments. The other opened the camera and the phone froze.

The module only mitigates the problem.

So Nothing's implementation of MGLRU is the problem, but I don't know whether it lies in the boot image or in Magisk not handling memory correctly. Should I test something else?

aviraxp commented 1 year ago

Multiple custom kernels with MGLRU do not have issues (mostly 5.10+ kernels), so I would suggest that Nothing did something wrong. Actually, Nothing gets its patches from a Google port of MGLRU (https://android-review.googlesource.com/c/kernel/common/+/2324610/5). MGLRU is new and many fixes are not ported in this chain.

Bert-Proesmans commented 1 year ago

I installed a Magisk module to disable the LRU feature yesterday. I'm 100% sure getprop persist.sys.mglru_enable said false, and /sys/kernel/mm/lru_gen/enabled said 0, after PIN entry. This morning my phone froze after 20 minutes of Android Auto usage, and I rebooted. I checked just now and getprop persist.sys.mglru_enable says true. No new freezes since this morning though, and I used Android Auto again for ~25 minutes.

LukeSkyD's module is installed at this moment and I'll triple check if mglru stays disabled tonight.

gwolf2u commented 1 year ago

There seems to be some trigger in the system that sets the prop to true again, resulting in a crash. I've made a different Magisk module that checks the prop status every 30 seconds (because sometimes the crash occurs after a minute, so let's be safe) and resets it back to false. Feel free to use it if interested: https://github.com/gwolf2u/MGLRU-Disabler/releases/tag/4.0
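
The idea of re-checking the property periodically can be sketched as below. This is not the code of the linked module, only an illustration of the approach; the function names are made up, and it assumes getprop and resetprop are available on the device (for example from a module's service.sh).

```shell
#!/system/bin/sh
# Hypothetical watchdog sketch; not taken from the linked module.
PROP="persist.sys.mglru_enable"
INTERVAL=30   # seconds, the interval mentioned above

enforce_once() {
    # Nothing to do when the property already reads "false".
    [ "$(getprop "$PROP")" = "false" ] && return 0
    resetprop "$PROP" false
}

watch_prop() {
    while true; do
        enforce_once
        sleep "$INTERVAL"
    done
}

# Start the loop in the background only where the Android/Magisk tools exist.
if command -v getprop >/dev/null 2>&1 && command -v resetprop >/dev/null 2>&1; then
    watch_prop &
fi
```

Polling leaves a window of up to INTERVAL seconds in which the property can be true again, so this is a mitigation rather than a fix.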

aviraxp commented 1 year ago

How about chown/chmod that 'enabled' node...
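
A rough sketch of that suggestion follows. The helper name lock_node is made up, and it is untested on the device: a privileged writer can usually bypass file mode bits, so this may not stop whatever re-enables MGLRU.

```shell
#!/system/bin/sh
# Node path taken from this thread; overridable so the helper can be tried on a copy.
LRU_GEN_NODE="${LRU_GEN_NODE:-/sys/kernel/mm/lru_gen/enabled}"

lock_node() {
    node="${1:-$LRU_GEN_NODE}"
    [ -w "$node" ] || return 1
    # Write 0 to switch MGLRU off, then drop the write bits from the node.
    echo 0 > "$node" && chmod 0444 "$node"
}

# On the device this would be run as root, e.g. from a boot script:
#   lock_node
```

The helper is deliberately not called automatically, because running it as root changes the live kernel setting.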

yujincheng08 commented 1 year ago

1.5.4 should have fixed this issue.

LukeSkyD commented 1 year ago

lru_gen/enabled is now set to 1, not 3 anymore.

I've been using 1.5.4 for around 18h with no crashes or freezes so far, and no crashes have been reported by the community either.