topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0

Nothing Phone (1) OS 1.5.3 freezes and ramdumps during boot after boot.img is patched #6780

Closed LukeSkyD closed 1 year ago

LukeSkyD commented 1 year ago

With an unlocked bootloader and the stock boot image the phone works fine; with a patched boot image, freezes and/or ramdumps during use are common. Sometimes they happen right after boot, other times they take a while. Users have reported this with and without modules installed, and with or without Zygisk. The attached crashLog file captures a freeze, not a ramdump.

Device: Nothing Phone (1)
Android version: 13
Nothing OS version: 1.5.3 and 1.5.3 HOTFIX are affected, on both the EEA and Global variants of the phone; versions <= 1.5.2 work fine
Magisk version name: 25.2, but also d251c288
Magisk version code: 25205, 25210

crashLog.txt

Photo of a user reported ramdump

EDIT: Here are some boot images: 1.5.2 stock 1.5.2 magisk (they are both working)

1.5.3 stock 1.5.3 magisk

1.5.3.hotfix stock 1.5.3.hotfix magisk (magisk patched boot freezes)

EDIT 2: In my naivety I thought it was a problem with the ramdisk, so I packed a custom boot img with the 1.5.2 Magisk-patched ramdisk and the 1.5.3.hotfix boot and kernel; it still crashed. I've noticed the phone crashes a lot more with apps like Discord open in the background. Discord also disconnects my client easily if I multitask and open other apps.

Using the phone slowly seems to mitigate the problem, as if the phone starts things before it has actually loaded them correctly. Example: the phone crashes a lot faster if, right after boot, I immediately unlock, open Discord and join a voice server with a Bluetooth headset, leave the app while still connected, and open something like a magazine app.

The crashes don't happen only at app startup but also during normal use: for example, with YouTube playing music in the background, at some point the music stops as if it were buffering but without the loading circle, the phone stays usable for about 2 seconds, and then it freezes.

I checked the ram usage and it never goes over 3.5GB or very rarely so it doesn't seem like a memory leak is the cause.

Whatsek commented 1 year ago

I have the same problem on my Nothing Phone 1 after the latest 1.5.3 update. Unlocked bootloader, Magisk with 3 modules. I have been using this config on this phone from the beginning without any problems, so it has something to do with the 1.5.3 update.

pannal commented 1 year ago

Same here, restored original boot imgs, now the phone is stable again. Sadly there's no way to downgrade to 1.5.2 AFAIK.

yujincheng08 commented 1 year ago

Maybe you can trigger a crash without Zygisk or any modules and share the bugreport? I cannot see anything useful in the log, though.

yujincheng08 commented 1 year ago

BTW, can you use adb when the phone freezes? If so, you'd better capture a bugreport during the freeze.

pannal commented 1 year ago

Just tried that. ADB still works when the whole system is frozen - I managed to start capturing a bugreport, but it got stuck at 0%. ADB shell was still possible afterwards, but then the dumpstatez service seems to have crashed:

C:\WINDOWS\system32>adb bugreport F:\__ANDROID\nothing1\bugreports
^C 0%] generating bugreport-SpacewarEEA-TKQ1.220915.002-2023-03-23-14-05-57.zip
C:\WINDOWS\system32>adb devices
List of devices attached
P2126A001028    device

C:\WINDOWS\system32>adb shell
Spacewar:/ $ ls
acct        config         dev              lost+found  oem          second_stage_resources  system_ext
apex        d              etc              metadata    postinstall  storage                 vendor
bin         data           init             mnt         proc         sys                     vendor_dlkm
bugreports  data_mirror    init.environ.rc  odm         product      system
cache       debug_ramdisk  linkerconfig     odm_dlkm    sdcard       system_dlkm
Spacewar:/ $ exit

C:\WINDOWS\system32>adb bugreport F:\__ANDROID\nothing1\bugreports
adb: device failed to take a zipped bugreport: Failed to connect to dumpstatez service: Connection refused

C:\WINDOWS\system32>adb shell
Spacewar:/ $ exit

C:\WINDOWS\system32>adb bugreport F:\__ANDROID\nothing1\bugreports
adb: device failed to take a zipped bugreport: Failed to connect to dumpstatez service: Connection refused

Once a magisk-patched boot img is used, any process that uses a lot of resources seems to crash almost immediately - SystemUI with it. I've triggered the freeze by using ReVanced to patch a YT apk - it froze after about 5 seconds, UI completely stuck and unusable (the above is from that time).

Edit: I'll try once more, but not much hope over here. Edit 2: No dice. This was with zygisk on and modules active. I'll try with everything off.

Edit 3: With zygisk and modules off, it seems to take a little longer to crash, but still no dice with the bugreport, as the service crashed at 8% (phone freeze):

C:\WINDOWS\system32>adb bugreport F:\__ANDROID\nothing1\bugreports
[  8%] generating bugreport-SpacewarEEA-TKQ1.220915.002-2023-03-23-14-21-51.zip

Edit 4: All those freezes seem to have corrupted my partition slot A (doesn't boot anymore); B works, but I'm not inclined to continue and risk a softbrick tbh.

LukeSkyD commented 1 year ago

Just triggered a crash without Zygisk and modules, but I'm at work; as soon as I'm home I'll try to upload an Android crash report.

To cause a crash, there are two phases, divided by a reboot:

Phase 1:
1. Connect a Bluetooth headset to the phone.
2. Open Discord and connect to a voice channel.

REBOOT THE PHONE

Phase 2:
1. As soon as it boots, unlock it and open Discord so that it automatically reconnects.
2. Now start opening apps; the phone should crash almost immediately.

I then rebooted and went to Developer options to get a complete bug report.

There could be sensitive data, is there a way i can send it to you privately?

CrashLog.txt was produced by logcat through adb.

EDIT: Producing a log while the phone crashes is impossible because it becomes unresponsive to everything.

dxdvostfr commented 1 year ago

Same problem here.

LukeSkyD commented 1 year ago

On my side I have the bug report from the 1st freeze, the one I talked about before, with no Zygisk but with modules (busybox, systemless host and revanced) on: bugreport-Spacewar-TKQ1.220915.002-2023-03-23-14-30-22.zip

2nd freeze, 00:14 CEST 24/03, no Zygisk, no modules, crash report from adb logcat and bug report from Android: bugreport-Spacewar-TKQ1.220915.002-2023-03-24-00-13-55.zip crashLog2.txt

Also, a user on Telegram shared their logcat, which they said contains irregularities: 2023-03-23-21-39-24.txt

"This is a screenshot of the exact error which I found suspicious"

What I've noticed is that without Zygisk the system goes from using around 3.5GB of RAM to 4.1GB on average. The phone crashes less frequently, and more apps need to be started at boot (immediately after first unlocking the phone) to crash it.

EDIT: does Magisk change the permissions of something, or of a process, in the system?

canyie commented 1 year ago

Since adb is accessible when problem happens, can you get me some logs by adb shell su -c dmesg > kernel.log and adb shell su -c getprop > prop.log when it freezes?
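A minimal sketch of how the two commands above could be wrapped so that each capture lands in a timestamped file and empty results are flagged. The function name capture_freeze_logs, the ADB override variable and the file naming are assumptions made for illustration, not part of the request; it also assumes su is available on the device.

```shell
# Capture kernel log and system properties from a rooted device over adb.
# ADB can be overridden from the environment (assumption, for dry runs).
ADB="${ADB:-adb}"

capture_freeze_logs() {
    out_dir="${1:-.}"
    stamp="$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$out_dir" || return 1

    # Same commands as requested above; redirection happens on the host.
    "$ADB" shell su -c dmesg   > "$out_dir/kernel-$stamp.log"
    "$ADB" shell su -c getprop > "$out_dir/prop-$stamp.log"

    # An empty file means the device did not answer; say so explicitly.
    for f in "$out_dir/kernel-$stamp.log" "$out_dir/prop-$stamp.log"; do
        [ -s "$f" ] || echo "warning: $f is empty" >&2
    done
}
```

Calling capture_freeze_logs ./logs as soon as the UI stops responding keeps successive attempts from overwriting each other.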

HE7086 commented 1 year ago

I have the exact same problem on 1.5.3 with hotfix, and logcat looks exactly the same, so here are my logs attached (without zygisk):

kernel.log prop.log

The way I trigger this is just randomly open many apps, as the discord method mentioned previously didn't work well for me.

Also I think this is related to memory, since memory usage during freeze is unusually high (normally it's around 4GiB used)

❯ adb shell    
Spacewar:/ $ free -h
                total        used        free      shared     buffers
Mem:              11G         11G        249M         74M        1.0M
-/+ buffers/cache:            11G        250M
Swap:            4.0G        1.1G        2.8G
Spacewar:/ $
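To compare memory figures between reports, the same information can be read from /proc/meminfo rather than from free -h. A small sketch, assuming the standard MemTotal and MemAvailable fields are present; the helper name mem_available_pct is hypothetical.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo-style text from standard input.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}
```

For example, adb shell cat /proc/meminfo | mem_available_pct can be run repeatedly while reproducing the freeze.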

It is worth noting that when freezing the device sometimes showed up as offline in adb, and the only way to interact with it is to long press the power button to reboot.

Hope these help.

LukeSkyD commented 1 year ago

Since adb is accessible when problem happens, can you get me some logs by adb shell su -c dmesg > kernel.log and adb shell su -c getprop > prop.log when it freezes?

Even if the device is in the adb devices list, the logs created are empty. The phone doesn't answer back, it's completely frozen.

EDIT: The phone lags with Magisk, and lags more right before crashing. Running logcat during this last phase does not produce anything because as soon as a log command is sent over adb, the phone freezes.

The phone just rebooted on its own after freezing while locked. It's rare, but maybe that can log more? I'll try to logcat.

pannal commented 1 year ago

I have the exact same problem on 1.5.3 with hotfix, and logcat looks exactly the same, so here are my logs attached (without zygisk):

kernel.log prop.log

The way I trigger this is just randomly open many apps, as the discord method mentioned previously didn't work well for me.

Also I think this is related to memory, since memory usage during freeze is unusually high (normally it's around 4GiB used)

❯ adb shell    
Spacewar:/ $ free -h
                total        used        free      shared     buffers
Mem:              11G         11G        249M         74M        1.0M
-/+ buffers/cache:            11G        250M
Swap:            4.0G        1.1G        2.8G
Spacewar:/ $

It is worth noting that when freezing the device sometimes showed up as offline in adb, and the only way to interact with it is to long press the power button to reboot.

Hope these help.

In my case the freeze is always the whole system. Don't know why, but my apps don't crash - the whole phone freezes when it does.

HE7086 commented 1 year ago

In my case the freeze is always the whole system. Don't know why, but my apps don't crash - the whole phone freezes when it does.

Yeah that's also my case, I suspect that the systemui got stuck because of OOM. Tried to restart the systemui from adb but it also stuck.

LukeSkyD commented 1 year ago

Since adb is accessible when problem happens, can you get me some logs by adb shell su -c dmesg > kernel.log and adb shell su -c getprop > prop.log when it freezes?

Nothing. When the phone freezes, it outputs nothing. I only have a logcat produced after the phone rebooted itself, because before that it is not seen.

out2.log

I've tried for three days to get any log, but now I need a working phone and to use banking apps. There are crashes and freezes, but no response when adb commands are sent.

It's much easier to get a logcat at freeze with zygisk and the log file is filled with super critical pressure event polling check, the "suspicious" error in the screenshot a user provided in one of my previous messages.
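A small sketch for quantifying how much of a saved logcat consists of that message. The message text is copied from the comment above; the helper name count_pressure_events and the idea of counting per file are assumptions for illustration.

```shell
# Count lines in a saved logcat file that contain the pressure message.
count_pressure_events() {
    # -c prints the number of matching lines, -F matches the text literally.
    grep -cF "super critical pressure event polling check" "$1"
}
```

Running it against logs taken with and without Zygisk would give comparable numbers instead of an impression.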

SelfRef commented 1 year ago

Two interesting situations I had were when I used Spotify and device froze but music still played via bluetooth speaker and I could change songs via remote device so both bluetooth stack and app still worked.

The second situation was while using camera app - the whole system froze but live preview in camera app was working - this was the only time app was showing some kind of UI update after freeze.

Niecks90 commented 1 year ago

Same issue here, version 1.5.3 fresh installed with patched boot.

Multiple freezes (in the Camera app, etc.); a hard reboot is the only way to get the phone working again.

EDIT: as noted before, I have the same issue with the latest Magisk Canary version.

Niecks90 commented 1 year ago

There is this in the changelog of 1.5.3; maybe it's the cause of the issue: "New memory management algorithm that reduces app restart times by over 35% and lowers CPU consumption to improve overall battery life. Improved system stability."

amit548 commented 1 year ago

I am facing the same problem, and I downgraded the OS from 1.5.3 to 1.5.2.

aleqz commented 1 year ago

I am facing the same problem, and I downgraded the OS from 1.5.3 to 1.5.2.

Did you lose all your data? What was the tutorial you followed? Thanks

pannal commented 1 year ago

There is this in changelog of 1.5.3 maybe it's the cause of the issue New memory management algorithm that reduces app restart times by over 35% and lowers CPU consumption to improve overall battery life. Improved system stability.

That would make a lot of sense IMHO. Most logs posted here had some kind of OOM issues.

Is anyone actively looking into this ticket? All up-to-date Nothing Phone 1's are affected.

nomtix commented 1 year ago

I am facing the same problem, and I downgraded the OS from 1.5.3 to 1.5.2.

Did you lose all your data? What was the tutorial you followed? Thanks

You have to flash the 1.1.7 stock firmware via fastboot commands and then simply update to 1.5.2 with OTA updates. And yes, you lose all your personal data, so make a backup first.

TheNotOnly commented 1 year ago

There is this in changelog of 1.5.3 maybe it's the cause of the issue New memory management algorithm that reduces app restart times by over 35% and lowers CPU consumption to improve overall battery life. Improved system stability.

That would make a lot of sense IMHO. Most logs posted here had some kind of OOM issues.

Is anyone actively looking into this ticket? All up-to-date Nothing Phone 1's are affected.

Might be a memory leak. I remember seeing the ram full and logcat showing lots of processes restarting over and over until the ramdump occurs.

amit548 commented 1 year ago

I am facing the same problem, and I downgraded the OS from 1.5.3 to 1.5.2.

Did you lose all your data? What was the tutorial you followed? Thanks

I just used fastboot to flash the full 1.1.7 version.

vvb2060 commented 1 year ago

If anyone finds the reason, please open a new issue.

pannal commented 1 year ago

If anyone finds the reason, please open a new issue.

Wait, how does closing this issue help here? What info do you need? We've tried to comply with the log requests as well as possible. If it's not enough, please point us in the right direction.

The only thing that'll happen is that people who have the same issue (everyone on Nothing Phone 1 with current firmware who tries to root) will come and create a new duplicate ticket.

dxdvostfr commented 1 year ago

If anyone finds the reason, please open a new issue.

Wait, how does closing this issue help here? What info do you need? We've tried to comply with the log requests as well as possible. If it's not enough, please point us in the right direction.

The only thing that'll happen is that people who have the same issue (everyone on Nothing Phone 1 who tries to root) will come and create a new duplicate ticket.

Yeah right

vvb2060 commented 1 year ago

Sorry, we are not in a position to resolve this issue. Not even sure how it relates to Magisk.

pannal commented 1 year ago

And the logs don't show anything useful?

Not even sure how it relates to Magisk.

Well, the phone crashes when Magisk is used with firmware 1.5.3. It doesn't, when Magisk isn't installed. That's the only relation there is, really.

dxdvostfr commented 1 year ago

Sorry, we are not in a position to resolve this issue. Not even sure how it relates to Magisk.

Well, Magisk is supposed to patch the boot image of any phone to allow root. But on the Nothing Phone it renders it unusable. Even if the issue comes from Nothing OS itself, I think you guys should investigate this "compatibility" issue, because Magisk is supposed to be somewhat universal.

LukeSkyD commented 1 year ago

Sorry, we are not in a position to resolve this issue. Not even sure how it relates to Magisk.

Magisk is patching the boot image in a way that triggers problems in the OS. It is not known whether the problem is caused by the Nothing OS devs, by Magisk wrongly patching a new or different implementation of the boot image, or by the OS checking for modifications and crashing.

It is not known whether NP's new RAM implementation will remain an isolated case or will have repercussions in the future.

It's not a problem that appears during normal use with a stock image.

It affects people with JUST the Magisk-patched boot img, NO MODULES, NO ZYGISK, NO ANYTHING. I'm available for further testing, everyone is, but no help has come from the devs.

There is a compatibility issue between Magisk and the phone; the cause has not been found, so you cannot exclude Magisk from the possible causes.

canyie commented 1 year ago

Can someone try this build? app-debug.zip

I've removed init.rc patching from source code, so you will see Magisk NOT INSTALLED. Just test if this bug happens. Let's see which part of magisk triggers this weird bug!

Whatsek commented 1 year ago

Can someone try this build? app-debug.zip

I've removed init.rc patching from source code, so you will see Magisk NOT INSTALLED. Just test if this bug happens. Let's see which part of magisk triggers this weird bug!

I can try this in half an hour. So I boot with a patched 1.5.3, install this version, and then do a direct install and reboot. Correct?

gwolf2u commented 1 year ago

Can someone try this build? app-debug.zip I've removed init.rc patching from source code, so you will see Magisk NOT INSTALLED. Just test if this bug happens. Let's see which part of magisk triggers this weird bug!

I can try this in half an hour. So I boot with a patched 1.5.3, install this version, and then do a direct install and reboot. Correct?

Install the APK from inside the zip, patch the stock boot.img and boot with it, then patch boot within Magisk (as per the usual procedure).

LukeSkyD commented 1 year ago

Booting the newly patched boot img does not give me the option to direct install in Magisk, so I installed the newer APK but booted a previously patched boot img, then opened Magisk and did a direct install.

Now Magisk shows as not installed, but it should have the modified boot img.

PS: thank you for reopening the issue

ShiiroSan commented 1 year ago

Can someone try this build? app-debug.zip

I've removed init.rc patching from source code, so you will see Magisk NOT INSTALLED. Just test if this bug happens. Let's see which part of magisk triggers this weird bug!

Not able to produce any logs requiring su. I guess it's because init.rc patching is removed (?). Even so, I tried many RAM-demanding things (playing games, patching things via ReVanced Manager, opening numerous apps) and there has been no freeze so far.

Whatsek commented 1 year ago

Can someone try this build? app-debug.zip I've removed init.rc patching from source code, so you will see Magisk NOT INSTALLED. Just test if this bug happens. Let's see which part of magisk triggers this weird bug!

I can try this in half an hour. So I boot with a patched 1.5.3, install this version, and then do a direct install and reboot. Correct?

Install the APK from inside the zip, patch the stock boot.img and boot with it, then patch boot within Magisk (as per the usual procedure).

The first two steps succeeded; the third step can't be done because Magisk isn't installed, so only the "choose a file" method is available. I will keep using the phone with this boot.img for now.

canyie commented 1 year ago

No one reports crashing on my build, so it seems neither magiskboot nor magiskinit causes the issue

app-debug.zip And how about this? Install the apk inside zip, open app, patch boot image, flash newly patched image via fastboot, then reboot. You will still see Magisk NOT INSTALLED on this build

canyie commented 1 year ago

After removing mount_mirror and load_modules, the device works fine for several hours. It seems this issue is triggered by magic mounting. Maybe something like #3171

LukeSkyD commented 1 year ago

After removing mount_mirror and load_modules, the device works fine for several hours. It seems this issue is triggered by magic mounting. Maybe something like #3171

No freezes with the first build posted, should i try the second one even if the comment is hidden?

canyie commented 1 year ago

After removing mount_mirror and load_modules, the device works fine for several hours. It seems this issue is triggered by magic mounting. Maybe something like #3171

No freezes with the first build posted, should i try the second one even if the comment is hidden?

No need, I have privately asked a Nothing Phone user to test and he has confirmed that the build also randomly freezes. I guess one of the following commits causes it:
https://github.com/NothingOSS/android_kernel_msm-5.4_nothing_sm7325/commit/9278b64d56c7fe1c33768c3ecdface56201f785e
https://github.com/NothingOSS/android_kernel_msm-5.4_nothing_sm7325/commit/ed56a0636bd0e3a3def144daab31f020f7a44558

aviraxp commented 1 year ago

After investigation (actually an assumption), it could be related to Nothing's backport of the multi-gen LRU feature in the kernel. Can you please run the following command in a rooted terminal:

resetprop persist.sys.mglru_enable false

Then reboot and check if the issue persists.

Multi-gen LRU will roll out to mainstream devices on Android 14.

LukeSkyD commented 1 year ago

After investigation (actually assumption), it could be related to Nothing's backport of multi-gen LRU feature in kernel. Can you please run following command in rooted terminal: resetprop persist.sys.mglru_enable false Reboot and check if issue persists.

Multi-gen LRU will roll out to mainstream devices on Android 14.

My phone just crashed... But seems to be more stable even with zygisk enabled

aviraxp commented 1 year ago

After investigation (actually assumption), it could be related to Nothing's backport of multi-gen LRU feature in kernel. Can you please run following command in rooted terminal: resetprop persist.sys.mglru_enable false Reboot and check if issue persists. Multi-gen LRU will roll out to mainstream devices on Android 14.

My phone just crashed... But seems to be more stable even with zygisk enabled

what's the output of cat /sys/kernel/mm/lru_gen/enabled?

LukeSkyD commented 1 year ago

cat /sys/kernel/mm/lru_gen/enabled

0x0003

aviraxp commented 1 year ago

cat /sys/kernel/mm/lru_gen/enabled

0x0003

You didn't disable MGLRU then. You could make a script that runs 'resetprop persist.sys.mglru_enable false' in Magisk.
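For reference, the value reported by /sys/kernel/mm/lru_gen/enabled is a bitmask. The sketch below decodes it using the bit meanings from the upstream kernel's multi-gen LRU documentation; whether Nothing's 5.4 backport uses exactly the same bits is an assumption, and the function name decode_lru_gen is hypothetical.

```shell
# Decode the MGLRU "enabled" bitmask, e.g. the 0x0003 reported above.
# Bit names follow upstream kernel docs (assumed to match the backport).
decode_lru_gen() {
    value=$(( $1 ))
    bit=1
    for name in "main switch" "leaf PTE batch clearing" "non-leaf PTE clearing"; do
        if [ $(( value & bit )) -ne 0 ]; then state=on; else state=off; fi
        echo "$name: $state"
        bit=$(( bit * 2 ))
    done
}
```

With 0x0003 the lowest bit is set, so the main switch reads as on, which is consistent with the reply above that MGLRU was not disabled.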

gwolf2u commented 1 year ago

After a reboot it resets to 0x0003, so a Magisk module is needed for sure. Testing it myself now as well.

LukeSkyD commented 1 year ago

cat /sys/kernel/mm/lru_gen/enabled

0x0003

You didn't disable MGLRU then. You could make a script that runs 'resetprop persist.sys.mglru_enable false' in Magisk.

it does not persist after reboot, having a script in post-fs-data.d correctly sets it to 0.
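A minimal sketch of such a boot script, assuming it is saved as an executable file under /data/adb/post-fs-data.d/ (the file name disable_mglru.sh is hypothetical). The property name is the one suggested earlier in this thread; the RESETPROP override exists only so the sketch can be dry-run off-device.

```shell
# Hypothetical /data/adb/post-fs-data.d/disable_mglru.sh (chmod +x it).
# RESETPROP is overridable for dry runs; on a Magisk device it is resetprop.
RESETPROP="${RESETPROP:-resetprop}"

disable_mglru() {
    # Property name taken from this thread; it is vendor specific.
    "$RESETPROP" persist.sys.mglru_enable false
}

# Only act when the tool is actually present (i.e. on a rooted device).
if command -v "$RESETPROP" >/dev/null 2>&1; then
    disable_mglru
fi
```

After the next boot, cat /sys/kernel/mm/lru_gen/enabled can be checked both before and after unlocking to see whether the setting held.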

I'll try and let you know.

gwolf2u commented 1 year ago

In my testing, with the prop reset via post-fs, the value is false (0x0000) up until you enter the PIN/unlock; right after that it's set to 0x0003.

LukeSkyD commented 1 year ago

In my testing, with the prop reset via post-fs, the value is false (0x0000) up until you enter the PIN/unlock; right after that it's set to 0x0003.

mine is 0x000 even after unlock, getprop shows it false.

gwolf2u commented 1 year ago

how are you adding script? personally made magisk module (maybe that's the issue)