t2linux / T2-Debian-and-Ubuntu-Kernel

Ubuntu Kernel for T2 Macs.

6.5.7-t2 does not wake from suspend #53

Open aboulfad opened 8 months ago

aboulfad commented 8 months ago

Issue: MacBook Pro (MBP) always freezes/crashes when suspending in Linux-t2-ubuntu. This is evident on restart: the boot screen shows fsck (or equivalent) running, followed by the MBP bong and the Linux login. I included a dmesg log as well as a screenshot of the boot process.

MBP info: MacBook Pro 2018, i7; Model ID: MacBookPro15,2; Controller firmware version: 21P365; OS: Sonoma (14.0)

T2Linux info: kernel 6.5.7-t2; OS: Ubuntu 23.10 (Mantic Minotaur)

dmesg.log

bootlog
AdityaGarg8 commented 8 months ago

Suspend has been broken since macOS Sonoma

aboulfad commented 8 months ago

Thanks, should I keep this issue open? For others who may have trouble with this, in the meantime the lid-close behavior can be changed to ignore or lock (reference).
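A minimal sketch of that workaround, assuming systemd-logind handles the lid switch (the default on Ubuntu). HandleLidSwitch= and HandleLidSwitchExternalPower= are real logind.conf options and ignore and lock are among their accepted values; the function name and the drop-in file name 90-t2-no-suspend.conf are only illustrative choices.

```shell
#!/usr/bin/env bash
# Write a systemd-logind drop-in so that closing the lid no longer suspends.
# The file name is a hypothetical choice; any *.conf in the directory works.
write_lid_dropin() {
  local dir="${1:-/etc/systemd/logind.conf.d}"
  local action="${2:-ignore}"
  case "$action" in
    ignore|lock) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  mkdir -p "$dir"
  # HandleLidSwitch is the general setting; HandleLidSwitchExternalPower
  # covers the case where the Mac is running on AC power.
  printf '[Login]\nHandleLidSwitch=%s\nHandleLidSwitchExternalPower=%s\n' \
    "$action" "$action" > "$dir/90-t2-no-suspend.conf"
}
```

Source it in a root shell and call write_lid_dropin (or write_lid_dropin /etc/systemd/logind.conf.d lock), then reboot so systemd-logind rereads its configuration. Deleting the file undoes the change.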

AdityaGarg8 commented 8 months ago

Can be kept open

rydymth commented 7 months ago

Hi there! I am having a similar problem. Before Apple decided to screw up the Wi-Fi and Bluetooth firmware packages with macOS Sonoma, I believe I took a backup of the previous firmware packages; you can download them from here. I have yet to figure out how to use them. With the latest kernel (6.6.something), my Bluetooth stops working, so I go back to the kernel that worked very well for me before Sonoma, 6.3.6. I will keep trying to get suspend working. I think I'll replace my current OS with a fresh install using the older kernel, and I'll update if this works for me.

AdityaGarg8 commented 7 months ago

I guess there is a regression in kernel 6.5 that broke Bluetooth for some models. Can you see if you get Bluetooth working on a 6.4.x and a 6.5 kernel, @rydymth?

rydymth commented 7 months ago

Hi! Bluetooth works on all 6.4 kernels; I tested 6.4.7 and 6.4.8 and it works just fine. However, it doesn't work on 6.5: I tested 6.5.2 and 6.5.8. It does not work on 6.6.1 either...

AdityaGarg8 commented 7 months ago

Can you share your complete journalctl -k output from a 6.4 and a 6.5 kernel? You can use pastebin to share it.

rydymth commented 7 months ago

Hi! Yes, for sure! Here is the journalctl for Linux 6.4.8, where everything works except suspend, and here is the journalctl for Linux 6.5.2, where Bluetooth and suspend are not working. There is an array-index-out-of-bounds error in both of them. I'm not able to understand much, but I will attempt to debug more. The stack trace contains a reference to try_to_wake_up+0x292/0x6c0. Could this be the issue? I am unable to find why Bluetooth isn't working on 6.5.2...

AdityaGarg8 commented 7 months ago

I've filed a bug report upstream:

https://lkml.org/lkml/2023/11/13/722

AdityaGarg8 commented 7 months ago

Hi @rydymth

Kernel 6.6.1-2 has a patch sent by upstream maintainers to fix Bluetooth on your model. Can you give it a test once it compiles? It should finish compiling about 2.5-3 hours after this message.

rydymth commented 7 months ago

Hi @AdityaGarg8! I just tried the new commit. Still no luck... It shows this error every time I try to load/reload the hci_bcm4377 module: [ 386.052230] hci_bcm4377 0000:73:00.1: can't disable ASPM; OS doesn't have ASPM control (dmesg log link, journalctl log link)

aboulfad commented 7 months ago

Hi, I am a bit confused as to how @rydymth’s BT issue is related to the suspend issue I reported? Thanks.

AdityaGarg8 commented 7 months ago


It's not related to suspend, but it is an issue that deserves to be fixed.

aboulfad commented 7 months ago

OK, I guess @rydymth could have created another GitHub issue, hence my confusion.

AdityaGarg8 commented 7 months ago


Looks like the UBSAN error got fixed. I got another patch; try 6.6.1-3.

rydymth commented 7 months ago

Yes, the UBSAN error is fixed. For the ASPM error I put pcie_aspm=off pcie_port_pm=off in GRUB's GRUB_CMDLINE_LINUX. But Bluetooth still isn't working and suspend is still the same. Here are the dmesg and the journalctl.

I agree with you @aboulfad that I should indeed create another issue, but hear me out. I believe suspend has some relation to Bluetooth not working and to the buggy brcmfmac_wcc module, and I believe this is because of the updated macOS firmware. When I tried to suspend in the live environment without installing firmware.tar.gz, it suspended just fine. I forgot to take the dmesg from that; I will take it tonight. Then I booted into 6.4.8 and into the latest 6.6.1-3, and I used this rmmod.sh script to remove the Wi-Fi and Bluetooth modules so that they don't get in the way during suspend. Here is the code:

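#!/usr/bin/env bash
# rmmod.sh: unload Wi-Fi/Bluetooth before suspend, reload them after resume
if [ "${1}" = "pre" ]; then
    systemctl stop NetworkManager
    modprobe -r brcmfmac_wcc brcmfmac hci_bcm4377
    dmesg >> /home/rudy/logs/suspendPre.txt
elif [ "${1}" = "post" ]; then
    # -a is needed to load several modules in one call; without it the
    # extra names are passed to brcmfmac_wcc as module parameters
    modprobe -a brcmfmac_wcc brcmfmac hci_bcm4377
    dmesg >> /home/rudy/logs/suspendPost.txt
    systemctl start NetworkManager
fi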

It turns off the display, and I believe it suspends tasks and processes just fine. I have the USB-C connector attached to this laptop, and while the screen is off its light stays on; that's how I know my MacBook didn't shut itself down. That's why I appended the dmesg to a file right before and right after suspend. The post-suspend log was never updated. However, this is the pre-suspend log. It says the system suspends, then our brcmfmac driver doesn't sleep, and then it resumes, but nothing ever shows up on screen. My laptop always comes back up if I don't run modprobe -r brcmfmac_wcc brcmfmac hci_bcm4377, either in the terminal or via rmmod.sh, but here it doesn't. I don't understand what's going on here. Bluetooth isn't being discovered at all: bluetoothctl show gives me "No default controller available". So maybe Bluetooth isn't related to suspend? Should I open a new issue for Bluetooth, @AdityaGarg8?
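The pre/post convention in rmmod.sh matches the one systemd-sleep uses for executables in /usr/lib/systemd/system-sleep/ (called with $1 = pre or post and $2 = the sleep type), so that is presumably where it is installed. One possible variant, sketched under that assumption: instead of appending dmesg to files in a home directory (the post step never runs if resume hangs), it writes a marker into the kernel log through /dev/kmsg, so the last step reached can be looked up after rebooting with journalctl -k -b -1, provided the journal was flushed to disk before the hang. The marker text and the RUN/KMSG variables are hypothetical, the latter only there to allow a dry run.

```shell
#!/usr/bin/env bash
# Sketch of a systemd-sleep hook (assumed to be installed as an executable
# file in /usr/lib/systemd/system-sleep/). systemd-sleep calls it with
# $1 = "pre" or "post" and $2 = the sleep type, e.g. "suspend".
MODULES=(brcmfmac_wcc brcmfmac hci_bcm4377)
RUN="${RUN:-}"            # set RUN=echo to print commands instead of running them
KMSG="${KMSG:-/dev/kmsg}" # lines written here appear in journalctl -k

sleep_hook() {
  # Tag the kernel log so the last marker before a freeze is easy to find
  # afterwards with: journalctl -k -b -1 | grep t2-sleep-hook
  echo "t2-sleep-hook: $1 $2" > "$KMSG"
  case "$1" in
    pre)
      $RUN systemctl stop NetworkManager
      $RUN modprobe -r "${MODULES[@]}"
      ;;
    post)
      # -a is required to load several modules in one modprobe call
      $RUN modprobe -a "${MODULES[@]}"
      $RUN systemctl start NetworkManager
      ;;
  esac
}

# Act only when executed with arguments (as systemd-sleep does), not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  sleep_hook "$@"
fi
```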

AdityaGarg8 commented 7 months ago

Suspend is related to apple-bce so please don't mix it with Bluetooth.

But since your Bluetooth issue was raised here, let it stay here.

AdityaGarg8 commented 7 months ago

I would prefer a log without the ASPM parameters you added and before suspending.

AdityaGarg8 commented 7 months ago

Suspend may break Bluetooth, but right now the priority is to get Bluetooth working before suspending.

aboulfad commented 7 months ago

No worries @rydymth, I was trying to find some correlation or understanding, thanks.

rydymth commented 7 months ago


Got it! I removed the ASPM parameters. Here are the dmesg, journalctl and preSuspend logs.
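Changes to GRUB_CMDLINE_LINUX in /etc/default/grub only reach the kernel after sudo update-grub and a reboot, so it can be worth confirming what the running kernel actually received by reading /proc/cmdline. A small sketch; the function name is hypothetical and parameters containing quoted spaces are not handled.

```shell
#!/usr/bin/env bash
# Succeed if a parameter appears on the running kernel's command line.
# "name=value" must match exactly; a bare "name" also matches "name=anything".
has_kernel_param() {
  local param="$1" cmdline_file="${2:-/proc/cmdline}" word
  local -a words
  read -r -a words < "$cmdline_file"
  for word in "${words[@]}"; do
    if [[ "$word" == "$param" || "$word" == "$param="* ]]; then
      return 0
    fi
  done
  return 1
}
```

For example, has_kernel_param pcie_aspm && echo "pcie_aspm is still set" shows whether the parameter is really gone after the reboot.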

AdityaGarg8 commented 7 months ago

So no errors, but Bluetooth still broken?

rydymth commented 7 months ago

Yes... no errors, but no luck with Bluetooth...

AdityaGarg8 commented 7 months ago

Is journalctl you sent post suspend?

AdityaGarg8 commented 7 months ago

And what is preSuspend?

rydymth commented 7 months ago

No, this is the journalctl from a normal boot.

As for preSuspend, it comes from the script I posted above: before going to suspend, it writes the dmesg into preSuspend.txt, so every time I run systemctl suspend the dmesg is written to that file. The script also has a postSuspend part, but that doesn't work...

AdityaGarg8 commented 7 months ago

Why doesn't your journalctl show the kernel parameters?

I want journalctl -k after you boot and attempt to start Bluetooth. For example, run sudo modprobe -r brcmfmac_wcc; sudo modprobe -r brcmfmac; sudo modprobe -r hci_bcm4377; sudo modprobe hci_bcm4377 and then share journalctl -k.
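For anyone repeating this, the same sequence can be wrapped in a function that also saves the complete kernel log of the current boot to a file ready for pastebin. This is a sketch: the function name, the output file name and the 5-second wait are illustrative assumptions, and RUN=echo turns it into a dry run.

```shell
#!/usr/bin/env bash
# Reload the Bluetooth driver the way requested above, then save the full
# kernel log of the current boot.
RUN="${RUN:-}"   # set RUN=echo for a dry run that only prints the commands
reload_bt_and_log() {
  local out="${1:-journalctl-k-$(uname -r).txt}"
  # same order as above: Wi-Fi vendor module, brcmfmac, then Bluetooth
  $RUN sudo modprobe -r brcmfmac_wcc
  $RUN sudo modprobe -r brcmfmac
  $RUN sudo modprobe -r hci_bcm4377
  $RUN sudo modprobe hci_bcm4377
  # give the driver a moment to probe before capturing the log
  $RUN sleep 5
  # complete (unfiltered) kernel log of this boot, without the pager
  $RUN sudo journalctl -k -b 0 --no-pager > "$out"
}
```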

AdityaGarg8 commented 7 months ago

And please, no suspend

AdityaGarg8 commented 7 months ago

I need to talk to upstream, so would prefer keeping suspend out of this. Suspend is an issue which we need to handle, not upstream.

rydymth commented 7 months ago

Got it. Here are the journalctl -k and the dmesg after removing and reloading the brcmfmac_wcc, brcmfmac and hci_bcm4377 modules.

aboulfad commented 7 months ago

Any love for the suspend issue (bringing focus back to the OP)? I tried some steps described by @Redecorating, but they didn't get me further (https://discord.com/channels/595304521857630254/595304521857630259/1175236701661184030). Any further ideas or suggestions for collecting info/debug logs?

AdityaGarg8 commented 7 months ago

Not yet

AdityaGarg8 commented 7 months ago

@rydymth , can you try 6.6.1-4?

rydymth commented 7 months ago

Hi! Bluetooth still doesn't work; here are the journalctl and dmesg. A new error does come up, though: Nov 18 22:51:36 RudyUbuMbp kernel: Bluetooth: hci0: HCI LE Coded PHY feature bit is set, but its usage is not supported.

AdityaGarg8 commented 7 months ago

Thanks, I'll try to get this solved.

Redecorating commented 7 months ago

@aboulfad sorry for taking a while to get back to you. In order to try and track down where the suspend issue is, would you be able to try the following when you get a chance:

git clone https://github.com/Redecorating/apple-bce-drv apple-bce-drv-redecorating
cd apple-bce-drv-redecorating
make

There should now be aaudio.ko, apple-bce.ko and apple-bce-vhci.ko in this folder, which we will need later. (note that if you change/upgrade your kernel, you'll need to do make again)
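Because a module built against one kernel is rejected by another (insmod typically reports "Invalid module format"), it can help to check the build before rebooting. A sketch, assuming modinfo from kmod is available; the first word of a module's vermagic string is the kernel release it was built for. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Check that the three modules exist and were built for the running kernel.
check_built_modules() {
  local dir="${1:-.}" running="${2:-$(uname -r)}" ko vermagic rc=0
  for ko in aaudio.ko apple-bce.ko apple-bce-vhci.ko; do
    if [[ ! -f "$dir/$ko" ]]; then
      echo "missing: $ko (run make)"; rc=1; continue
    fi
    # vermagic looks like "6.6.1-t2-mantic SMP preempt mod_unload ..."
    vermagic=$(modinfo -F vermagic "$dir/$ko")
    if [[ "${vermagic%% *}" != "$running" ]]; then
      echo "stale: $ko built for ${vermagic%% *} (run make clean && make)"
      rc=1
    fi
  done
  return $rc
}
```

Run check_built_modules inside apple-bce-drv-redecorating; it prints nothing and succeeds when all three modules match the running kernel.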

Next add the following to your /etc/modprobe.d/blacklist.conf:

blacklist apple-bce
# some keyboard backlights
blacklist hid_apple_magic_backlight
# Touchbar
blacklist hid_appletb_bl
blacklist hid_appletb_kbd
blacklist appletbdrm
# T2 Ethernet
blacklist cdc_ncm
blacklist cdc_mbim
# Camera
blacklist uvcvideo
# Trackpad
blacklist bcm5974
# Keyboard
blacklist hid-apple
# Ambient Light Sensor
blacklist hid_sensor_als

(and if you want to undo these changes, just comment out or remove all of the lines you added)
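An alternative that makes the undo step a single rm is to put these lines in a file of their own under /etc/modprobe.d/ rather than in blacklist.conf. A sketch with a hypothetical file and function name. Note that blacklist only stops automatic (alias-based) loading, which is why the explicit insmod and modprobe <module_name> steps below still work. If apple-bce is also included in your initramfs (some T2 setups add it to /etc/initramfs-tools/modules), you may additionally need sudo update-initramfs -u after adding or removing the file.

```shell
#!/usr/bin/env bash
# Write the debug blacklist into its own drop-in file.
DEBUG_BLACKLIST=(apple-bce hid_apple_magic_backlight hid_appletb_bl
  hid_appletb_kbd appletbdrm cdc_ncm cdc_mbim uvcvideo bcm5974 hid-apple
  hid_sensor_als)
write_debug_blacklist() {
  local dir="${1:-/etc/modprobe.d}" m
  for m in "${DEBUG_BLACKLIST[@]}"; do
    echo "blacklist $m"
  done > "$dir/t2-suspend-debug.conf"
}
```

Usage: sudo bash -c '. ./t2-blacklist.sh && write_debug_blacklist' (the script name is hypothetical), then reboot; to undo, sudo rm /etc/modprobe.d/t2-suspend-debug.conf.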

Now you can reboot, and go back to the apple-bce-drv-redecorating folder (using an external keyboard). From here, you can try suspend with various components loaded, until you find what causes the issue (and once you find something, let us know here!)

  1. Try sudo insmod ./apple-bce.ko. This should load the core BCE driver, and you can test suspend now.
  2. If that works, next you can try sudo insmod ./aaudio.ko to add in the audio component. Try suspending again.
  3. If that works ok, maybe try playing some music with the audio device and try suspend again.
  4. Next you can try sudo insmod ./apple-bce-vhci.ko. Suspend with just this, and see if it works.
  5. Now you can try sudo modprobe <module_name> for the rest of the drivers that were blacklisted. Probably don't load them all at once but see if you can find any that cause suspend to stop working.
  6. If you've loaded everything and suspend works then that'd be quite surprising.
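One iteration of this process can be scripted as below. This is a sketch, not part of the original instructions: the function name and marker text are hypothetical, RUN=echo gives a dry run, and the marker written to /dev/kmsg is only useful afterwards (via journalctl -k -b -1 | grep t2-bisect) if the journal reached the disk before the freeze.

```shell
#!/usr/bin/env bash
# Load one component, mark the kernel log, then suspend.
# Run from the apple-bce-drv-redecorating folder.
RUN="${RUN:-}"   # RUN=echo prints the commands instead of running them
try_component() {
  local comp="$1"
  if [[ "$comp" == *.ko ]]; then
    $RUN sudo insmod "./$comp"    # freshly built module from this folder
  else
    $RUN sudo modprobe "$comp"    # one of the blacklisted drivers
  fi
  # marker so the last component tried can be found after a crash
  echo "t2-bisect: loaded $comp, suspending" | $RUN sudo tee /dev/kmsg > /dev/null
  $RUN systemctl suspend
}
```

For example, try_component apple-bce.ko for step 1, try_component aaudio.ko for step 2, and try_component bcm5974 for one of the drivers in step 5.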
aboulfad commented 7 months ago

@Redecorating thank you for getting back to me with detailed debug steps (no worries, you folks are pretty busy with other stuff). Before I start all this, I have a few silly questions. It seems that when resume fails, Ubuntu crashes; that happened every time I attempted to resume from suspend.

I can’t find a crash dump in Linux, and I don’t know whether one is created. However, when I next start macOS (even if the Mac restarted into Ubuntu and ran fsck), macOS complains that the machine didn’t shut down properly and asks whether it should report it. I also don’t know how macOS detects a crash after Ubuntu ran fsck and “cleaned” its SSD partition.

Q1: Should I look at what macOS reports for the crash, or is it irrelevant?
Q2: Do all those cold shutdowns/restarts and the resulting fsck runs have any dire impact on SSD health (any partition)?

aboulfad commented 7 months ago

@Redecorating That was easy! It was step 4: sudo insmod ./apple-bce-vhci.ko :). I can't find any crash dumps... (I think linux-crashdump is required; let me know if you want me to install and configure it, or maybe it won't help since this is a device driver panic?)

dmesg-postSuspendCrash.log last-boot.log macos-crash.log (not much in the macOS crash dump)

Redecorating commented 7 months ago

Q1: Should I look for what MacOS reports for the crash or is it irrelevant?

macOS probably retrieves those via the USB NCM Ethernet connection to the T2. The log you have sent is probably as much information as we are going to get from those. (Maybe someone could reverse-engineer bridgeOS to see where it panicked, but that's beyond my current expertise.)

Q2: Do all those cold shutdown/restarts & running fsck consequently have any dire impact to the ssd health (any partition) ?

I would guess not, but I'm no expert. You can run sudo smartctl -x /dev/nvme0 to check how much the SSD has been used.
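To focus on the counters relevant to Q2, the NVMe health section of that output can be filtered. A sketch, assuming smartctl's usual NVMe field names and that the T2-managed SSD reports them like a regular NVMe drive. Unsafe Shutdowns counts power losses without a clean shutdown notification, which is what the forced power-offs described above would add to. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Keep only the wear-related lines of smartctl's NVMe health log.
nvme_wear_summary() {
  grep -E '^(Percentage Used|Data Units Written|Unsafe Shutdowns|Media and Data Integrity Errors):'
}
```

Usage: sudo smartctl -x /dev/nvme0 | nvme_wear_summary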

Regarding step 4 and the logs you attached: the macOS crash dump doesn't contain much, but it at least tells us that the T2 is having a kernel panic.

When you have time, can you add return 0; after this line https://github.com/Redecorating/apple-bce-drv/blob/fd35f0c2eac55185f522d721e2b01e9bb55f8dd9/vhci/vhci.c#L393, then run make clean and make, and see if apple-bce-vhci can suspend? This will make it skip actually resuming, so we can confirm the issue is in resume.

If that doesn't crash, then you can move the line you added to different positions in that function to see how much of the resume process can happen before the crash.

aboulfad commented 7 months ago

@Redecorating, I added the return. I don't think it crashed, but it may have been in some bizarre state, as I had to press the power button for 10 s to shut down the Mac. Previously, the fan would spin up for 1 s and then it would shut down.

As much as I'd love to keep moving that return statement (I have time), and maybe cold-shut-down my SSD each time, we need a better debugging methodology. Is there a way to get those [pr_info](https://www.kernel.org/doc/html/latest/core-api/printk-basics.html) messages to end up in a system log? Also, if suspend worked, I'd expect an entry in journalctl, which for the 4th attempt is at L154 in the log file:

sleep-393r0.log

PS: I am starting to regret throwing away my old iMac wired keyboard, as I am now using a cheap wired keyboard with horrible keys...

Redecorating commented 7 months ago

Yeah, making the resume a no-op isn't something one is really meant to do, but this does indicate that the crash occurs on the resume side of things.

For the logging, can you try using journalctl -k (or if you want logs from the previous boot, journalctl -k -b -1). There indeed should be more messages from the kernel that aren't in that log file for some reason.
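journalctl -k -b -1 only works when journald keeps earlier boots; with the default Storage=auto that requires /var/log/journal to exist. A sketch that checks the boot list before saving the previous boot's kernel log (the function and file names are hypothetical). Run it as root or as a user in the adm or systemd-journal group so the system journal is readable.

```shell
#!/usr/bin/env bash
# Save the kernel log of the previous boot (the one that froze), if kept.
save_prev_kernel_log() {
  local out="${1:-kernel-prev-boot.log}"
  # the previous boot shows up with index -1 in the boot list
  if ! journalctl --list-boots --no-pager | grep -qE '^ *-1 '; then
    echo "no previous boot in the journal (is /var/log/journal present?)" >&2
    return 1
  fi
  journalctl -k -b -1 --no-pager > "$out"
}
```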

Can you try this branch? https://github.com/Redecorating/apple-bce-drv/tree/sonoma_suspend_debug I've made it do a suspend+resume cycle when apple-bce-vhci is loaded, and also made it log a lot during resume, with enough delay between messages that you should be able to capture them with a video camera and a terminal open running journalctl -k -f. That won't rely on Linux being able to write logs to disk.

(note that if you don't experience the crash when loading apple-bce-vhci.ko, then maybe we have to go into proper sleep for it to trigger the issue)

aboulfad commented 7 months ago

As you predicted, there was no crash/freeze... to be honest, I didn't read the changes, I just did what you asked. Here's the journal log: suspend1.log

Next I did a suspend, and the Mac crashed or froze. I didn't power it down; instead I closed and re-opened the lid, and it started booting.

AdityaGarg8 commented 7 months ago

@rydymth

bluetoothctl show
bluetoothctl devices

Can you share the output of these with kernel 6.6.1-3?
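bluetoothctl's "No default controller available" means BlueZ sees no adapter. Checking /sys/class/bluetooth shows whether the kernel registered any hciN device at all, which separates a driver-side failure (hci_bcm4377) from a userspace one. A sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# List the Bluetooth controllers the kernel has registered (hci0, hci1, ...).
# Returns non-zero when there are none, i.e. the driver never brought one up.
bt_kernel_controllers() {
  local dir="${1:-/sys/class/bluetooth}" dev name found=0
  for dev in "$dir"/hci*; do
    name="${dev##*/}"
    # skip per-connection entries such as "hci0:11" and an unmatched glob
    [[ "$name" =~ ^hci[0-9]+$ ]] || continue
    echo "$name"
    found=1
  done
  (( found ))
}
```

For example, bt_kernel_controllers || echo "no hci device; check journalctl -k for hci_bcm4377". If a controller is listed but bluetoothctl still sees none, rfkill list bluetooth is a reasonable next check.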

rydymth commented 7 months ago

Hi @AdityaGarg8 ! For both the commands I am getting this: No default controller available

AdityaGarg8 commented 7 months ago

Did you downgrade to 6.6.1-3?

AdityaGarg8 commented 7 months ago

You can confirm by running apt list --installed | grep linux
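uname reports only the kernel release, while the package revision is part of the Debian package version. A sketch in the same spirit as the apt list command, using dpkg-query to list only installed (ii) linux-image and linux-headers packages with their versions; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print "package version" for installed kernel image/header packages.
installed_kernel_packages() {
  dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' \
    'linux-image-*' 'linux-headers-*' 2>/dev/null |
    awk '$1 == "ii" { print $2, $3 }'
}
```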

rydymth commented 7 months ago

Oh yes! Here is my uname output: Linux RudyUbuMbp 6.6.1-t2-mantic #3 SMP PREEMPT_DYNAMIC Tue Nov 14 11:16:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

AdityaGarg8 commented 7 months ago

Uname will not differentiate between 6.6.1-3 and 6.6.1-4

AdityaGarg8 commented 7 months ago

apt list --installed | grep linux, on the other hand, will.

AdityaGarg8 commented 7 months ago

I would also need your syslog (it's at /var/log/syslog), provided the kernel in use is correct. Please send it after you have run the Bluetooth commands I sent before.