samueldr / wip-pinebook-pro

More information on the Unofficial NixOS Wiki
https://nixos.wiki/wiki/NixOS_on_ARM/PINE64_Pinebook_Pro

Suspend to RAM with mainline #3

Open samueldr opened 4 years ago

samueldr commented 4 years ago

(Tracking issue...)

theotherjimmy commented 4 years ago

Might be related: https://github.com/rockchip-linux/kernel/commit/3cc3b0376b0208bc7a8f3437d70416789511c99f

EDIT: FYI, the files added in that commit are not present in tsys' kernel.

samueldr commented 4 years ago

Entirely plausible. Good find!

I wonder if that driver has a mainline patch open. Otherwise it looks relatively self-contained; I wonder how hard it would be to forward port.

theotherjimmy commented 4 years ago

I'm testing the blunt forward port right now. It did not apply cleanly, so I'm going to have to do something about that.

theotherjimmy commented 4 years ago

Working (EDIT: as in "I'm working on it in this branch", not "this branch works") branch: https://github.com/theotherjimmy/wip-pinebook-pro/tree/sleep

theotherjimmy commented 4 years ago

Hmmmmm.... https://github.com/rockchip-linux/docs/tree/master/Kernel/S2R

theotherjimmy commented 4 years ago

I got suspend to ram working! I had no measurable charge loss over 4 hours of suspend. Logs to show that it happened (note the "deep"):

Feb 08 08:33:39 nixos kernel: PM: suspend entry (deep)
Feb 08 12:39:21 nixos kernel: Filesystems sync: 0.191 seconds
Feb 08 12:39:21 nixos kernel: dwmmc_rockchip fe310000.dwmmc: pre_suspend failed for non-removable host>
Feb 08 12:39:21 nixos kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
Feb 08 12:39:21 nixos kernel: OOM killer disabled.
Feb 08 12:39:21 nixos kernel: Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
Feb 08 12:39:21 nixos kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Feb 08 12:39:21 nixos kernel: Disabling non-boot CPUs ...
Feb 08 12:39:21 nixos kernel: CPU1: shutdown
Feb 08 12:39:21 nixos kernel: psci: CPU1 killed (polled 0 ms)
Feb 08 12:39:21 nixos kernel: CPU2: shutdown
Feb 08 12:39:21 nixos kernel: psci: CPU2 killed (polled 0 ms)
Feb 08 12:39:21 nixos kernel: CPU3: shutdown
Feb 08 12:39:21 nixos kernel: psci: CPU3 killed (polled 0 ms)
Feb 08 12:39:21 nixos kernel: CPU4: shutdown
Feb 08 12:39:21 nixos kernel: psci: CPU4 killed (polled 0 ms)
Feb 08 12:39:21 nixos kernel: CPU5: shutdown
Feb 08 12:39:21 nixos kernel: psci: CPU5 killed (polled 4 ms)
Feb 08 12:39:21 nixos kernel: Enabling non-boot CPUs ...
Feb 08 12:39:21 nixos kernel: Detected VIPT I-cache on CPU1
Feb 08 12:39:21 nixos kernel: GICv3: CPU1: found redistributor 1 region 0:0x00000000fef20000
Feb 08 12:39:21 nixos kernel: CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
Feb 08 12:39:21 nixos kernel: CPU1 is up
Feb 08 12:39:21 nixos kernel: Detected VIPT I-cache on CPU2
Feb 08 12:39:21 nixos kernel: GICv3: CPU2: found redistributor 2 region 0:0x00000000fef40000
Feb 08 12:39:21 nixos kernel: CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
Feb 08 12:39:21 nixos kernel: CPU2 is up
Feb 08 12:39:21 nixos kernel: Detected VIPT I-cache on CPU3
Feb 08 12:39:21 nixos kernel: GICv3: CPU3: found redistributor 3 region 0:0x00000000fef60000
Feb 08 12:39:21 nixos kernel: CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
Feb 08 12:39:21 nixos kernel: CPU3 is up
Feb 08 12:39:21 nixos kernel: Detected PIPT I-cache on CPU4
Feb 08 12:39:21 nixos kernel: GICv3: CPU4: found redistributor 100 region 0:0x00000000fef80000
Feb 08 12:39:21 nixos kernel: CPU4: Booted secondary processor 0x0000000100 [0x410fd082]
Feb 08 12:39:21 nixos kernel: CPU4 is up
Feb 08 12:39:21 nixos kernel: Detected PIPT I-cache on CPU5
Feb 08 12:39:21 nixos kernel: GICv3: CPU5: found redistributor 101 region 0:0x00000000fefa0000
Feb 08 12:39:21 nixos kernel: CPU5: Booted secondary processor 0x0000000101 [0x410fd082]
Feb 08 12:39:21 nixos kernel: CPU5 is up
Feb 08 12:39:21 nixos kernel: usb usb5: root hub lost power or was reset
Feb 08 12:39:21 nixos kernel: usb usb6: root hub lost power or was reset
Feb 08 12:39:21 nixos kernel: cdn-dp fec00000.dp: [drm:cdn_dp_pd_event_work [rockchipdrm]] Not connect>
Feb 08 12:39:21 nixos kernel: usb usb7: root hub lost power or was reset
Feb 08 12:39:21 nixos kernel: usb usb8: root hub lost power or was reset
Feb 08 12:39:21 nixos kernel: OOM killer enabled.
Feb 08 12:39:21 nixos kernel: Restarting tasks ... done.
Feb 08 12:39:21 nixos kernel: PM: suspend exit
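
For anyone wanting to check their own logs the same way, here is a minimal sketch that pairs each "PM: suspend entry" line with the following "PM: suspend exit" line and reports the sleep mode and elapsed time. It assumes journalctl-style lines like the ones above. The function name `suspend_cycles` and the `year` default are made up for illustration, since these log lines carry no year. Lines written during resume get post-resume timestamps, which is why the gap between entry and exit approximates the time spent asleep.

```python
import re
from datetime import datetime, timedelta

# Matches journalctl-style lines such as:
#   "Feb 08 08:33:39 nixos kernel: PM: suspend entry (deep)"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: (?P<msg>.*)$"
)
ENTRY_RE = re.compile(r"PM: suspend entry \((?P<mode>[^)]+)\)")


def suspend_cycles(lines, year=2020):
    """Return a list of (mode, duration) for each entry/exit pair.

    `year` is an assumption: the log format has no year, so one must
    be supplied to build full timestamps.
    """
    entered = None  # (mode, start time) of the suspend in progress
    cycles = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(
            f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
        )
        entry = ENTRY_RE.search(match["msg"])
        if entry:
            entered = (entry["mode"], when)
        elif "PM: suspend exit" in match["msg"] and entered:
            mode, start = entered
            cycles.append((mode, when - start))
            entered = None
    return cycles
```

Fed the first and last lines of the log above, this reports one "deep" cycle of a little over four hours, which matches the 4 hours of suspend mentioned.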

samueldr commented 4 years ago

Now, I impatiently wait for the changes :).

theotherjimmy commented 4 years ago

Seems that my branch was old, and I have amended the series yet again. I'm going to push to a different branch, after rebasing with the latest master.

xantoz commented 4 years ago

From what I hear, one must use the BSP U-Boot rather than mainline U-Boot for this to work, even with the mainline/Manjaro kernel.

Manjaro has got it working that way.

samueldr commented 4 years ago

@xantoz yes, you're right, see #7.

theotherjimmy commented 4 years ago

Long time no progress. I finally have an automated reproducer for the suspend issues, using levinboot. This makes testing any fixes a much quicker process. I've now confirmed that TF-A enters the suspend state correctly (as far as I can tell) and that no external input can wake it. I have to track down how to enable a method for an external wake event (including at least the power button, as I can't easily test the lid switch with the back off my PBP) and confirm wakeup after that.

theotherjimmy commented 4 years ago

I suppose I should update this. Note that I have not worked on this in about 2 months (maybe more). I have managed to configure a wakeup source in TF-A, and, with much help from crystalgamma, was able to get LPDDR4 resume on the right track. However, the current issue is that returning from the enable-mmu code in TF-A raises an unhandled exception by returning to an unmapped address. I recently had my first child, so I may not be returning to this for a bit :sweat_smile: If someone wants to take up the torch, I can provide my patch series, but I'm hesitant to post it publicly.

tgunnoe commented 3 years ago

Is this issue referring to the battery drain I get when "suspending" the laptop? It never really seems to save power.

theotherjimmy commented 3 years ago

@tgunnoe Yes. At the moment the drain takes the battery from 100% (ish) to 0% in under 36 hours. With suspend support in upstream TF-A, the power drain could be minimized to allow up to 20 days of suspend, or something like that.
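
To put those two figures side by side, here is a quick back-of-the-envelope sketch. It assumes the battery drains linearly, which is a simplification, and treats "36 hours" and "20 days" as the rough estimates they are. The function name is made up for illustration.

```python
def drain_rate_percent_per_hour(percent_used, hours):
    """Average drain rate, assuming the battery drains linearly."""
    return percent_used / hours


# Current behaviour: roughly a full battery gone in about 36 hours.
current = drain_rate_percent_per_hour(100, 36)

# Hoped-for behaviour: a full battery lasting about 20 days of suspend.
target = drain_rate_percent_per_hour(100, 20 * 24)

# How many times lower the drain would need to be.
improvement = current / target

print(f"current: {current:.2f} %/h")
print(f"target:  {target:.2f} %/h")
print(f"improvement: {improvement:.1f}x")
```

Under these assumptions, working suspend would need to cut the drain by a bit more than an order of magnitude.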

shadowrylander commented 3 years ago

Any updates on this? I'm looking to get a Pinebook Pro this month or so, and was wondering if I could help in any way!

theotherjimmy commented 3 years ago

Right, so I finally got around to hooking up a debugger to my PBP this past week. :crossed_fingers: I'll be able to push the patch upstream soon.

theotherjimmy commented 3 years ago

It's published: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/9616

shadowrylander commented 3 years ago

@theotherjimmy Wait; so how do we use this, again...? Sorry, a bit new to this!

samueldr commented 3 years ago

(Assuming NixOS), you would apply the patch to the TF-A for RK3399.

For example, you could add it to the override used here:

https://github.com/samueldr/wip-pinebook-pro/blob/497b7f7641b731df220f4538cf981574140186ee/u-boot/default.nix#L18-L26

Not sure if there is a minimum version of TF-A that needs to be used.

This, in turn, will be used by U-Boot. So you'll need to build and update U-Boot accordingly for your setup.

theotherjimmy commented 3 years ago

I developed the patch on a branch based on pre-2.3. It should apply cleanly to anything from 2.3 onward.

shadowrylander commented 3 years ago

@samueldr So override U-Boot with the patch, set it up, and build?

theotherjimmy commented 3 years ago

@shadowrylander This is a patch for TF-A, not U-Boot.

shadowrylander commented 3 years ago

> (Assuming NixOS), you would apply the patch to the TF-A for RK3399.
>
> For example, you could add it to the override used here:
>
> https://github.com/samueldr/wip-pinebook-pro/blob/497b7f7641b731df220f4538cf981574140186ee/u-boot/default.nix#L18-L26
>
> Not sure if there is a minimum version of TF-A that needs to be used.
>
> This, in turn, will be used by U-Boot. So you'll need to build and update U-Boot accordingly for your setup.

So where would I apply the patch in the link provided here...?

shadowrylander commented 3 years ago

Or wait; was the comment by @samueldr not for me originally?

theotherjimmy commented 3 years ago

@shadowrylander It was probably meant for all of us.

samueldr commented 3 years ago

Do we need #7's changes for this to work? Namely ROCKCHIP_SIP and ROCKCHIP_SUSPEND_MODE.

(I still haven't taken the time to actively test...)

theotherjimmy commented 3 years ago

> Do we need #7's changes for this to work? Namely ROCKCHIP_SIP and ROCKCHIP_SUSPEND_MODE.

No. Closed #7.

samueldr commented 3 years ago

I can verify that, with the default Tow-Boot build for the Pinebook Pro (which at this time includes the patch), this works for me on 5.11.

theotherjimmy commented 3 years ago

@samueldr Thanks for being one of the first testers who isn't me! I feel a lot better knowing that my results have been reproduced.

samueldr commented 3 years ago

@theotherjimmy Not knowing much about all this, I still feel that the review comments about how it may or may not actually work, depending on the conditions it resumes from, are probably valid.

But at least in limited testing it seems to work.

The PBP panicked once. It had slept for a short while, but the panic came long after resuming.

theotherjimmy commented 3 years ago

Oh, dang. That's probably related to not restoring the lower frequency as the reviewer suggested might be the case.

That being said, without debugging, I have no idea.

samueldr commented 3 years ago

Exactly. I tried reproducing it: I left the Pinebook Pro under similar conditions, booted it, suspended it not long after for a short time (not even a minute, I think), then left it on without display suspend.

When it panicked it had been on for, I think, under 12 hours, but leaving it on for ~36 hours didn't seem to reproduce the issue.

This is going to be a hard one to reproduce, if indeed it is related to the suspend/resume cycle and that suggestion.

theotherjimmy commented 3 years ago

Honestly, if it's panicking, RAM is working. Unless we're seeing corruption.

samueldr commented 3 years ago

I wouldn't know enough to confirm or deny :)

zhaofengli commented 3 years ago

Tried out the patch with an NVMe drive, and the drive is frozen upon wake up. It still shows up in lspci but any operation against the drive hangs. I think the behavior is consistent with suspending with the BSP U-Boot + TF-A.

[ 1934.301295] INFO: task fdisk:3791 blocked for more than 966 seconds.
[ 1934.304747]       Tainted: P         C O      5.10.35 #1-NixOS
[ 1934.308194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1934.311687] task:fdisk           state:D stack:    0 pid: 3791 ppid:  3780 flags:0x00000001
[ 1934.311698] Call trace:
[ 1934.311713]  __switch_to+0x10c/0x168
[ 1934.311723]  __schedule+0x2c4/0x738
[ 1934.311729]  schedule+0x50/0xd8
[ 1934.311737]  blk_queue_enter+0x138/0x290
[ 1934.311742]  submit_bio_noacct+0x364/0x400
[ 1934.311748]  submit_bio+0x54/0x1e0
[ 1934.311754]  mpage_readahead+0x154/0x188
[ 1934.311760]  blkdev_readahead+0x20/0x30
[ 1934.311767]  read_pages+0xa0/0x288
[ 1934.311771]  page_cache_ra_unbounded+0x13c/0x218
[ 1934.311776]  do_page_cache_ra+0x48/0x58
[ 1934.311780]  force_page_cache_ra+0xb0/0x108
[ 1934.311785]  page_cache_sync_ra+0x54/0x120
[ 1934.311791]  generic_file_buffered_read+0x4b8/0xa30
[ 1934.311796]  generic_file_read_iter+0x108/0x1a8
[ 1934.311802]  blkdev_read_iter+0x44/0x58
[ 1934.311808]  new_sync_read+0xf0/0x188
[ 1934.311813]  vfs_read+0x150/0x1e0
[ 1934.311818]  ksys_read+0x74/0x100
[ 1934.311823]  __arm64_sys_read+0x24/0x30
[ 1934.311830]  el0_svc_common.constprop.0+0x80/0x1a8
[ 1934.311835]  do_el0_svc+0x2c/0x98
[ 1934.311841]  el0_svc+0x20/0x30
[ 1934.311846]  el0_sync_handler+0xb0/0xb8
[ 1934.311852]  el0_sync+0x178/0x180
[ 1956.829228] nvme nvme0: I/O 9 QID 0 timeout, completion polled
[ 1956.829375] nvme nvme0: 6/0/0 default/read/poll queues
[ 1987.549123] nvme nvme0: I/O 325 QID 2 timeout, aborting
[ 2018.268958] nvme nvme0: I/O 2 QID 0 timeout, completion polled
[ 2018.269094] nvme nvme0: Abort status: 0x0
[ 2018.269158]  nvme0n1: p1
[ 2079.708782] nvme nvme0: I/O 13 QID 0 timeout, completion polled
[ 2141.148592] nvme nvme0: I/O 14 QID 0 timeout, completion polled
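
As a small aid for comparing traces like this one, here is a minimal sketch that pulls the NVMe timeout events out of dmesg-style output and computes the gaps between them. It only assumes the line format shown above. The names `nvme_timeouts` and `gaps` are made up for illustration, and nothing here talks to the drive itself.

```python
import re

# Matches dmesg lines such as:
#   "[ 1956.829228] nvme nvme0: I/O 9 QID 0 timeout, completion polled"
TIMEOUT_RE = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)\] nvme (?P<dev>\w+): "
    r"I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout"
)


def nvme_timeouts(lines):
    """Return (seconds_since_boot, device, io_tag, queue_id) per timeout."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            events.append(
                (
                    float(match["t"]),
                    match["dev"],
                    int(match["tag"]),
                    int(match["qid"]),
                )
            )
    return events


def gaps(events):
    """Seconds elapsed between consecutive timeout events."""
    times = [event[0] for event in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Run against the trace above, this finds five timeouts on nvme0, spaced roughly 31 or 61 seconds apart, so the drive keeps timing out at regular intervals rather than recovering.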

theotherjimmy commented 3 years ago

Yes, I think that's expected behavior ATM. I'd love to fix it, as I now have an NVMe drive in my PBP, but it's extremely low on my priority list.

miniBill commented 2 years ago

What's the current status of suspend to RAM?

yatli commented 2 years ago

Hey guys, I'm trying to bring s2ram to mainline for DevTerm A06 (rk3399). https://forum.clockworkpi.com/t/getting-suspend-to-work-properly-on-a06/8404/18?u=yatli

My "working branch" is here :) https://github.com/yatli/arm-trusted-firmware/tree/rk3399_dev

I've marked and aligned some routines from the rkbin bl31.elf, but there are still missing pieces: the way ATF accepts aux parameters, the mysterious PMUGRF_OS_REG2, and the way PSCI is called (not returning from WFI, but jumping into the suspend_finish routine; the DevTerm never reaches suspend_finish).

I'm wondering if you have suggestions and tips? Currently it goes into deep sleep but it's not coming back. I'm fairly new to all this but gradually finding my way around...

It'd also be cool to be able to set up a debug UART or a debugger, as currently all I can do is output debug bits with the onboard fan spinning or not spinning......