nexus511 / gpd-ubuntu-packages

This repository shall provide the base for building ubuntu packages from most of the patches currently used to get linux on the gpd-pocket.
GNU General Public License v3.0

suspend to disk / pm-hibernate broken #40

Open sobukus opened 6 years ago

sobukus commented 6 years ago

[xubuntu 17.04, kernel 4.14] Trying pm-hibernate in a terminal provides a black screen and spinning fan, with no further progress visible. Are there configurations where suspend to disk works?

As there is no proper suspend to RAM, draining the battery in a day or two without use, a working suspend to disk is a prerequisite for the machine being of extended use for me. Am I the only one?

nexus511 commented 6 years ago

Yes. It behaves just the same on my device and always did.

To be honest, I have never used hibernation on any of my Linux systems, so I have no clue how well it should work on stock Ubuntu or why it might fail on the GPD. I also have no idea how one would debug this. While I would be pleased to merge proper fixes for this, I will not spend time on debugging it for now.

sobukus commented 6 years ago

Sure, I just wanted to have mentioned it here. At some point I will probably invest time to fix this myself. I have a history of rolling my own setup with suspend to disk and encrypted swap. But debugging might be painful here. And I will have to see how long I can take running Xubuntu there instead of something more custom;-)

It's a bit disheartening how much work has to go into the GPD being actually useful with Linux (wasn't Intel hardware supposed to be easy?). I hope the video stuff stabilises … I also hoped to be able to use a USB C hub with ethernet and a full-size HDMI/DP/VGA port to have the GPD replace a traditional laptop for conferences. That should be possible, right? So far I saw no sign of that in Linux, even after enabling DP over USB in the firmware setup. At least the ethernet seems happy.

Anyhow, thanks for the work you put into this. I hope we'll bring it all together eventually.

sobukus commented 6 years ago

A suspend-to-disk cycle worked with kernel 4.16.0-rc5 from Hans' repo. But things seemed to be more unstable than usual after that; I am not wholly sure, though. I now remember that you might get the impression that X11 crashed due to the display being locked twice (one more password prompt screen). But I also had that stream of i2c errors again, apparently from the touchpad.

While things seem to be improving here — pm-hibernate did indeed suspend the system — I think we need stable video support as a starting point to judge issues specific to suspending. So far, I don't see stable video support.

sobukus commented 6 years ago

I so far cannot pinpoint if pm-hibernate really caused the one instance of i2c errors I got. I can encourage people to try pm-hibernate with the newer kernels (4.16 from Hans), though. The basic suspend/resume seems to work, after all. That is an improvement.

stockmind commented 6 years ago

Those guys seem to be able to hibernate pretty well without errors: https://github.com/stockmind/gpd-pocket-ubuntu-respin/issues/93

sobukus commented 6 years ago

Yeah, hibernation itself seems to work fine. I cannot confirm any video issue directly related to this. But I do observe a constantly running fan right now. The in-kernel fan control added by Hans seems to have stopped working. Also, I'm not quite sure if these are related after all:

[ 2719.356935] i2c_designware 808622C1:01: controller timed out
[ 2720.380651] i2c_designware 808622C1:01: controller timed out
[ 2721.405139] i2c_designware 808622C1:01: controller timed out

Also, before suspend I see these:

[  552.834786] intel_sst_acpi 808622A8:00: sst: Busy wait failed, cant send this msg
[  552.909974] intel_sst_acpi 808622A8:00: sst: Busy wait failed, cant send this msg
[  552.984758] intel_sst_acpi 808622A8:00: sst: Busy wait failed, cant send this msg
[  553.059798] intel_sst_acpi 808622A8:00: sst: Busy wait failed, cant send this msg
[ 1832.709790] PM: hibernation entry

Can you confirm that you (don't) see these kernel messages? Maybe there is a hardware issue after all here …

I'm off for a clean reboot to see if the fan starts behaving again.
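
For anyone who wants to compare their own logs against the i2c_designware and intel_sst_acpi messages quoted above, here is a minimal sketch in Python that counts such lines per device. It is not something used in this thread: the function name `count_suspect_messages` and the `PATTERNS` table are illustrative assumptions, and the patterns only cover message texts that actually appear in this issue.

```python
import re
from collections import Counter

# Patterns built from messages quoted in this issue; illustrative, not exhaustive.
PATTERNS = {
    "i2c_timeout": re.compile(
        r"i2c_designware (\S+): "
        r"(?:controller timed out|timeout waiting for bus ready|timeout in disabling adapter)"
    ),
    "sst_busy": re.compile(r"intel_sst_acpi (\S+): sst: Busy wait failed"),
}


def count_suspect_messages(log_text):
    """Return a Counter keyed by (kind, device) with one count per matching log line."""
    counts = Counter()
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # group(1) is the device name, e.g. "808622C1:01"
                counts[(kind, match.group(1))] += 1
    return counts
```

Feeding it the text of `dmesg` or `journalctl -k` (both dmesg-style and journal-style prefixes are tolerated, since only the driver part of the line is matched) gives a quick per-controller tally to post alongside a report.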

sobukus commented 6 years ago

Yep, fan is good again after reboot. So it got stuck during hibernate. Is @jwrdegoede reading this? The temperature readings were fine, just the fan code apparently did not listen anymore.

I now managed to kill the touchscreen again with a suspend/resume cycle. Going to add that bit to the other issue. For completeness, both types of i2c errors appear right after resume:

Mär 25 10:22:33 pocke kernel: i2c_designware 808622C1:05: timeout waiting for bus ready
Mär 25 10:22:33 pocke kernel: Goodix-TS i2c-GDIX1001:00: I2C transfer error: -110
Mär 25 10:22:33 pocke kernel: i2c_designware 808622C1:05: timeout waiting for bus ready
Mär 25 10:22:33 pocke kernel: Goodix-TS i2c-GDIX1001:00: I2C write end_cmd error
Mär 25 10:22:33 pocke kernel: i2c_designware 808622C1:05: timeout waiting for bus ready
Mär 25 10:22:33 pocke kernel: Goodix-TS i2c-GDIX1001:00: I2C transfer error: -11

@stockmind could you confirm that you get neither of these when suspending with your respin(s)?

I've put the full kernel log up on https://sobukus.de/gpd/kernelmsg.20180325-suspendcyclebrokentouchscreen.txt for comparison.

sobukus commented 6 years ago

Posting this bit of kernel output seen after a resume, since it does not always happen (and may not even be related to the resume as such):

[  350.426356] Uhhuh. NMI received for unknown reason 2c on CPU 0.
[  350.426358] Do you have a strange power saving mode enabled?
[  350.426359] Dazed and confused, but trying to continue

So far the system still seems to work fine. I read that there can be various causes for these unexpected interrupts. Maybe the reason can be made known?

sobukus commented 6 years ago

After some bumpy time with stockmind 17.10 → 18.04 using kernel 4.16-rc3, I am now on linux 4.18-rc1 based on the stockmind kernel config, on a different GPD Pocket device that I hoped would be less buggy.

I have managed several suspend-to-idle and suspend-to-disk cycles now and it basically works, but some things tend to break with some randomness in both modes of suspend:

Jun 29 10:47:57: i2c_designware 808622C1:01: timeout in disabling adapter

[ 1554.922768] i2c_designware 808622C1:01: controller timed out
[ 1555.946911] i2c_designware 808622C1:01: controller timed out

[ 4555.908518] i2c_designware 808622C1:00: timeout waiting for bus ready
[ 4555.936961] i2c_designware 808622C1:00: timeout waiting for bus ready

-  that unknown NMI (see above)
-  some I/O errors writing to a mounted USB drive (marked as persisting), but unsure if with actual consequences (had more trouble with 4.16-rc3, apparently)

[ 4348.058966] EXT4-fs warning (device dm-4): ext4_end_bio:323: I/O error 10 writing to inode 3409454 (offset 0 size 0 starting block 35526)
[ 4348.058973] Buffer I/O error on device dm-4, logical block 35526
[ 4348.058990] EXT4-fs warning (device dm-4): ext4_end_bio:323: I/O error 10 writing to inode 3409454 (offset 1863680 size 4096 starting block 35527)
[ 4348.058992] Buffer I/O error on device dm-4, logical block 35527

-  battery monitoring broken, need to re-load the module for that (max17042_battery)

[ 4555.066946] power_supply max170xx_battery: driver failed to report `present' property: -110
[ 4555.145397] power_supply max170xx_battery: driver failed to report `charge_full' property: -110
[ 4555.198221] power_supply max170xx_battery: driver failed to report `current_now' property: -110
[ 4555.254506] power_supply max170xx_battery: driver failed to report `present' property: -110
[ 4555.303747] power_supply max170xx_battery: driver failed to report `charge_now' property: -110

  Also just messages like that:

Jun 29 10:53:59: max17042 i2c-MAX17047:00: no platform data provided
Jun 29 10:53:59: max17042: probe of i2c-MAX17047:00 failed with error -22

-  an unclaimed write to a register

Jun 29 10:47:55: ------------[ cut here ]------------
Jun 29 10:47:55: Unclaimed write to register 0x1e0100
Jun 29 10:47:55: WARNING: CPU: 1 PID: 7915 at drivers/gpu/drm/i915/intel_uncore.c:1077 unclaimed_reg_debug+0x46/0x60 [i915]
Jun 29 10:47:55: Modules linked in: rfcomm fuse cmac bnep vfat fat snd_soc_sst_cht_bsw_rt5645 fusb302 pi3usb30532 tcpm typec joydev gpio_keys intel_rapl intel_powerclamp coretemp kvm_intel kvm snd_intel_sst_acpi snd_intel_sst_core irqbypass snd_soc_sst_atom_hifi2_platform snd_soc_rt5645 intel_cstate snd_soc_acpi snd_soc_acpi_intel_match snd_soc_rl6231 snd_soc_core snd_compress snd_seq_dummy snd_hdmi_lpe_audio ac97_bus snd_pcm_dmaengine snd_pcm snd_seq_oss extcon_intel_cht_wc bq24190_charger snd_seq_midi snd_seq_midi_event snd_rawmidi lpc_ich mei_txe intel_xhci_usb_role_switch mei roles snd_seq intel_hid dw_dmac snd_seq_device intel_cht_int33fe sparse_keymap snd_timer goodix hci_uart tpm_crb snd tpm_tis max17042_battery btqca soundcore tpm_tis_core spi_pxa2xx_platform tpm dptf_power int3400_thermal soc_button_array
Jun 29 10:47:55: processor_thermal_device int3406_thermal int3403_thermal acpi_thermal_rel intel_soc_dts_iosf int340x_thermal_zone gpd_pocket_fan intel_int0002_vgpio acpi_pad parport_pc ppdev lp parport brcmfmac brcmutil cfg80211 dm_crypt overlay mmc_block btusb btrtl btbcm btintel bluetooth rfkill uas ecdh_generic usb_storage i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm i2c_cht_wc sdhci_acpi video sdhci mmc_core
Jun 29 10:47:55: CPU: 1 PID: 7915 Comm: kworker/u8:14 Not tainted 4.18.0-rc1-thor+ #1
Jun 29 10:47:55: Hardware name: Default string Default string/Default string, BIOS 5.11 06/28/2017
Jun 29 10:47:55: Workqueue: events_unbound async_run_entry_fn
Jun 29 10:47:55: RIP: 0010:unclaimed_reg_debug+0x46/0x60 [i915]
Jun 29 10:47:55: Code: 05 5b 5d 41 5c c3 45 84 e4 48 c7 c0 0a 60 2d c0 48 c7 c6 00 60 2d c0 8d 55 00 48 0f 44 f0 48 c7 c7 13 60 2d c0 e8 8a a1 ea dd <0f> 0b 5b 5d 83 2d e7 60 15 00 01 41 5c c3 66 90 66 2e 0f 1f 84 00
Jun 29 10:47:55: RSP: 0000:ffffb1cfc2c57d58 EFLAGS: 00010046
Jun 29 10:47:55: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
Jun 29 10:47:55: RDX: 0000000000000007 RSI: 0000000000000002 RDI: ffffa00ebfc96930
Jun 29 10:47:55: RBP: 00000000001e0100 R08: 0000000000000000 R09: 0000000000000024
Jun 29 10:47:55: R10: 00000000000003ad R11: 0000000000000000 R12: 0000000000000000
Jun 29 10:47:55: R13: ffffa00ea87d0000 R14: 0000000000000202 R15: ffffffff9f1220bb
Jun 29 10:47:55: FS: 0000000000000000(0000) GS:ffffa00ebfc80000(0000) knlGS:0000000000000000
Jun 29 10:47:55: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 10:47:55: CR2: 0000000000000000 CR3: 000000027020a000 CR4: 00000000001006e0
Jun 29 10:47:55: Call Trace:
Jun 29 10:47:55: fwtable_write32+0x21b/0x260 [i915]
Jun 29 10:47:55: intel_power_domains_init_hw+0x963/0x9f0 [i915]
Jun 29 10:47:55: i915_drm_resume_early+0x97/0x140 [i915]
Jun 29 10:47:55: ? i915_pm_thaw_early+0x10/0x10 [i915]
Jun 29 10:47:55: dpm_run_callback+0x4e/0x160
Jun 29 10:47:55: device_resume_early+0xe6/0x160
Jun 29 10:47:55: async_resume_early+0x19/0x40
Jun 29 10:47:55: async_run_entry_fn+0x3c/0x170
Jun 29 10:47:55: process_one_work+0x194/0x380
Jun 29 10:47:55: worker_thread+0x30/0x3c0
Jun 29 10:47:55: ? process_one_work+0x380/0x380
Jun 29 10:47:55: kthread+0x116/0x130
Jun 29 10:47:55: ? kthread_create_worker_on_cpu+0x70/0x70
Jun 29 10:47:55: ret_from_fork+0x35/0x40
Jun 29 10:47:55: ---[ end trace 5a832984db4cc410 ]---


I guess we need to fix the remaining ACPI errors (in dmesg on each boot) and this lingering i2c controller issue. I guess it is responsible for further breakage … that last one could be just another intel video driver bug, eh? At least I would like to not blame my hardware this time. I don't want to have to buy a dozen GPD Pockets to get one functioning unit.

@jwrdegoede: Do you have some insight on the persisting i2c controller issues? Also, I still have the other GPD unit that showed more issues … I wonder if it is really broken or just more likely to show problems. I'd like to test things on that box from time to time, too … maybe it's just all 'normal' quirks.

sobukus commented 6 years ago

Just to have the collection complete:

[ 2845.111598] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=140096 end=140097) time 423 us, min 1908, max 1919, scanline start 1874, end 1924

This is without a suspend cycle, just happening during graphical desktop work. A bug report to the intel graphics folks may be in order, anyway.

Another note about the charger:

[   26.368082] rt5645 i2c-10EC5645:00: Detected GPD Win / Pocket platform
[   26.368095] rt5645 i2c-10EC5645:00: i2c-10EC5645:00 supply avdd not found, using dummy regulator
[   26.368127] rt5645 i2c-10EC5645:00: i2c-10EC5645:00 supply cpvdd not found, using dummy regulator
…
[   27.709464] cht_wcove_pwrsrc cht_wcove_pwrsrc: Could not detect charger type
[   27.714801] bq24190-charger i2c-bq24190: Fault: boost 0, charge 1, battery 0, ntc 0
[   27.724511] bq24190-charger i2c-bq24190: Fault: boost 0, charge 0, battery 0, ntc 0
[   28.592770] cht_wcove_pwrsrc cht_wcove_pwrsrc: Could not detect charger type

I have 12 V from the wall PSU, but only 500 mA current, while /sys/class/power_supply/tcpm-source-psy-i2c-fusb302/current_now (wasn't that named a bit differently before?) claims 2000 mA. Another reboot …

I'll stop spamming this issue now. It's just an intermingled mess of issues with a device that almost works. I hope we get it untangled.
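
To check what every power supply node claims at once (the entry names seem to change between kernels), a small Python sketch can walk the sysfs directory. The function name `read_supply_currents_ma` is made up for illustration; the only thing relied on is that the power_supply class reports `current_now` in microamperes. It shows what the kernel claims, which may still differ from what is measured at the wall.

```python
from pathlib import Path


def read_supply_currents_ma(base="/sys/class/power_supply"):
    """Map each power supply name to its current_now reading, converted to mA.

    Supplies that lack a current_now attribute, or whose value cannot be
    read or parsed (e.g. the driver returns an error), are skipped.
    """
    readings = {}
    for supply in sorted(Path(base).glob("*")):
        try:
            microamps = int((supply / "current_now").read_text().strip())
        except (OSError, ValueError):
            continue
        readings[supply.name] = microamps / 1000.0
    return readings
```

Calling `read_supply_currents_ma()` before and after a suspend cycle would also show when a driver such as the battery monitor has stopped answering, since its entry then drops out of the result.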

Thaodan commented 6 years ago

Has anyone tried reloading the SoC sound module after hibernate? Someone with a similar device (same SoC) tried that. However, I cannot get that to work (module in use).

My adaptation of this is here: https://gitlab.com/Arch-Enterprise-packaging/snd_soc_sst_cht_bsw_rt5645_hibernate_fix/blob/master/snd_soc_sst_cht_bsw_rt5645_reset.sh
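
The reload idea itself can be sketched in a few lines of Python. This is not the linked script, just an assumed minimal version of "unload, then load again" using `modprobe`; the helper names `reload_commands` and `reload_module` are hypothetical. It defaults to a dry run, because actually running it needs root and the unload step fails while the module is in use.

```python
import subprocess


def reload_commands(module):
    """Return the two modprobe invocations that unload and then re-load a module."""
    return [["modprobe", "-r", module], ["modprobe", module]]


def reload_module(module, dry_run=True):
    """Return the reload commands; execute them only when dry_run is False.

    Executing requires root. check=True raises CalledProcessError as soon as
    a step fails, so a refused unload (module in use) never reaches the load.
    """
    commands = reload_commands(module)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

The same pattern would apply to `max17042_battery`, which was reported earlier in this issue as needing a reload after resume; whether a reload is enough to recover either driver is not established here.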