systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.08k stars 3.74k forks source link

Hibernation fails with no error message when swap space is low #16128

Open fstirlitz opened 4 years ago

fstirlitz commented 4 years ago

systemd version the issue has been seen with

systemd 245 (245.6-1)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

Used distribution Debian (sid)

Unexpected behaviour you saw I ran systemctl hibernate from a root prompt. I saw some disk activity, but the command exited with no output without the system having hibernated in the meantime.

The systemd-hibernate service produced no error message to the journal either, it just returned a failing exit code:

jun 09 17:10:10 boole systemd[1]: Starting Hibernate...
jun 09 17:10:10 boole systemd-sleep[852]: Suspending system...
jun 09 17:12:44 boole systemd-sleep[933]: /dev/sda:
jun 09 17:12:44 boole systemd-sleep[933]:  setting Advanced Power Management level to 0x80 (128)
jun 09 17:12:44 boole systemd-sleep[933]:  APM_level        = 128
jun 09 17:12:53 boole systemd-sleep[978]: /dev/sda:
jun 09 17:12:53 boole systemd-sleep[978]:  setting standby to 36 (3 minutes)
jun 09 17:12:03 boole systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE
jun 09 17:12:03 boole systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'.
jun 09 17:12:04 boole systemd[1]: Failed to start Hibernate.

When I manually attempted to run echo disk > /sys/power/state, it failed with ENOMEM. This pointed me to the real issue: the swap space was too low. When I closed a couple of programs, hibernation succeeded.

Expected behaviour you didn't see systemctl hibernate displaying an error message telling me what the problem is.

Steps to reproduce the problem The issue appeared on an i386 with 1.4 GiB of memory and 2 GiB of swap, of which 1.1 GiB was in use (the resident memory footprint I have lost at this point). The swap partition is configured via an /etc/fstab entry, so nothing out of the ordinary. The only unusual element is that I have recently enabled zswap.

poettering commented 4 years ago

Hmm, this is puzzling. The relevant code is here:

https://github.com/systemd/systemd/blob/master/src/sleep/sleep.c#L215

You see the "Suspending system..." message in your logs, so that's great. But the message you are looking for is the one "Failed to suspend system. System resumed again." right after on failure. It's really weird that you are not seeing that. Are you sure your logs are fully complete and this is not further down?

fstirlitz commented 4 years ago

The previous was an excerpt from journalctl -u systemd-hibernate; systemctl status systemd-hibernate showed something similar. (I wanted to avoid scrolling through irrelevant messages about the wireless network coming up and down.) But sure, when I went looking for that without any filtering, there it is:

jun 09 17:10:10 boole systemd[1]: Starting Hibernate...
jun 09 17:10:10 boole systemd-sleep[852]: Suspending system...
jun 09 17:10:10 boole kernel: PM: hibernation: hibernation entry
jun 09 17:10:10 boole kernel: (NULL device *): firmware: direct-loading firmware regulatory.db
jun 09 17:10:10 boole kernel: (NULL device *): firmware: direct-loading firmware regulatory.db.p7s
jun 09 17:10:33 boole kernel: Filesystems sync: 0.496 seconds
jun 09 17:11:17 boole kernel: Freezing user space processes ... (elapsed 0.010 seconds) done.
jun 09 17:11:18 boole kernel: OOM killer disabled.
jun 09 17:11:19 boole kernel: PM: hibernation: Preallocating image memory
jun 09 17:11:22 boole kernel: PM: hibernation: Allocated 147922 pages for snapshot
jun 09 17:11:24 boole kernel: PM: hibernation: Allocated 591688 kbytes in 18.80 seconds (31.47 MB/s)
jun 09 17:11:26 boole kernel: Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
jun 09 17:11:26 boole kernel: printk: Suspending console(s) (use no_console_suspend to debug)
jun 09 17:11:28 boole kernel: parport_pc 00:06: disabled
jun 09 17:11:30 boole kernel: serial 00:05: disabled
jun 09 17:11:31 boole kernel: toshiba_acpi: Unable to disable hotkeys
jun 09 17:11:32 boole kernel: eth0: Going into suspend...
jun 09 17:11:34 boole kernel: ------------[ cut here ]------------
jun 09 17:11:36 boole kernel: i915 raw-wakerefs=1 wakelocks=1 on cleanup
jun 09 17:11:36 boole kernel: WARNING: CPU: 0 PID: 879 at drivers/gpu/drm/i915/intel_runtime_pm.c:599 intel_runtime_pm_driver_release+0x27/0x30 [i915]
jun 09 17:11:38 boole kernel: Modules linked in: cpuid dm_mod snd_seq snd_seq_device nls_ascii nbd nls_utf8 nls_cp437 vfat fat michael_mic lib80211_crypt_tkip libarc4 ctr ae
s_generic libaes crypto_simd cryptd ccm lib80211_crypt_ccmp binfmt_misc ses enclosure scsi_transport_sas hid_generic uas usbhid usb_storage hid pktcdvd iTCO_wdt iTCO_vendor_
support watchdog ppdev snd_intel8x0m joydev pcspkr pcmcia snd_intel8x0 snd_ac97_codec ipw2200 ac97_bus libipw yenta_socket snd_pcm toshiba_acpi snd_timer lib80211 lpc_ich in
dustrialio mfd_core pcmcia_rsrc cfg80211 snd sr_mod pcmcia_core soundcore sparse_keymap cdrom rfkill sg wmi rng_core parport_pc parport uhci_hcd ehci_pci ehci_hcd usbcore ba
ttery ac usb_common button acpi_cpufreq squashfs zstd_decompress loop pkcs8_key_parser cuse fuse ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10
_pi crc_t10dif crct10dif_generic crct10dif_common ata_generic i915 i2c_algo_bit drm_kms_helper ata_piix cec libata drm psmouse evdev scsi_mod e100 serio_raw
jun 09 17:11:40 boole kernel:  mii fan video [last unloaded: nbd]
jun 09 17:11:41 boole kernel: CPU: 0 PID: 879 Comm: kworker/u2:26 Tainted: G        W         5.6.0-2-686-pae #1 Debian 5.6.14-1
jun 09 17:11:54 boole kernel: Hardware name: TOSHIBA TECRA A2/Portable PC, BIOS Version 1.40 04/25/2005
jun 09 17:11:54 boole kernel: Workqueue: events_unbound async_run_entry_fn
jun 09 17:11:55 boole kernel: EIP: intel_runtime_pm_driver_release+0x27/0x30 [i915]
jun 09 17:11:55 boole kernel: Code: eb c0 90 3e 8d 74 26 00 8b 00 85 c0 75 05 c3 8d 74 26 00 55 89 c2 0f b7 c0 c1 fa 10 89 e5 52 50 68 a8 15 87 f8 e8 4e 6f 91 de <0f> 0b 83 c4 0c c9 c3 66 90 3e 8d 74 26 00 8b 88 50 b5 ff ff 55 8d
jun 09 17:11:55 boole kernel: EAX: 0000002a EBX: f5e40000 ECX: 00000007 EDX: f6fe6840
jun 09 17:11:55 boole kernel: ESI: f6a07800 EDI: f5e44b84 EBP: e7f49ea0 ESP: e7f49e94
jun 09 17:11:55 boole kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
jun 09 17:11:56 boole kernel: CR0: 80050033 CR2: 01d6efe6 CR3: 23120000 CR4: 000006f0
jun 09 17:11:56 boole kernel: Call Trace:
jun 09 17:11:56 boole kernel:  i915_drm_suspend_late+0x122/0x130 [i915]
jun 09 17:11:56 boole kernel:  i915_pm_freeze_late+0x21/0x30 [i915]
jun 09 17:11:57 boole kernel:  dpm_run_callback+0x53/0x140
jun 09 17:11:57 boole kernel:  ? i915_pm_poweroff_late+0x30/0x30 [i915]
jun 09 17:11:57 boole kernel:  __device_suspend_late+0xa6/0x140
jun 09 17:11:57 boole kernel:  ? i915_pm_poweroff_late+0x30/0x30 [i915]
jun 09 17:11:57 boole kernel:  ? async_suspend+0x80/0x80
jun 09 17:11:58 boole kernel:  async_suspend_late+0x1c/0x80
jun 09 17:11:58 boole kernel:  async_run_entry_fn+0x3a/0x170
jun 09 17:11:58 boole kernel:  process_one_work+0x17b/0x320
jun 09 17:11:58 boole kernel:  worker_thread+0x156/0x3d0
jun 09 17:11:58 boole kernel:  ? process_one_work+0x320/0x320
jun 09 17:11:58 boole kernel:  kthread+0xdc/0x110
jun 09 17:11:58 boole kernel:  ? process_one_work+0x320/0x320
jun 09 17:11:58 boole kernel:  ? kthread_park+0xa0/0xa0
jun 09 17:11:58 boole kernel:  ret_from_fork+0x2e/0x38
jun 09 17:11:59 boole kernel: ---[ end trace 3b63be41259d6914 ]---
jun 09 17:11:59 boole kernel: Disabling non-boot CPUs ...
jun 09 17:11:59 boole kernel: PM: hibernation: Creating image:
jun 09 17:11:59 boole kernel: PM: hibernation: Need to copy 147766 pages
jun 09 17:12:00 boole kernel: PM: hibernation: Not enough free memory
jun 09 17:12:04 boole kernel: PM: hibernation: Error -12 creating image
jun 09 17:12:04 boole kernel: eth0: Coming out of suspend...
jun 09 17:12:04 boole kernel: serial 00:05: activated
jun 09 17:12:04 boole kernel: parport_pc 00:06: activated
jun 09 17:12:04 boole kernel: ata1.00: ACPI cmd ef/03:0c:00:00:00:00 (SET FEATURES) filtered out
jun 09 17:12:05 boole kernel: ata1.00: ACPI cmd ef/03:45:00:00:00:00 (SET FEATURES) filtered out
jun 09 17:12:05 boole kernel: ata2.00: ACPI cmd ef/03:0c:00:00:00:00 (SET FEATURES) filtered out
jun 09 17:12:05 boole kernel: ata2.00: ACPI cmd ef/03:42:00:00:00:00 (SET FEATURES) filtered out
jun 09 17:12:05 boole kernel: OOM killer enabled.
jun 09 17:12:06 boole kernel: Restarting tasks ... done.
jun 09 17:12:07 boole kernel: video LNXVIDEO:00: Restoring backlight state
jun 09 17:12:07 boole kernel: PM: hibernation: hibernation exit
jun 09 17:10:34 boole rtkit-daemon[11537]: The canary thread is apparently starving. Taking action.
jun 09 17:12:18 boole kernel: eth0: Setting MAC to 2a:d9:b9:fd:0f:62
jun 09 17:12:18 boole kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
jun 09 17:10:40 boole systemd-sleep[852]: Failed to suspend system. System resumed again: Cannot allocate memory

Still, I’d have expected something in the terminal, not just in the journal.