vianney / arch-luks-suspend

Lock encrypted root volume on suspend in Arch Linux
https://aur.archlinux.org/packages/arch-luks-suspend-git/

Stop systemd-journal before entering suspend #4

Closed guns closed 9 years ago

guns commented 10 years ago

Attempting to suspend a busy system often fails with the following error:

 Freezing of tasks failed after 20.005 seconds (1 tasks refusing to freeze, wq_busy=0):
 systemd-journal D 0000000000000000     0   161      1 0x00000004
  ffff8800b96efc18 0000000000000086 ffffffffa032141d ffff88007f9d3f80
  ffff88007f9d3f80 ffff8800b96effd8 0000000000014640 ffff88007f9d4328
  0000000000014640 ffff88007f9d3f80 0000000000000000 000000000fe00000
 Call Trace:
  [<ffffffffa032141d>] ? jbd2_journal_stop+0x22d/0x3d0 [jbd2]
  [<ffffffffa035b3a0>] ? ext4_dirty_inode+0x40/0x60 [ext4]
  [<ffffffff811c1bca>] ? __find_get_block+0xca/0x290
  [<ffffffff811c0584>] ? __set_page_dirty+0x74/0xc0
  [<ffffffff811c1bca>] ? __find_get_block+0xca/0x290
  [<ffffffff81190bed>] __sb_start_write+0xbd/0x110
  [<ffffffff8109cee0>] ? __wake_up_sync+0x20/0x20
  [<ffffffffa035b588>] ext4_page_mkwrite+0x58/0x460 [ext4]
  [<ffffffff8114c0ae>] do_page_mkwrite+0x4e/0xb0
  [<ffffffff8127b509>] ? number.isra.2+0x319/0x350
  [<ffffffff8114e0b9>] do_wp_page+0x539/0x920
  [<ffffffff814e5bca>] ? schedule+0x7aa/0xfb0
  [<ffffffff81151913>] handle_mm_fault+0x953/0xeb0
  [<ffffffff8127d484>] ? vsnprintf+0x284/0x580
  [<ffffffff814ed19f>] __do_page_fault+0x18f/0x600
  [<ffffffff810a9adf>] ? devkmsg_read+0x16f/0x420
  [<ffffffff814ed632>] do_page_fault+0x22/0x30
  [<ffffffff814ea1b8>] page_fault+0x28/0x30

Here, systemd-journal has triggered a page fault (the persistent log file is mmap'ed) and goes into uninterruptible sleep while the kernel attempts to load a page from the recently locked dm-crypt device.

This operation will hang, and since the journal process cannot receive signals, the machine never enters suspend-to-RAM.

While this problem is not particular to systemd-journal, it is the most frequent trigger, since the journal is recording the suspend operation as it happens.

The deeper issue of IO on the crypt device after freezing, but before machine suspend, is not addressed here.
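
As a rough illustration of the idea in the title (this is a sketch, not the actual patch), the journal can be stopped before the volume is locked and started again after resume. The function names, the `SYSTEMCTL` override, and the choice to stop the journald sockets as well as the service are all assumptions made for this example:

```shell
#!/bin/sh
# Hypothetical sketch, not the patch from this pull request.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) to dry-run the sequence.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Assumption: the sockets are stopped too, so that socket activation
# does not bring journald back while the root volume is locked.
JOURNAL_UNITS="systemd-journald.socket systemd-journald-dev-log.socket systemd-journald.service"

stop_journal() {
    # $JOURNAL_UNITS is intentionally unquoted so it splits into unit names.
    "$SYSTEMCTL" stop $JOURNAL_UNITS
}

start_journal() {
    "$SYSTEMCTL" start $JOURNAL_UNITS
}

# suspend_with_journal_stopped CMD [ARGS...]
# Stops the journal, runs CMD (the caller's lock-and-suspend step),
# restarts the journal, and returns CMD's exit status.
suspend_with_journal_stopped() {
    stop_journal || return 1
    "$@"
    status=$?
    start_journal
    return "$status"
}
```

A caller would wrap its own lock-and-suspend step, for example `suspend_with_journal_stopped my_lock_and_suspend`, where `my_lock_and_suspend` is a placeholder. This only removes the most frequent trigger; any other process that touches the locked device before the machine suspends can still block the freeze.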

guns commented 9 years ago

This patch is included in #6, and can be elided or accepted there.