systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.27k stars 3.79k forks source link

Journal get corrupted after enabling FSS and rebooting #17833

Open fbuihuu opened 3 years ago

fbuihuu commented 3 years ago

With v246.6:

$ journalctl --interval=10s --setup-keys | tee journalctl-setup-keys.txt
$ journalctl --rotate
$ journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`
[...]
=> Validated from Thu 2020-12-03 08:41:12 CET to Thu 2020-12-03 08:41:12 CET, final 0 entries not sealed.
$ reboot

$ journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`
475cf8: tag/entry realtime timestamp out of synchronization                         
File corruption detected at /var/log/journal/bd6e45ab2f4f464b9f3754f9ea3d9d89/system.journal:475cf8 (of 8388608 bytes, 55%).
FAIL: /var/log/journal/bd6e45ab2f4f464b9f3754f9ea3d9d89/system.journal (Bad message)
bluca commented 3 years ago

Are you sure it got corrupted because of FSS, and it wasn't just discovered thanks to it?

fbuihuu commented 3 years ago

Given the fact that there's something wrong with the tag's timestamp itself, I would think it's due to the FSS stuff.

fbuihuu commented 3 years ago

I gathered some logs and traces, which might help understanding what's going on:

# rotating journal
Dec 11 11:13:38 localhost systemd-journald[1820]: Calculating first tag...
Dec 11 11:13:38 localhost systemd-journald[1820]: Writing tag 1 for epoch 0

# shutting down
Dec 11 11:13:40 localhost systemd-journald[1820]: Writing tag 2 for epoch 0
Dec 11 11:13:40 localhost systemd-journald[1820]: Evolving FSPRG key from epoch 0 to 1.

# booting
Dec 11 11:13:50 localhost systemd-journald[394]: Writing tag 4 for epoch 1
Dec 11 11:13:50 localhost systemd-journald[394]: Evolving FSPRG key from epoch 1 to 2.
Dec 11 11:14:00 localhost systemd-journald[394]: Writing tag 5 for epoch 2

Not sure why the trace related to tag 3 is missing but it was written into the journal:

# SYSTEMD_LOG_LEVEL=debug journalctl --verify --verify-key=`cat /root/journalctl-setup-keys.txt` --file=/var/log/journal/cc91460569ab480aa6c3fc1d6c964698/system.journal
Journal effective settings seal=no keyed_hash=yes compress=no compress_threshold_bytes=8B
File /var/log/journal/cc91460569ab480aa6c3fc1d6c964698/system.journal added.
273260: Checking tag 1...                                                           
Seeking FSPRG key to 0.
278488: Checking tag 2...                                                           
2bb310: Checking tag 3...                                                           
357500: Checking tag 4...                                                           
357500: tag/entry realtime timestamp out of synchronization                         
File corruption detected at /var/log/journal/cc91460569ab480aa6c3fc1d6c964698/system.journal:357500 (of 16777216 bytes, 20%).
FAIL: /var/log/journal/cc91460569ab480aa6c3fc1d6c964698/system.journal (Bad message)
mmap cache statistics: 6527 hit, 1 miss
fbuihuu commented 3 years ago

@poettering maybe you have an idea ?

grisu48 commented 3 years ago

We are seeing this issue also in SUSE and openSUSE Linux (Tumbleweed and Leap).

Reproducer

Tested on a Tumbleweed virtual machine

sed -i 's/.*Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
sed -i 's/.*Seal=.*/Seal=yes/' /etc/systemd/journald.conf
systemctl restart systemd-journald
sleep 1
journalctl --interval=10s --setup-keys | tee journalctl-setup-keys.txt
sleep 1
journalctl --rotate
sleep 1
journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`

# reboot system
reboot

journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`

I'm attaching the output of my reproducer here. journalctl-corruption.txt

farblos commented 6 months ago

Similar reproducer, different errors during verification with systemd 255.4-1 on Debian testing. Epoch sequence not continuous (without reboot) and Tag failed verification (after reboot). Nothing fancy configured, journald.conf is all on defaults.

There seems to be some lack of motivation to follow up on these (up to deprecation of FSS in #28433), so I am not sure whether it is worth to provide more information ... just let me know what to do.