ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

Slow IO cause stateroot not bind mounted #3109

Open cospotato opened 7 months ago

cospotato commented 7 months ago

What happened?

We used ostree and rpm-ostree to build a CoreOS-like OS on top of Rocky.

If disk I/O is very slow during boot, the systemd generator times out and is terminated by an internal ALRM signal.

[Screenshot: CleanShot 2023-12-04 at 15 57 26@2x]

As a result, the var.mount unit is not generated, so the stateroot is not mounted.

How to reproduce?

Inject latency into the disk just before switch root.

[Image]
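
The exact injection method used in the report is only in the attached image and is not recoverable here. As one possible way to get a similar effect, the sketch below builds a table line for the kernel's device-mapper `delay` target and the matching `dmsetup create` argument list. This is an assumption, not the reporter's method: the device path, sector count, and mapping name are hypothetical, and hooking the command into the initramfs just before switch root is not covered.

```python
def dm_delay_table(device, sectors, delay_ms, offset=0):
    """Build a device-mapper "delay" target table line.

    Layout: <start> <length> delay <device> <offset> <delay_ms>
    `sectors` is the device size in 512-byte sectors
    (for example as reported by `blockdev --getsz`).
    """
    if sectors <= 0:
        raise ValueError("sectors must be positive")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    return f"0 {sectors} delay {device} {offset} {delay_ms}"


def dmsetup_create_args(name, device, sectors, delay_ms):
    """Return the argv for creating a delayed mapping named `name`."""
    table = dm_delay_table(device, sectors, delay_ms)
    return ["dmsetup", "create", name, "--table", table]


# Hypothetical values: a 1 GiB device with a 2 second delay per I/O.
args = dmsetup_create_args("slowdisk", "/dev/vdb", 2097152, 2000)
```

The resulting mapping would appear under `/dev/mapper/` with the chosen name, and the system under test would need to boot from it for the delay to affect the generators.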

cgwalters commented 2 months ago

Thanks for the report. I assume your storage must be really slow to get this...is this really the only thing that fails?

It looks to me like in current git main systemd basically hardcodes the default (compiled in) timeout when running generators: https://github.com/systemd/systemd/blob/a3680a45d0356ff3ee40dcf1d697326497a3949c/src/core/manager.c#L4123

So I'm not sure there's much we can do here; you'll need to file this with systemd; so closing here. (But without prejudice, feel free to reopen if you disagree)

Probably in cases like this they may need to tune down parallelism, or support longer timeouts. It's an IMO open question whether it's better to have a system with a broken generator just fail to boot, or boot but in a potentially undefined state.

cospotato commented 2 months ago

@cgwalters Thanks for your reply. In our case, the underlying volume's I/O latency reaches the level of seconds. Maybe we should fix the disk issue first 🤣. But in my opinion, OSTree should break the startup process when the stateroot is not mounted correctly. So we wrote a hack systemd service, shown below, that breaks the startup process and reboots if the stateroot is not mounted:

[Unit]
Description=Check OSTree stateroot mount
After=ostree-remount.service
OnFailure=reboot.target
OnFailureJobMode=replace-irreversibly

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/systemctl status var.mount

[Install]
WantedBy=local-fs.target
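
The unit above infers the mount state from `systemctl status var.mount`. A more direct check would look at the kernel's mount table. The sketch below parses `/proc/self/mountinfo` text and reports whether a path is a mount point. It is only an illustration: the helper names are hypothetical, and the sample lines (stateroot name `rocky`, the deployment path, the device) are assumptions about how an ostree `/var` bind mount might appear, not output captured from the reporter's system.

```python
def parse_mountinfo(text):
    """Map each mount point to the root of that mount inside its filesystem.

    In /proc/<pid>/mountinfo, field 4 is the root and field 5 is the
    mount point (1-based), see proc(5).
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mounts[fields[4]] = fields[3]
    return mounts


def is_mount_point(path, text):
    """Return True if `path` appears as a mount point in the mountinfo text."""
    return path in parse_mountinfo(text)


def read_mountinfo(path="/proc/self/mountinfo"):
    """Read the live mount table of the current process."""
    with open(path) as f:
        return f.read()


# Illustrative sample only; real entries will differ.
SAMPLE = (
    "24 1 8:3 /ostree/deploy/rocky/deploy/abc.0 / rw,relatime shared:1 - xfs /dev/sda3 rw\n"
    "31 24 8:3 /ostree/deploy/rocky/var /var rw,relatime shared:2 - xfs /dev/sda3 rw\n"
)
var_is_mounted = is_mount_point("/var", SAMPLE)
```

A small wrapper that calls `is_mount_point("/var", read_mountinfo())` and exits non-zero on `False` could stand in for the `systemctl status` call in `ExecStart=`, keeping the same `OnFailure=` behavior.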

cgwalters commented 2 months ago

> But in my opinion, OSTree should break the startup process when the stateroot is not mounted correctly.

OK hmmm...we can try to investigate this at some point. But a question: do you just not have many other generators?

Or maybe the real thing we should do is change things so that /var is always mounted to the stateroot in the initramfs, and the fstab generator defaults to detecting that and unmounting if we have a real version.

cospotato commented 2 months ago

Yes! The main issue is the stateroot mounting problem, which causes services (like containerd) to write to the wrong place and lose the data on the next reboot. The reboot is just a mitigation for the stateroot not being mounted.