Open cospotato opened 7 months ago
Thanks for the report. I assume your storage must be really slow to get this...is this really the only thing that fails?
It looks to me like in current git main systemd basically hardcodes the default (compiled in) timeout when running generators: https://github.com/systemd/systemd/blob/a3680a45d0356ff3ee40dcf1d697326497a3949c/src/core/manager.c#L4123
So I'm not sure there's much we can do here; you'll need to file this with systemd; so closing here. (But without prejudice, feel free to reopen if you disagree)
Probably in cases like this they may need to tune down parallelism, or support longer timeouts. It's an IMO open question whether it's better to have a system with a broken generator just fail to boot, or boot but in a potentially undefined state.
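For anyone following along, here is a minimal sketch of the failure mode being discussed: a process that runs under an alarm-based deadline is killed by SIGALRM when the deadline passes, and the parent only sees "terminated by signal". This is not systemd's actual code. The helper name `run_with_alarm` and the 1 s and 5 s deadlines are assumptions chosen for illustration.

```python
import signal
import subprocess
import sys


def run_with_alarm(child_code, deadline_s):
    """Run Python code in a child that arms alarm(deadline_s) on itself.

    Returns the child's return code. On POSIX, a negative value -N means
    the child was terminated by signal N.
    """
    # The child arms the alarm first, then runs the (possibly slow) work.
    prog = f"import signal; signal.alarm({deadline_s}); exec({child_code!r})"
    proc = subprocess.run([sys.executable, "-c", prog])
    return proc.returncode


# A "generator" that is stuck (for example on slow I/O) past its deadline.
slow = run_with_alarm("import time; time.sleep(30)", 1)
# A "generator" that finishes well within its deadline.
fast = run_with_alarm("pass", 5)

print("slow generator return code:", slow)
print("fast generator return code:", fast)
```

The slow child produces no output at all before it is killed, which matches the symptom in this issue: the unit that the generator would have written never appears.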
@cgwalters Thanks for your reply. In our case, the underlying volume's I/O latency reaches the level of seconds. Maybe we should fix the disk issue first 🤣. But in my opinion, OSTree should break the startup process when the stateroot is not mounted correctly. So we made a hack systemd service like the one below, which breaks the startup process and reboots if the stateroot is not mounted:
[Unit]
Description=Check OSTree stateroot mount
After=ostree-remount.service
OnFailure=reboot.target
OnFailureJobMode=replace-irreversibly
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/systemctl status var.mount
[Install]
WantedBy=local-fs.target
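As a hedged alternative to `systemctl status var.mount`, the check could look at the kernel's mount table directly, which does not depend on the generator having produced a unit. This is a minimal sketch, not something OSTree ships. The function names and the sample mount table (including the `rocky` stateroot name and the device) are made up for illustration.

```python
def mounted_paths(mountinfo_text):
    """Return the set of mount points listed in /proc/self/mountinfo text."""
    paths = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            # The fifth field is the mount point. Spaces in paths are
            # octal-escaped by the kernel, which does not matter for /var.
            paths.add(fields[4])
    return paths


def is_mounted(path, mountinfo_text=None):
    """Return True if path is a mount point; reads the live table by default."""
    if mountinfo_text is None:
        with open("/proc/self/mountinfo") as f:
            mountinfo_text = f.read()
    return path in mounted_paths(mountinfo_text)


# Hypothetical mount tables: one with /var bound to the stateroot, one without.
ROOT_ONLY = "25 1 8:3 / / rw,relatime - xfs /dev/sda3 rw\n"
WITH_VAR = ROOT_ONLY + (
    "30 25 8:3 /ostree/deploy/rocky/var /var rw,relatime - xfs /dev/sda3 rw\n"
)

print(is_mounted("/var", WITH_VAR))
print(is_mounted("/var", ROOT_ONLY))
```

A script built on this would call `is_mounted("/var")` with no second argument and exit non-zero when it returns False, so that `OnFailure=reboot.target` in the unit above still triggers.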
But in my opinion, OSTree should break the startup process when the stateroot is not mounted correctly.
OK hmmm...we can try to investigate this at some point. But a question: do you just not have many other generators?
Or maybe the real thing we should do is change things so that /var is always mounted to the stateroot in the initramfs, and the fstab generator defaults to detecting that and unmounting if we have a real version.
Yes! The main issue is the stateroot mounting problem, which causes services (like containerd) to write to the wrong place and lose the data on the next reboot. The reboot is just a mitigation for the stateroot not being mounted.
ostree version: v2022.2
systemd version: v239
What happened?
We used ostree and rpm-ostree to build a CoreOS-like OS on top of Rocky. If disk I/O is very slow during boot, the systemd generator times out and is terminated by an internal ALRM signal. The var.mount unit is then not generated, so the stateroot is not mounted.
How to reproduce?
Inject latency into the disk just before Switch root.