ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

Admin command sometimes becomes useless when more than two deployments exist #1394

Open paulvt opened 6 years ago

paulvt commented 6 years ago

If there are more than two deployments and the current deployment is not either of the first two, the admin command exits with the error "error: Unexpected state: ostree= kernel argument found, but / is not a deployment root".

A device of ours ended up with three deployments because the upgrade job was (accidentally) killed near the end. The culprit seems to be this error, which is raised early in the command execution. Debugging revealed that it only considered the first two deployments and not the third, which is actually the current deployment. As a result, the admin command cannot be used to rectify the situation.

cgwalters commented 6 years ago

> If there are more than two deployments and the current deployment is not either of the first two

There's almost nothing inside libostree that special cases around "two". In fact for rpm-ostree when doing live updates we end up with 3, and that works. This issue is a lot more likely to be the bootloader configuration being out of sync with the deployment roots.

If something, for example a rollback, changed the bootloader entries behind libostree's back, that might cause this.

cgwalters commented 6 years ago

Clearly libostree could be more resilient here. The main reason we want to know if / is a deployment is so we can detect the booted one. But in the end we can easily find that out via the "compare stat(/) vs stat(/ostree/deploy/...)" logic we have elsewhere.
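The stat comparison mentioned above can be sketched in shell. This is an illustration, not libostree's actual code: `same_dir` and `find_booted_deployment` are hypothetical helper names, `stat -c` assumes GNU coreutils, and the glob assumes the usual `/ostree/deploy/<osname>/deploy/<checksum>.<serial>` layout.

```shell
# Hypothetical helpers for illustration only; not part of ostree.

# Two paths refer to the same directory when both the device number
# and the inode number match (GNU stat: %d = device, %i = inode).
same_dir() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Print the first deployment directory under $2 that is the same
# directory as $1; return 1 if none matches.
find_booted_deployment() {
    root=$1
    deploy_base=$2
    for d in "$deploy_base"/*/deploy/*/; do
        # Skip the literal pattern when the glob matches nothing.
        [ -d "$d" ] || continue
        if same_dir "$root" "$d"; then
            printf '%s\n' "${d%/}"
            return 0
        fi
    done
    return 1
}
```

On an affected system, `find_booted_deployment / /ostree/deploy` would be the call to try. Note that it never looks at `/boot`, so it should be unaffected by bootloader configuration that has drifted.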

So we could probably stumble forwards and allow most operations to work, which, if my theory is correct, would end up regenerating the bootloader configuration and fixing this.

For example, I can trivially reproduce this on e.g. Fedora Atomic Host:

[root@localhost ~]# mv /boot/loader{,.orig}
[root@localhost ~]# ostree admin status
error: Unexpected state: ostree= kernel argument found, but / is not a deployment root
[root@localhost ~]# mv /boot/loader.orig /boot/loader
[root@localhost ~]# ostree admin status
* fedora-atomic b5845ebd002b2ec829c937d68645400aa163e7265936b3e91734c6f33a510473.0
    Version: 27.44
    origin refspec: fedora-atomic:fedora/27/x86_64/atomic-host
    GPG: Signature made Mon 01 Jan 2018 09:54:38 PM UTC using RSA key ID F55E7430F5282EE4

cgwalters commented 6 years ago

(re: being able to trivially mv /boot/loader - see https://github.com/ostreedev/ostree/issues/1265 )

paulvt commented 6 years ago

The logic you refer to is actually just before the line that raises the error. However, I hadn't realised that it starts out from a list of deployments to check, and that this list is constructed from what it finds in /boot.

It seems that for our installation /boot is indeed out of sync with the actual deployment.
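One rough way to check for this kind of mismatch is to compare the `ostree=` argument the kernel was booted with against the ones recorded in the bootloader entries. The sketch below is a diagnostic aid under stated assumptions, not something ostree ships: `ostree_karg` and `list_entry_kargs` are hypothetical helpers, and it assumes the entries are `*.conf` files with an `options` line, stored in a directory such as `/boot/loader/entries`.

```shell
# Hypothetical helpers for illustration only; not part of ostree.

# Print the value of the first ostree= argument in a kernel command line.
ostree_karg() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | sed -n 's/^ostree=//p' | head -n 1
}

# Print the ostree= value of every *.conf entry in directory $1.
# Entries without an ostree= argument produce no output.
list_entry_kargs() {
    for f in "$1"/*.conf; do
        # Skip the literal pattern when the glob matches nothing.
        [ -f "$f" ] || continue
        ostree_karg "$(sed -n 's/^options[[:space:]]\{1,\}//p' "$f")"
    done
}
```

Running `ostree_karg "$(cat /proc/cmdline)"` and `list_entry_kargs /boot/loader/entries` side by side would then show whether the booted value appears among the entries. If it does not, that would be consistent with /boot having drifted from the deployments on disk.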