Open paulvt opened 6 years ago
There's almost nothing inside libostree that special cases around "two". In fact for rpm-ostree when doing live updates we end up with 3, and that works. This issue is a lot more likely to be the bootloader configuration being out of sync with the deployment roots.
If something (for example, a rollback) changed the bootloader entries behind libostree's back, that might cause this.
Clearly libostree could be more resilient here. The main reason we want to know if / is a deployment is so we can detect the booted one. But in the end we can easily find that out via the "compare stat(/) vs stat(/ostree/deploy/...)" logic we have elsewhere.
So we probably could stumble forwards and allow most operations to work, which if my theory is correct, would end up regenerating the bootloader configuration and fix this.
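To make the stat comparison concrete, here is a minimal sketch in Python. It is not libostree's actual code: the function name `find_booted_deployment` and its parameters are hypothetical, and the directory layout `<sysroot>/ostree/deploy/<osname>/deploy/<checksum>.<serial>` is an assumption about a typical installation.

```python
import os
from pathlib import Path
from typing import Optional


def find_booted_deployment(root: str = "/", sysroot: str = "/") -> Optional[Path]:
    """Return the deployment directory that is the same inode as `root`, if any.

    Hypothetical helper for illustration only.
    """
    root_stat = os.stat(root)
    # Assumed layout: <sysroot>/ostree/deploy/<osname>/deploy/<checksum>.<serial>
    for candidate in sorted(Path(sysroot).glob("ostree/deploy/*/deploy/*")):
        # Skip non-directories such as the *.origin files next to each deployment.
        if not candidate.is_dir():
            continue
        st = candidate.stat()
        # Same device and same inode means `root` is this deployment.
        if (st.st_dev, st.st_ino) == (root_stat.st_dev, root_stat.st_ino):
            return candidate
    return None
```

A lookup like this does not consult /boot at all, which is why it could still identify the booted deployment when the bootloader entries are missing or stale.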
For example, I can trivially reproduce this on e.g. Fedora Atomic Host:
[root@localhost ~]# mv /boot/loader{,.orig}
[root@localhost ~]# ostree admin status
error: Unexpected state: ostree= kernel argument found, but / is not a deployment root
[root@localhost ~]# mv /boot/loader.orig /boot/loader
[root@localhost ~]# ostree admin status
* fedora-atomic b5845ebd002b2ec829c937d68645400aa163e7265936b3e91734c6f33a510473.0
Version: 27.44
origin refspec: fedora-atomic:fedora/27/x86_64/atomic-host
GPG: Signature made Mon 01 Jan 2018 09:54:38 PM UTC using RSA key ID F55E7430F5282EE4
(re: being able to trivially mv /boot/loader - see https://github.com/ostreedev/ostree/issues/1265 )
The logic you refer to actually sits just before the line that throws the exception. However, I hadn't realised that it starts from a list of deployments to check, which is constructed from what it finds in /boot.
It seems that for our installation /boot is indeed out of sync with the actual deployment.
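One rough way to check for this kind of mismatch is to compare the `ostree=` argument on the running kernel's command line with the ones recorded in the bootloader entries. The sketch below is an illustration, not how libostree does it: all function names are hypothetical, and it assumes the entries are `*.conf` files with an `options` line, as found under /boot/loader/entries on a typical installation.

```python
from pathlib import Path
from typing import List, Optional


def ostree_karg(options: str) -> Optional[str]:
    """Extract the value of the ostree= argument from a kernel command line."""
    for arg in options.split():
        if arg.startswith("ostree="):
            return arg[len("ostree="):]
    return None


def entry_ostree_kargs(entries_dir: str) -> List[str]:
    """Collect ostree= values from the `options` line of each *.conf entry."""
    found = []
    for conf in sorted(Path(entries_dir).glob("*.conf")):
        for line in conf.read_text().splitlines():
            parts = line.split(None, 1)  # key, then the rest of the line
            if len(parts) == 2 and parts[0] == "options":
                karg = ostree_karg(parts[1])
                if karg is not None:
                    found.append(karg)
    return found


def booted_entry_is_known(cmdline: str, entries_dir: str) -> bool:
    """True if the booted ostree= argument appears in some bootloader entry."""
    booted = ostree_karg(cmdline)
    return booted is not None and booted in entry_ostree_kargs(entries_dir)
```

On a live system this could be called as `booted_entry_is_known(Path("/proc/cmdline").read_text(), "/boot/loader/entries")`; a `False` result would suggest that /boot no longer describes the deployment that was actually booted.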
If there are more than two deployments and the current deployment is not either of the first two, the admin command exits with the error "error: Unexpected state: ostree= kernel argument found, but / is not a deployment root".
A device we had ended up with three deployments because the upgrade job was (accidentally) killed near the end. The culprit seems to be this exception, which is thrown early in the command execution. Debugging revealed that it only considered the first two deployments and not the third, which is actually the current deployment. As a result, the admin command cannot be used to rectify the situation.