maartenl945 opened this issue 6 years ago
I think the problem is with the upgrade. Upgrade should not have overwritten `/etc/sysconfig/docker-storage-setup`. It should write `STORAGE_DRIVER=overlay2` only if this file is not present.
@cgwalters Any thoughts on what's going on?
cc @rhatdan
Yeah, I have a variant of this on my workstation too:

```
# rpm-ostree status
State: idle; auto updates disabled
● ostree://fedora-27:fedora/27/x86_64/workstation
                Version: 27.115 (2018-04-18 23:49:03)
             BaseCommit: 0b2806f9eebb92d4623a11c06162e59805d8b277b37789b8d94ed2cb67af10a8

# systemctl status docker-storage-setup
● docker-storage-setup.service - Docker Storage Setup
   Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-04-19 13:10:24 EDT; 4 days ago
  Process: 1097 ExecStart=/usr/bin/container-storage-setup (code=exited, status=1/FAILURE)
 Main PID: 1097 (code=exited, status=1/FAILURE)

Apr 19 13:10:24 vanadium systemd[1]: Starting Docker Storage Setup...
Apr 19 13:10:24 vanadium container-storage-setup[1097]: ERROR: Storage is already configured with overlay driver. Can't configure it with overlay2 driver. To override, remove /etc/sysconfig/docker-storage and re
Apr 19 13:10:24 vanadium systemd[1]: docker-storage-setup.service: Main process exited, code=exited, status=1/FAILURE
Apr 19 13:10:24 vanadium systemd[1]: Failed to start Docker Storage Setup.
Apr 19 13:10:24 vanadium systemd[1]: docker-storage-setup.service: Unit entered failed state.
Apr 19 13:10:24 vanadium systemd[1]: docker-storage-setup.service: Failed with result 'exit-code'.
```
This machine, IIRC, was upgraded from F26. So... we obviously talked about this scenario in the original PR: https://github.com/projectatomic/container-storage-setup/pull/263#issuecomment-344010791
Rereading that, I'm not certain we actually found a solution to that problem?
You know, in general I sort of feel like c-s-s should just be a "run at most once" service: we drop a stamp file in `/var/lib/docker/css.stamp` and do `ConditionPathExists=!/var/lib/docker/css.stamp`. Or maybe even just `ConditionDirectoryNotEmpty=!/var/lib/docker`?
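Roughly, a minimal sketch of the stamp-file idea (the path is just the hypothetical one above; nothing ships this today):

```sh
#!/bin/bash
# Hypothetical "run at most once" wrapper around container-storage-setup:
# if the stamp file exists, storage was already configured on this host,
# so do nothing; otherwise run setup and drop the stamp on success.
STAMP=/var/lib/docker/css.stamp

if [ -e "$STAMP" ]; then
    exit 0
fi

/usr/bin/container-storage-setup && mkdir -p "$(dirname "$STAMP")" && touch "$STAMP"
```

Or, declaratively, the unit itself could carry `ConditionPathExists=!/var/lib/docker/css.stamp` and simply be skipped, which amounts to the same thing without a wrapper.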
To give my answers to the questions:
> Should container-storage-setup really generate an error when there is an existing storage configuration or is this a bug?

It's a bug.

> Can we safely run with the existing storage configuration after update (even though docker storage setup failed) or is there something which is then missing?

Yes, it's safe.
@cgwalters @rhvgoyal Thanks for the answers
> Upgrade should not have overwritten `/etc/sysconfig/docker-storage-setup`. It should write `STORAGE_DRIVER=overlay2` only if this file is not present.
I haven't fully analyzed this, but here's what I think is happening. Today the Fedora docker spec marks the file as `%config(noreplace)`, which would ordinarily still mean we get new config files if we haven't touched the file. (ostree-managed systems are basically `%config(noreplace)` universally for all of `/etc`.)

Now the problem comes in that the docker package also has a crazy `%posttrans` which overwrites its own config file, hence making it "modified". So on "classic" systems managed with `yum`, we basically will never see updates to the config. But we will for ostree-managed ones.
To rephrase this: a very important detail here is that with "classic" systems, `%posttrans` is run on the client, and with rpm-ostree managed ones, it's run on the compose server. The `%posttrans` also basically implements "run just once" semantics, but that only matters on client systems. With rpm-ostree we generate a new tree each time, with the resulting data going into `/usr/etc/sysconfig/docker-storage-setup`, which is taken as the new default.

To say it another way: what the `docker` package is doing today subverts what RPM normally does with config files.
Actually this is all fixed since https://github.com/projectatomic/container-storage-setup/commit/ec391a16f72b4dc383a5e005f72d846c18023dc5
But we haven't actually e.g. shipped any updates to c-s-s in Fedora 28 since https://src.fedoraproject.org/rpms/docker/blob/901d9fb5aba12bde899b8be1b915b4a60a4f9afc/f/docker.spec#_52 which is...ancient.
No wait, that's dead code since we split off c-s-s into its own package. But we're still waiting on https://bodhi.fedoraproject.org/updates/FEDORA-2018-03bdc0733a
Without going into specific details, how is the `%config(noreplace)` equivalent implemented in atomic image upgrades? IOW, if a user decided to edit a file in `/etc/sysconfig/` which now should not be replaced over an upgrade, how does the atomic upgrade take care of that?
OK, in the Fedora docker spec, `/etc/sysconfig/docker-storage-setup` is generated on the fly (only if the file is not already present), and it is marked as a `%ghost` file. That is, the package does not install it, but if the package is uninstalled then this file will be removed.

This means that over an upgrade of the docker package this file will not be touched at all; only during a fresh install will it be created, as part of `%posttrans`. That also means any changes the user or other scripts have made will continue to be there over an upgrade.
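To make the mechanism concrete, the scriptlet is roughly of this shape (a hedged sketch of the pattern, not the literal text of the Fedora docker.spec; the `%ghost` marking itself lives in the spec's `%files` section):

```sh
# %posttrans bodies are plain shell. The idea: only create the file on a
# fresh install, never on upgrade, so any local edits survive untouched.
if [ ! -f /etc/sysconfig/docker-storage-setup ]; then
    echo "STORAGE_DRIVER=overlay2" > /etc/sysconfig/docker-storage-setup
fi
```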
How does an atomic upgrade take care of this situation, i.e. ensure that user-made changes to a file present in `/etc/` are not lost over an upgrade?

To me it does not matter whether changes to `/etc/sysconfig/docker-storage-setup` are done in `%posttrans`, manually by the user, by an external script, or by cloud-config. Atomic has to deal with it over an upgrade.
> IOW, if a user decided to edit a file in `/etc/sysconfig/` which now should not be replaced over an upgrade, how does the atomic upgrade take care of that?
It's implemented in libostree. There's some mention of this in https://ostree.readthedocs.io/en/latest/manual/atomic-upgrades/. Basically, modified config files are always preserved.
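A quick way to see what that merge is working with on a given host (just illustrative inspection, using the paths from this thread; not part of any tool):

```sh
# Compare the local /etc copy against the image default under /usr/etc.
# If the two are identical, the file counts as unmodified and the new
# default is applied on upgrade; if they differ, the /etc copy is kept.
if diff -u /usr/etc/sysconfig/docker-storage-setup \
           /etc/sysconfig/docker-storage-setup; then
    echo "unmodified: new defaults will be picked up"
else
    echo "locally modified: the /etc copy will be preserved"
fi
```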
Does it matter, though, now that we discovered we'd already fixed the root issue here? `yum`-managed systems will stay with the default config file, but that seems OK since we want them to also stay with devicemapper by default anyways.

The container-storage-setup fix is more of a workaround. We should not even be getting to that point if our assumptions w.r.t. upgrades are correct.
I am trying to understand why we ended up in this situation over the upgrade at all; that is not clear to me yet. My guess is that atomic overwrote `/etc/sysconfig/docker-storage-setup`, and that's a problem. We have not addressed the real problem.

Why did the storage driver change over the upgrade to begin with? That's the real problem which needs fixing. How container-storage-setup copes with the change is a different issue altogether.
> My guess is that atomic overwrote `/etc/sysconfig/docker-storage-setup`, and that's a problem.
I don't agree; getting updated config files should be the default. What's happening in the `docker.spec` today is just confusing and hard to explain to admins.
The documentation says the following:

> Now that we have a deployment directory, a 3-way merge is performed between the (by default) currently booted deployment's /etc, its default configuration, and the new deployment (based on its /usr/etc).
I am not sure what that means. Why does `/etc/` need to be touched by an upgrade at all? That's a host detail and should probably be touched only during install. Any upgrades should go to files in the `/usr/...` dir.
I mean, if a user installed atomic images with some custom changes in `/etc/sysconfig/foo.txt` made via cloud-init, how would you retain those changes over an upgrade if you decide to override these files?

How does it matter whether `/etc/sysconfig/docker-storage-setup` was touched by `%posttrans`? It could very well have been done by cloud-init. I really can't see anything wrong with what's being done in `%posttrans`; that section is there so that a user can do system-specific things.

IMO, the real problem is that atomic upgrades are modifying a file over an upgrade which they should not be modifying.
I think the real issue is that an atomic upgrade will not deal with any file dropped into `/etc` by a `%posttrans` section. We drop this file to set up a default for the system, and it is marked as `%ghost`. So while this works well for rpms, it seems to become an issue with atomic.

I am not sure how to set up a default on a system dynamically during package installation if I don't use the `%posttrans` section. Dropping config files only works for static defaults.
> I am not sure how to set up a default on a system dynamically during package installation if I don't use the `%posttrans` section.
Do it at service startup via systemd. If you think about it, that's what we're effectively doing today, as state is also stored in `/etc/sysconfig/docker-storage` and `/var/lib/docker`, which are populated on service start.
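A rough sketch of what that could look like (hypothetical startup-time logic, not what any shipped unit does today):

```sh
# Pick the distro default at service start instead of in %posttrans:
# only write a default if nothing (admin, cloud-init, a previous run)
# has configured storage on this host yet.
if [ ! -s /etc/sysconfig/docker-storage ] && \
   [ ! -e /etc/sysconfig/docker-storage-setup ] && \
   [ ! -d /var/lib/docker ]; then
    echo "STORAGE_DRIVER=overlay2" > /etc/sysconfig/docker-storage-setup
fi
```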
We had gone down that path with the `STORAGE_DRIVER=auto` option and ran into various issues. Also, it means we would need to carry additional patches to the source code to change the Fedora/Red Hat defaults if they are different from the upstream default (for example, the upstream default is devicemapper while we want overlay2 as the default).
@cgwalters Looks like exiting early is not enough. It exposes a race: we used to wait for the thin pool to come up, and now that wait does not happen, so docker fails saying the thin pool device does not exist.
@dustymabe is able to reproduce it.
We had fixed it using the following commit:
https://github.com/projectatomic/container-storage-setup/pull/111
> @dustymabe is able to reproduce it.
reference: https://pagure.io/atomic-wg/issue/498
OK, and we reintroduced that race condition because the "storage is changed" check is before the thin pool wait, and this only affects Atomic Host as we propagated the new default config file. Whee.
So... something like this?
```diff
diff --git a/container-storage-setup.sh b/container-storage-setup.sh
index d0575ba..1235354 100755
--- a/container-storage-setup.sh
+++ b/container-storage-setup.sh
@@ -1414,6 +1414,19 @@ setup_storage_compat() {
     Fatal "Failed to determine existing storage driver."
   fi
 
+  # If a user decides to setup (a) and (b)/(c):
+  # a) lvm thin pool for devicemapper.
+  # b) a separate volume for container runtime root.
+  # c) a separate named ($CONTAINER_ROOT_LV_NAME) volume for $CONTAINER_ROOT_LV_MOUNT_PATH.
+  # (a) will be setup first, followed by (b) or (c).
+  #
+  # This also involves waiting for the pool to come up;
+  # we need to do this to avoid race conditions with docker startup.
+  # https://github.com/projectatomic/container-storage-setup/issues/267
+  if [ "$STORAGE_DRIVER" == "devicemapper" ]; then
+    setup_lvm_thin_pool_compat
+  fi
+
   # If storage is configured and new driver should match old one.
   if [ -n "$current_driver" ] && [ "$current_driver" != "$STORAGE_DRIVER" ];then
     Info "Storage is already configured with ${current_driver} driver. Can't configure it with ${STORAGE_DRIVER} driver. To override, remove ${_STORAGE_OUT_FILE} and retry."
@@ -1426,16 +1439,7 @@ setup_storage_compat() {
     fi
   fi
 
-  # If a user decides to setup (a) and (b)/(c):
-  # a) lvm thin pool for devicemapper.
-  # b) a separate volume for container runtime root.
-  # c) a separate named ($CONTAINER_ROOT_LV_NAME) volume for $CONTAINER_ROOT_LV_MOUNT_PATH.
-  # (a) will be setup first, followed by (b) or (c).
-
-  # Set up lvm thin pool LV.
-  if [ "$STORAGE_DRIVER" == "devicemapper" ]; then
-    setup_lvm_thin_pool_compat
-  else
+  if [ "$STORAGE_DRIVER" != "devicemapper" ]; then
     write_storage_config_file $STORAGE_DRIVER "$_STORAGE_OUT_FILE"
   fi
```
I don't want to be calling setup_lvm_thin_pool when I know `/etc/sysconfig/docker-storage-setup` has changed and has lost valid options like `AUTO_POOL_EXTEND=yes`; that would disable auto extension.

I am working on a hack patch to just wait for the thin pool to come up if the storage driver is already configured and it has changed.

This also breaks the reset-storage path. That path looks at `/etc/sysconfig/docker-storage-setup` to figure out the storage driver and other options, and resets accordingly.
Ok, I have proposed a PR to fix this corner case here.
https://github.com/projectatomic/container-storage-setup/pull/271
PTAL
Hi,
We are using CentOS Atomic Host on our systems in the field to run our docker application containers. We are also running Cockpit to have easy access to system services and Atomic Host updates. We noticed that when updating to the latest Atomic Host release (via Cockpit), docker storage setup fails on reboot. systemctl diagnostics give the following errors:
```
bash-4.2# systemctl status docker-storage-setup.service
● docker-storage-setup.service - Docker Storage Setup
   Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-04-23 10:48:40 CEST; 42min ago
  Process: 859 ExecStart=/usr/bin/container-storage-setup (code=exited, status=1/FAILURE)
 Main PID: 859 (code=exited, status=1/FAILURE)

Apr 23 10:48:39 master systemd[1]: Starting Docker Storage Setup...
Apr 23 10:48:40 master container-storage-setup[859]: ERROR: Storage is already configured with devicemapper driver. Can't configure it wi... retry.
Apr 23 10:48:40 master systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Apr 23 10:48:40 master systemd[1]: Failed to start Docker Storage Setup.
Apr 23 10:48:40 master systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 23 10:48:40 master systemd[1]: docker-storage-setup.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
```
I think this is caused by the docker storage setup switching to the overlay2 FS by default?
On a system BEFORE the Atomic Host update, I notice that the file `/etc/sysconfig/docker-storage-setup` contains the configuration:

```
CONTAINER_THINPOOL=docker-pool
```

On a system after the Atomic Host update, I notice that the same file now contains the configuration:

```
STORAGE_DRIVER=overlay2
```

There is already a docker storage setup on this system, however, in `/etc/sysconfig/docker-storage`:

```
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/cah_master-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "
```
My understanding is that this existing configuration is what container-storage-setup complains about during startup of the updated system.
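A quick way to compare the two on an affected host (plain inspection commands using the paths above; not part of any tool or fix):

```sh
# Show the storage-related config files side by side, plus what docker
# itself reports, to confirm the mismatch between the old devicemapper
# setup and the new overlay2 default.
grep -H . /etc/sysconfig/docker-storage-setup /etc/sysconfig/docker-storage
docker info 2>/dev/null | grep -i 'storage driver'
```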
I have two questions: