Open jamesbibby opened 5 years ago
One additional note: restarting docker resolves the issue (temporarily, I think, but will confirm).
After running a sudo docker service restart, containers can be scheduled again.
Do you run a cleanup job which might remove important weave volumes? AFAIK DC/OS introduced a cleaner in one of its versions.
What you expected to happen?
Marathon/Mesos is able to schedule containers on servers running weave 2.5.0.
What happened?
After about 24 hours of operation, containers fail to start on any hosts running weave 2.5.0.
How to reproduce it?
We are running a services cluster on AWS with some older versions of mesos/marathon/weave/docker:
Ubuntu: Trusty (Kernel 4.4.0)
Marathon: 0.11.1
Mesos: 0.25.0
Docker: 1.9.1-0~trusty
Weave: 1.4.2
Registrator: Snapshotted from Jan 2016
I am in the process of upgrading the cluster with new(er) versions of Docker, Weave and Registrator:
Docker: 1.11.2-0~trusty
Weave: 2.5.0 (using the proxy)
Registrator: v7
I tested things in a dev cluster and the upgrade worked in place with the following process: disable the mesos slave process, upgrade and restart docker, upgrade, set up and restart weave, upgrade registrator. When I started canarying the upgrade in production, it failed after about 24 hours with the following error message:
From my reading this means that the weave wait volume is no longer found and the entry point can no longer be accessed in the container, but I would like confirmation of this.
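One way to test that reading is to look at what `docker inspect` reports for a container that failed to start. The sketch below is only illustrative. It assumes that the Weave proxy rewrites the container entrypoint to a wait binary under /w and mounts a volume at /w to supply it; that path, and the helper names `check_weave_wait` and `check_inspect_output`, are assumptions and not something stated in this report. It only checks the container configuration, not whether the binary actually exists inside the volume.

```python
import json

# Assumption: the Weave proxy points the entrypoint at a wait binary under /w
# and mounts a volume at /w. Verify this path against your Weave version.
WAIT_DIR = "/w"


def check_weave_wait(inspect_entry):
    """Check one `docker inspect` entry for a /w entrypoint and a /w mount."""
    config = inspect_entry.get("Config") or {}
    entrypoint = config.get("Entrypoint") or []
    if isinstance(entrypoint, str):
        entrypoint = [entrypoint]
    uses_wait = bool(entrypoint) and entrypoint[0].startswith(WAIT_DIR + "/")

    # "Mounts" lists resolved mounts; "HostConfig.Binds" lists requested binds
    # in the form "source:destination[:mode]".
    mounts = inspect_entry.get("Mounts") or []
    mounted = any(m.get("Destination") == WAIT_DIR for m in mounts)
    binds = (inspect_entry.get("HostConfig") or {}).get("Binds") or []
    bound = any(b.split(":")[1:2] == [WAIT_DIR] for b in binds)

    has_volume = mounted or bound
    return {
        "uses_wait_entrypoint": uses_wait,
        "wait_volume_mounted": has_volume,
        # Entrypoint expects /w but nothing is mounted there.
        "suspect": uses_wait and not has_volume,
    }


def check_inspect_output(text):
    """Parse the JSON array printed by `docker inspect` and check each entry."""
    return [check_weave_wait(entry) for entry in json.loads(text)]
```

Feeding it the saved output of `docker inspect` for a failing container and for a healthy one from before the breakage would show whether the two differ in the /w mount.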
I created a dev container with one mesos slave upgraded (and the rest of the cluster running the original versions) and used it for about 24 hours (suspending and restarting applications periodically). I have been able to get the cluster into the broken state and would like help in debugging/resolving this issue.
Anything else we need to know?
Versions:
Logs:
From what I can see in the logs weave shut down at some point and errors seemed to start after this occurred:
I can provide full logs, including from before and after the upgrade, if this is helpful.
Network: