rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

Updating network services to v0.2.1 prevents certain containers from starting #9148

Closed JoelESvensson closed 6 years ago

JoelESvensson commented 7 years ago

Rancher versions: rancher/server: 1.6.2 rancher/agent: 1.2.2

Infrastructure Stack versions: healthcheck: v0.3.0 ipsec: v0.1.0 network-services: v0.2.1 scheduler: v0.6.0

Docker version: 1.12

Operating system and kernel: CoreOS latest stable

Type/provider of hosts: OpenStack

Setup details: Internal DB

Environment Template: Cattle

Steps to Reproduce:

  1. Update network-services to v0.2.1 from v0.2.0
  2. Restart server

Results: Some containers won't start, specifically all containers whose start command is prefixed with /.r/r.

Temporary fix: Roll back to v0.2.0 and everything starts as normal.

soumyalj commented 7 years ago

Tested with v1.6.2 using a CoreOS 1409.6.0 (4.11.9) server and hosts from the DigitalOcean provider. I upgraded network-services from v0.2.0 to v0.2.1 and restarted the server. All containers are running and I did not see any issues. Containers on my host:

core@soumyacoreosbugtest-02 ~ $ docker ps -a
CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS               NAMES
6f5c9091c666        rancher/scheduler:v0.8.2         "/.r/r /rancher-entry"   4 minutes ago       Up 4 minutes                            r-scheduler-scheduler-1-d7a89b50
a92365c3e6f3        rancher/dns:v0.15.1              "/rancher-entrypoint."   8 minutes ago       Up 8 minutes                            r-network-services-metadata-dns-2-d6526dd8
2018610e97f1        rancher/network-manager:v0.7.1   "/rancher-entrypoint."   8 minutes ago       Up 8 minutes                            r-network-services-network-manager-2-ec697b53
82ec1190c177        nginx                            "/.r/r nginx -g 'daem"   11 minutes ago      Up 11 minutes                           r-Default-test1-1-050a02f9
a02df804f23c        nginx                            "/.r/r nginx -g 'daem"   11 minutes ago      Up 11 minutes                           r-Default-test1-2-ef7daaa3
86209c35b5e9        rancher/net:v0.11.2              "/rancher-entrypoint."   20 minutes ago      Up 20 minutes                           r-ipsec-ipsec-router-2-a0c91295
958e2a6bf63b        rancher/healthcheck:v0.3.1       "/.r/r /rancher-entry"   20 minutes ago      Up 20 minutes                           r-healthcheck-healthcheck-2-b7968179
a36fd5e90466        rancher/net:holder               "/.r/r /rancher-entry"   20 minutes ago      Up 20 minutes                           r-ipsec-ipsec-2-45813bca
32591a0faa4a        rancher/net:v0.11.2              "/rancher-entrypoint."   20 minutes ago      Up 20 minutes                           r-ipsec-ipsec-cni-driver-2-848158bf
ab18baed313b        rancher/metadata:v0.9.1          "/rancher-entrypoint."   20 minutes ago      Up 20 minutes                           r-network-services-metadata-2-b2240bed
c736fd197e6b        rancher/agent:v1.2.2             "/run.sh run"            21 minutes ago      Up 21 minutes                           rancher-agent

deniseschannon commented 7 years ago

@JoelESvensson Do you know which containers specifically wouldn't restart, or can you give an example of a service that we can test with?

JoelESvensson commented 7 years ago

This happens every time I try to upgrade, so I wonder what could be different for us. Basically, it affects all containers whose COMMAND is prefixed with /.r/r. No log output is given whatsoever.
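
To narrow down which containers are affected, the prefix check described above can be sketched in Python. This is only an illustration: it assumes the input comes from `docker ps -a --no-trunc --format '{{.ID}}\t{{.Command}}\t{{.Status}}\t{{.Names}}'`, and the function name `wrapped_not_running` is made up here, not part of Rancher or Docker.

```python
WRAPPER_PREFIX = "/.r/r"  # command prefix reported in this issue


def wrapped_not_running(ps_output):
    """Return (id, name, status) for containers whose command starts with
    WRAPPER_PREFIX and whose status does not start with "Up".

    Expects tab-separated lines in the order: ID, COMMAND, STATUS, NAMES
    (an assumed format, see the docker ps invocation in the text).
    """
    affected = []
    for line in ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        container_id, command, status, name = (f.strip() for f in fields)
        command = command.strip('"')  # docker ps wraps the command in quotes
        first_word = command.split(" ", 1)[0]
        if first_word == WRAPPER_PREFIX and not status.startswith("Up"):
            affected.append((container_id, name, status))
    return affected
```

The text returned by `subprocess.check_output([...], text=True)` for the command above could be passed in directly.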

JoelESvensson commented 7 years ago

What does /.r/r do anyway? Is there any way I can debug this and find out why /.r/r is crashing?
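
One generic starting point when a container exits without any log output is to look at what Docker itself recorded for it. Below is a minimal sketch, assuming the JSON comes from `docker inspect <container>`, which reports the executable (`Path`), its arguments (`Args`), and the exit code and error message under `State`. The helper name `summarize_inspect` is hypothetical, and this says nothing about what /.r/r itself does.

```python
import json


def summarize_inspect(inspect_json):
    """Summarize the start command and exit state from `docker inspect` JSON.

    `docker inspect` prints a JSON array with one object per container.
    """
    summaries = []
    for entry in json.loads(inspect_json):
        state = entry.get("State", {})
        args = entry.get("Args") or []
        summaries.append({
            "name": entry.get("Name", "").lstrip("/"),  # names carry a leading slash
            "command": " ".join([entry.get("Path", "")] + args),
            "running": state.get("Running"),
            "exit_code": state.get("ExitCode"),
            "error": state.get("Error", ""),
        })
    return summaries
```

A non-zero `exit_code` or a non-empty `error` for the containers that fail to start would at least show whether the wrapped command was executed at all.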

loganhz commented 6 years ago

Please let us know in a comment if you can still reproduce the issue, and we'll reopen it.