openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

Fedora23 crash #5828

Closed chmouel closed 8 years ago

chmouel commented 9 years ago

I just installed Fedora 23 and have been trying to install Origin using the Ansible installer. openshift-node wasn't starting because Docker was crashing, and the crash seems to be triggered when origin-node starts:

[root@node1 ~]# rm -rf /var/lib/docker/*
[root@node1 ~]# systemctl start docker
[root@node1 ~]# systemctl status docker
* docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           `-docker-sdn-ovs.conf
   Active: active (running) since Tue 2015-11-10 15:56:18 UTC; 39s ago
     Docs: http://docs.docker.com
 Main PID: 1553 (docker)
   CGroup: /system.slice/docker.service
           `-1553 /usr/bin/docker daemon --insecure-registry=172.30.0.0/16

Nov 10 15:55:48 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:55:48.402041818Z" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Nov 10 15:55:48 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:55:48.983314645Z" level=error msg="WARNING: No --storage-opt dm.thinpooldev specified, using loopback; this configuration is strongly di...production use"
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.657365877Z" level=info msg="Option DefaultDriver: bridge"
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.658121898Z" level=info msg="Option DefaultNetwork: bridge"
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.674334476Z" level=info msg="Firewalld running: false"
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.791806067Z" level=info msg="Loading containers: start."
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.792312914Z" level=info msg="Loading containers: done."
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.792357467Z" level=info msg="Daemon has completed initialization"
Nov 10 15:56:18 node1.local.openshift.chmouel.com docker[1553]: time="2015-11-10T15:56:18.792390627Z" level=info msg="Docker daemon" commit=cc2d489-dirty execdriver=native-0.2 graphdriver=devicemapper version=1.8.2-fc23
Nov 10 15:56:18 node1.local.openshift.chmouel.com systemd[1]: Started Docker Application Container Engine.
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@node1 ~]#

Now starting origin-node:

[root@node1 ~]# systemctl start origin-node
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.

[root@node1 ~]# systemctl status origin-node
* origin-node.service - Origin Node
   Loaded: loaded (/usr/lib/systemd/system/origin-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/origin-node.service.d
           `-openshift-sdn-ovs.conf
   Active: failed (Result: start-limit) since Tue 2015-11-10 15:59:08 UTC; 37s ago
     Docs: https://github.com/openshift/origin
  Process: 1716 ExecStart=/usr/bin/openshift start node --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 1716 (code=exited, status=255)

Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: Failed to start Origin Node.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Unit entered failed state.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Failed with result 'exit-code'.
Nov 10 15:59:08 node1.local.openshift.chmouel.com origin-node[1716]: F1110 15:59:08.601430    1716 node.go:88] ERROR: Docker could not be reached at unix:///var/run/docker.sock.  Docker must be installed and running to start containers.
Nov 10 15:59:08 node1.local.openshift.chmouel.com origin-node[1716]: Get http://unix.sock/_ping: EOF
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Service hold-off time over, scheduling restart.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Start request repeated too quickly.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: Failed to start Origin Node.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Unit entered failed state.
Nov 10 15:59:08 node1.local.openshift.chmouel.com systemd[1]: origin-node.service: Failed with result 'start-limit'.
[root@node1 ~]# systemctl status docker
* docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           `-docker-sdn-ovs.conf
   Active: active (running) since Tue 2015-11-10 15:56:18 UTC; 3min 0s ago
     Docs: http://docs.docker.com
 Main PID: 1553 (docker)
   CGroup: /system.slice/docker.service
           `-1553 /usr/bin/docker daemon --insecure-registry=172.30.0.0/16

Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc820279130, 0x7f70e6319be0, 0xc8203f6790, 0xc82009c1c0)
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: /builddir/build/BUILD/docker-28c300fafb58c380d78381e08e1be35dfed5d4f9/vendor/src/github.com/gorilla/mux/mux.go:98 +0x29e fp=0xc820647be0 sp=0xc820647ac8
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: net/http.serverHandler.ServeHTTP(0xc820120d20, 0x7f70e6319be0, 0xc8203f6790, 0xc82009c1c0)
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: /usr/lib/golang/src/net/http/server.go:1862 +0x19e fp=0xc820647c40 sp=0xc820647be0
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: net/http.(*conn).serve(0xc82039e420)
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: /usr/lib/golang/src/net/http/server.go:1361 +0xbee fp=0xc820647f98 sp=0xc820647c40
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: runtime.goexit()
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: /usr/lib/golang/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820647fa0 sp=0xc820647f98
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: created by net/http.(*Server).Serve
Nov 10 15:59:08 node1.local.openshift.chmouel.com docker[1553]: /usr/lib/golang/src/net/http/server.go:1910 +0x3f6
[root@node1 ~]#
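
The failing check can be tried outside origin-node. The error above shows the node issuing a request for `/_ping` against `/var/run/docker.sock` and getting EOF back. Below is a minimal Python sketch of the same probe, assuming the daemon listens on that socket path; `UnixHTTPConnection` and `ping_docker` are hypothetical names made up for illustration and are not part of Docker or Origin.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket (illustrative helper)."""

    def __init__(self, path, timeout=5):
        # The host name is only used for the Host header; the socket path decides
        # where the request actually goes.
        super().__init__("localhost", timeout=timeout)
        self.unix_path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.unix_path)
        self.sock = sock


def ping_docker(path="/var/run/docker.sock"):
    """Send GET /_ping to the socket and return (ok, detail).

    ok is True only for an HTTP 200 answer. Any socket or HTTP failure,
    including the peer closing the connection without a response (the EOF
    case from the log), is reported as (False, "<ErrorType>: <message>").
    """
    conn = UnixHTTPConnection(path)
    try:
        conn.request("GET", "/_ping")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status == 200, body
    except (OSError, http.client.HTTPException) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        conn.close()


if __name__ == "__main__":
    ok, detail = ping_docker()
    print("docker reachable" if ok else "docker NOT reachable", "-", detail)
```

Running this as root right after `systemctl start docker`, and again after the failed `systemctl start origin-node`, should show whether the daemon stops answering on the socket even though systemd still reports it as active.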

The full log is available here:

http://paste.openstack.org/show/478454/

The log also contains this odd panic:

Nov 10 15:59:05 node1.local.openshift.chmouel.com docker[1553]: 2015/11/10 15:59:05 http: panic serving @: runtime error: invalid memory address or nil pointer dereference

This has been reproduced on several of my nodes.
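
To check other nodes for the same panic without reading the whole journal, the Docker log can be filtered for the two phrases that appear in the line above. This is only a rough sketch: `find_panic_lines` is a hypothetical helper name, and the marker strings are taken from the single log line quoted here, so other crash signatures would be missed.

```python
import sys

# Substrings copied from the panic line quoted in this report.
PANIC_MARKERS = ("http: panic serving", "nil pointer dereference")


def find_panic_lines(lines):
    """Return the log lines that contain any of the panic markers."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in PANIC_MARKERS)]


if __name__ == "__main__":
    # Example use: journalctl -u docker --no-pager | python find_panics.py
    for hit in find_panic_lines(sys.stdin):
        print(hit)
```

The script name `find_panics.py` in the comment is likewise just a placeholder.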

detiber commented 9 years ago

@danmcp We are already tracking this issue in the openshift-ansible repo: https://github.com/openshift/openshift-ansible/issues/855 and have a PR to address it here: https://github.com/openshift/openshift-ansible/pull/966

brenton commented 8 years ago

Looks like this was resolved in the ansible playbooks.