openshift / openshift-sdn

OpenShift node reboot #304

Closed chengchengmu closed 8 years ago

chengchengmu commented 8 years ago

Hi,

After rebooting an OpenShift node's machine, should the SDN still be configured correctly?

In my case it does not seem to be. Version used:

[root@openshift-master ~]# oc version
oc v3.1.0.4-16-g112fcc4
kubernetes v1.1.0-origin-1107-g4c8e6f4

A node that has rebooted tries to contact a pod hosted on another node and gets "no route to host":

[root@cm-chaos-openshift-rbox-node-fld6 ~]#  tracepath 10.1.1.8
 1?: [LOCALHOST]                                         pmtu 1500
 1:  cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net 3006.666ms !H
     Resume: pmtu 1500 

When I run the debug.sh script, it says: Node cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.89)

By the way, the master considers the node ready, and pods are still scheduled on it.

Any idea what went wrong?

Thanks!

danwinship commented 8 years ago

> When I run the debug.sh script, it says: Node cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.89)

So where is that coming from?

chengchengmu commented 8 years ago

172.17.42.1 is the IP of the docker0 interface on the OpenShift master, so this message is odd. I can confirm that the IP of the OpenShift node does not change after a reboot.
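
For reference, the kind of check behind that message can be sketched roughly as follows. This is not the actual debug.sh code, only a minimal Python illustration of comparing the IP registered for a node against what its name resolves to. The function name `check_node_ip` and the `fake_dns` table are assumptions made for the example; the resolver is injectable so the sketch runs without DNS access.

```python
import socket
from typing import Callable, Optional


def check_node_ip(
    node_name: str,
    registered_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return a warning string if the registered IP differs from the resolved one."""
    resolved_ip = resolve(node_name)
    if resolved_ip == registered_ip:
        return None
    return (
        f"Node {node_name}: the IP in OpenShift ({registered_ip}) "
        f"does not match DNS/hosts ({resolved_ip})"
    )


# Hypothetical stand-in for DNS/hosts, using the values reported in this issue.
fake_dns = {"cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net": "10.55.0.89"}

print(
    check_node_ip(
        "cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net",
        "172.17.42.1",
        resolve=fake_dns.__getitem__,
    )
)
```

With the default resolver, the same function would use `socket.gethostbyname`, which consults the system's DNS/hosts configuration.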

chengchengmu commented 8 years ago

I ran more tests on 5 OpenShift nodes:

[root@openshift-master ~]# ./debug.sh 
Analyzing master
Node cm-chaos-openshift-rbox-node-9y91.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.90)
Node cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.89)
Node cm-chaos-openshift-rbox-node-scg9.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.70)
Node cm-chaos-openshift-rbox-node-wdxb.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.91)

Analyzing cm-chaos-openshift-rbox-node-868e.figaro.amadeus.net (10.55.0.41)

Analyzing cm-chaos-openshift-rbox-node-9y91.figaro.amadeus.net (172.17.42.1)
Could not find node name in /etc/origin/master/master-config.yaml

Analyzing cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net (172.17.42.1)
Could not find node name in /etc/origin/master/master-config.yaml

Analyzing cm-chaos-openshift-rbox-node-scg9.figaro.amadeus.net (172.17.42.1)
Could not find node name in /etc/origin/master/master-config.yaml

Analyzing cm-chaos-openshift-rbox-node-wdxb.figaro.amadeus.net (172.17.42.1)
Could not find node name in /etc/origin/master/master-config.yaml

I was able to collect logs: openshift-sdn-debug.zip
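
To make the pattern in that output easier to see, here is a small sketch that extracts the mismatch lines and groups them by the IP OpenShift has registered. The regular expression is written against the message format shown above and is an assumption about that format, not something taken from debug.sh itself; `parse_mismatches` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# Node <name>: the IP in OpenShift (<ip>) does not match DNS/hosts (<ip>)
MISMATCH = re.compile(
    r"^Node (?P<node>\S+): the IP in OpenShift \((?P<registered>[\d.]+)\) "
    r"does not match DNS/hosts \((?P<resolved>[\d.]+)\)$"
)


def parse_mismatches(output: str) -> List[Tuple[str, str, str]]:
    """Return (node, registered_ip, resolved_ip) for every mismatch line."""
    rows = []
    for line in output.splitlines():
        match = MISMATCH.match(line.strip())
        if match:
            rows.append((match["node"], match["registered"], match["resolved"]))
    return rows


sample = """\
Analyzing master
Node cm-chaos-openshift-rbox-node-9y91.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.90)
Node cm-chaos-openshift-rbox-node-fld6.figaro.amadeus.net: the IP in OpenShift (172.17.42.1) does not match DNS/hosts (10.55.0.89)
"""

rows = parse_mismatches(sample)
registered_ips = {registered for _, registered, _ in rows}
print(len(rows), "mismatched nodes, registered IPs:", sorted(registered_ips))
```

Several nodes sharing a single registered IP, as here, suggests the nodes are reporting an address that is not specific to each host.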

chengchengmu commented 8 years ago

Root cause found by @david-martin and Jawed Khelil: at reboot time, the ipaddress fact reported by facter 2.4 is the docker0 interface's IP. Upgrading facter to 3.x should solve the problem.

Closing it.
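
For anyone who cannot upgrade right away, the general idea of a guard can be sketched like this: when choosing the address a node registers with, skip loopback and Docker bridge interfaces. This is not how facter selects its ipaddress fact, only a minimal illustration. The function `pick_node_ip`, the interface name `eth0`, and the skip list are assumptions made for the example.

```python
import ipaddress
from typing import Mapping, Optional, Sequence


def pick_node_ip(
    addresses: Mapping[str, str],
    skip_prefixes: Sequence[str] = ("lo", "docker"),
) -> Optional[str]:
    """Return the first address that is not on a skipped interface, else None.

    `addresses` maps interface name to IPv4 address, in discovery order.
    """
    for name, ip in addresses.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # ignore loopback and Docker bridge interfaces by name
        if ipaddress.ip_address(ip).is_loopback:
            continue  # ignore loopback addresses on any interface
        return ip
    return None


# Hypothetical interface table for a rebooted node; docker0 comes up first.
interfaces = {"lo": "127.0.0.1", "docker0": "172.17.42.1", "eth0": "10.55.0.89"}
print(pick_node_ip(interfaces))
```

Returning None when nothing qualifies lets the caller fail loudly instead of registering the bridge address.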

david-martin commented 8 years ago

@menren I did what now? This thread escapes me completely

chengchengmu commented 8 years ago

A certain Jose David Martin of Red Hat helped me solve this issue. Sorry if it's not you. It was a blind guess...

david-martin commented 8 years ago

Possibly this David Martin https://github.com/jdnieto