openshift / openshift-sdn

Apache License 2.0

Don't loop forever if subnet not found for the node #204

Closed pravisankar closed 8 years ago

pravisankar commented 8 years ago

Fix related to https://bugzilla.redhat.com/show_bug.cgi?id=1194467

pravisankar commented 8 years ago

@danwinship @dcbw PTAL

dcbw commented 8 years ago

LGTM

sdodson commented 8 years ago

So the implication is that you loop for 10 seconds, and then what happens? Does OpenShift exit in an error state, or does it go on to mark itself as ready even though it is not on the SDN? Sorry if this is a dumb question.

danwinship commented 8 years ago

It will exit with an error.

danwinship commented 8 years ago

Maybe we need to distinguish "master is not running" (in which case keep waiting, but also handle SIGTERM) from "master is running but it's not assigning a subnet to this host".

sdodson commented 8 years ago

FWIW, the systemd service for the node uses Restart=always (perhaps we should change that to on-failure?), so systemd will restart the node indefinitely. This is mainly because it's a distributed system, and if you reboot an entire environment there's no guarantee that the master is up before your node tries to start.

pravisankar commented 8 years ago

A 'Master is not running' message would be helpful, but I don't think we want to keep looping in any case. There might be a bug in subnet allocation on the master, the subnet entry might no longer exist in etcd, or the API call might have failed for other reasons (network, auth...). Even if the master is not up before the node, I think it's okay for the node to fail and restart a few times.