Closed dcbw closed 8 years ago
Looks good, although should this be validated by the node rather than the master? I guess if the master fails then the node won't ever be able to finish starting up, so having the master validate it will ensure that the node doesn't create the routing loop anyway.
Yes, This will handle node IP conflicts with cluster network for exiting subnets but we also need to handle one more case. We allow node IP changes. We watch for this change and when this happens, we recreate the subnet. New subnet creation in this case should be only allowed if the new node IP doesn't conflict with cluster network.
Fixed it up so that node events are now checked for validity too.
LGTM
should this be validated by the node rather than the master?
dcbw realized later that the answer to this is probably "yes", because if the master exits immediately when the data in etcd is invalid, then you can't use "oc" to fix it.
I chose to do two sides to this. The master should warn about invalid Node IPs and refuse to create HostSubnet objects for them. The node should exit if its address is bad, and should ignore invalid HostSubnets from other nodes so we don't create flow rules for them.
@danwinship @marun repushed; now the Registry caches the clusterNetwork so we can get rid of oc.clusterNetwork.
And now broken up into separate commits for easier reviewing...
@pravisankar fixed your comment, PTAL
LGTM. Have you run origin's tests with this code yet?
Rest of the changes, LGTM
Reworked to address Ravi's comments.
@pravisankar PTAL
LGTM
@danwinship in passes extended tests in DIND at least.
LGTM
If a HostSubnet is edited or the node's IP address is changed such that the NodeIP is contained within its own SubnetCIDR, a routing loop will be created in the OVS bridge and cause a kernel panic by exhausing kernel stack space.
Prevent the cluster from starting when this happens by ensuring that the NodeIP is not contained within the cluster network CIDR.
https://bugzilla.redhat.com/show_bug.cgi?id=1295486