Closed dcbw closed 8 years ago
How does it get called from multiple goroutines?
Given that the bz bug mentions a HA setup, I assumed the problem was that we were running the code on multiple masters at once... (though I had thought someone convinced me at one point that that couldn't actually happen, for some reason that I don't remember).
I do not understand why we need this change? watchNodes() waits for node events and these events are processed serially and may call addNode()/deleteNode() which in turn will use subnet allocator to allocate/release subnetIP. Synchronization is implicitly provided by the event queue.
Is the HA setup right? RunSDNController() as part of openshift startControllers should be run on only one of the HA masters otherwise we will have race conditions, even this mutex lock won't help as this will provide synchronization within a single master but not across masters.
Is the HA setup right? RunSDNController() as part of openshift startControllers should be run on only one of the HA masters otherwise we will have race conditions, even this mutex lock won't help as this will provide synchronization within a single master but not across masters.
You are correct; I missed the HA angle here and the reporter is definitely running HA. Doesn't the requirement of a single SDN master kinda subvert HA though? Do we need to modify openshift-ansible to configure only the first master with the SDN plugin?
The subnet allocator's GetNetwork() function (which allocates and returns a subnet for a node) can be called from multiple goroutines and thus needs locking to prevent duplicate subnet allocations.