Open mo-gr opened 7 years ago
Related or duplicate of #202
@mo-gr can you reproduce the problem on your stack by setting r106 to 100% and then r103 back to 1%?
@mo-gr it seems that the problem is the existence of the second LB for that stack.
The get_stack_versions function (https://github.com/zalando-stups/senza/blob/master/senza/traffic.py#L216) is picking up 2 LBs, and the results become really crazy.
We need to find out what this function returns in your case, with more than 1 LB.
Another question is how the traffic would be distributed in such a situation. Should we show all the possible endpoints? Can you provide some use cases for setting up another LB for the stack so that I can understand why it is needed?
@valgog to answer your first question: yes, it is reproducible with the same behaviour.
The second LB was/is part of some debugging. We don't really need it.
But still, I would prefer senza traffic to fail politely in such a situation rather than go totally bananas on the routes :)
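To illustrate what "fail politely" could look like, here is a minimal sketch of a guard that runs before any weights are touched. The function name `ensure_single_lb` and the input shape (a mapping from stack version to a list of LB names) are assumptions made for this example; they are not Senza's actual API or data structures.

```python
from typing import Dict, List


def ensure_single_lb(lbs_by_version: Dict[str, List[str]]) -> None:
    """Raise a readable error if any stack version has more than one LB.

    `lbs_by_version` is a hypothetical mapping such as
    {"r106": ["main-lb"], "r103": ["main-lb", "debug-lb"]}.
    """
    offenders = {
        version: lbs
        for version, lbs in lbs_by_version.items()
        if len(lbs) > 1
    }
    if offenders:
        details = "; ".join(
            f"{version}: {', '.join(lbs)}"
            for version, lbs in sorted(offenders.items())
        )
        raise ValueError(
            "Refusing to change traffic: more than one load balancer "
            f"found for stack version(s) {details}"
        )
```

With a check like this, the command would abort with a clear message instead of writing unexpected weights.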
@valgog @mo-gr the second ELB should not be a problem, i.e. multiple domains or LBs should be handled by Senza individually and traffic switching should be done the same for each domain or LB.
We have another use case for this in Plan B where we deploy different LBs with different SSL certs. I would simply expect Senza to do the same operations on both LBs.
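A rough sketch of that idea, under stated assumptions: weights are integer percentages per stack version, grouped by LB, and the same shift is applied to each group independently. The names `shift_traffic` and `shift_traffic_per_lb` and the dictionary layout are hypothetical and only serve this illustration; they do not describe how Senza stores or updates its records.

```python
from typing import Dict


def shift_traffic(weights: Dict[str, int], version: str, percent: int) -> Dict[str, int]:
    """Set `version` to `percent` and rescale the other versions to fill the rest.

    The other versions keep their relative proportions; rounding uses the
    largest-remainder method so the result always sums to 100.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if version not in weights:
        raise KeyError(f"unknown stack version: {version}")

    others = {v: w for v, w in weights.items() if v != version}
    remaining = 100 - percent
    if not others:
        if remaining:
            raise ValueError("a single version must receive 100% of the traffic")
        return {version: percent}

    # If no other version currently has traffic, split the rest evenly.
    if sum(others.values()) == 0:
        others = {v: 1 for v in others}
    total = sum(others.values())

    result = {version: percent}
    remainders = {}
    for v, w in others.items():
        result[v], remainders[v] = divmod(remaining * w, total)

    # Hand out the percentage points lost to integer division.
    leftover = remaining - sum(result[v] for v in others)
    for v in sorted(others, key=lambda name: remainders[name], reverse=True)[:leftover]:
        result[v] += 1
    return result


def shift_traffic_per_lb(
    weights_by_lb: Dict[str, Dict[str, int]], version: str, percent: int
) -> Dict[str, Dict[str, int]]:
    """Apply the same shift to every LB independently."""
    return {
        lb: shift_traffic(weights, version, percent)
        for lb, weights in weights_by_lb.items()
    }
```

For the case in this issue, starting from r106 at 100% on both LBs and setting r103 to 1% would leave each LB with r106 at 99% and r103 at 1%, with no other version touched.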
The following behaviour is very surprising:
Expected behaviour: remove 1% from skipper-r106 and add 1% to skipper-r103.
Actual behaviour: Weird juggling of traffic: adding 1.5% to r103, removing over 50% of traffic from the live stack while adding 24% back and putting some traffic to a completely different version potentially causing havoc, terror and sadness to our applications.