zalando-stups / senza

Deploy immutable application stacks and create and execute AWS CloudFormation templates in a sane way
https://pypi.python.org/pypi/stups-senza

Unexpected behaviour when switching traffic #283

Open mo-gr opened 7 years ago

mo-gr commented 7 years ago

The following behaviour is very surprising:

senza traffic skipper r103 1
Calculating new weights..Changing given percentage from 1.0 to 1.5 because all other versions are already getting the possible minimum traffic
 OK
Stack Name│Version│Identifier  │Old Weight%│Delta│Compensation│New Weight%│Current
skipper    r103    skipper-r103         0.0   1.0          0.5         1.5 <       
skipper    r106    skipper-r106       100.0 -50.5         24.5        74.0         
skipper    r99     skipper-r99          0.0               24.5        24.5         
Setting weights for skipper-debug.pathfinder-staging.zalan.do., skipper.pathfinder-staging.zalan.do... OK

Expected behaviour: remove 1% from skipper-r106 and add 1% to skipper-r103.

Actual behaviour: weird juggling of traffic. It adds 1.5% to r103, removes over 50% of the traffic from the live stack (r106) while adding 24.5% back, and puts 24.5% of the traffic on a completely different version (r99), potentially causing havoc, terror and sadness to our applications.
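To make the expected arithmetic concrete, here is a minimal sketch of the behaviour described above. It is not Senza's actual algorithm: the function name `expected_weights` and the plain-dict representation of weights are assumptions made for illustration only. The idea is that the target version is set to the requested percentage and the difference is taken proportionally from the versions that currently carry traffic, so a version at 0% stays at 0%.

```python
def expected_weights(current, target, percentage):
    """Return new weights with `target` set to `percentage`.

    Hypothetical helper, not part of Senza. `current` maps a version
    name to its weight in percent; the weights are assumed to sum to 100.
    """
    delta = percentage - current[target]
    # Total weight of all versions other than the target.
    others_total = sum(w for v, w in current.items() if v != target)
    if others_total == 0 and delta != 0:
        # No other version carries traffic, so there is nothing to rebalance.
        raise ValueError("no other version carries traffic")

    new = {}
    for version, weight in current.items():
        if version == target:
            new[version] = percentage
        elif others_total > 0:
            # Take the delta proportionally; a version at 0% stays at 0%.
            new[version] = weight - delta * weight / others_total
        else:
            new[version] = weight
    return new


# Weights taken from the "Old Weight%" column of the output above.
old = {"r103": 0.0, "r106": 100.0, "r99": 0.0}
new = expected_weights(old, "r103", 1.0)
```

With the old weights from the output above, this leaves r99 untouched and only moves 1% from r106 to r103.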

hjacobs commented 7 years ago

Related or duplicate of #202

valgog commented 7 years ago

@mo-gr can you reproduce the problem on your stack by setting r106 to 100% and then r103 back to 1%?

valgog commented 7 years ago

@mo-gr it seems that the problem is the existence of the second LB for that stack.

https://github.com/zalando-stups/senza/blob/master/senza/traffic.py#L216 The get_stack_versions function is picking up two LBs, and the results become really crazy.

We need to find out what this function returns in your case, with more than one LB.

Another question is how the traffic would be distributed in such a situation. Should we show all the possible endpoints? Can you provide some use cases for setting up another LB for the stack, so that I can understand why it is needed?

mo-gr commented 7 years ago

@valgog to answer your first question: yes, it is reproducible with the same behaviour.

The second LB was/is part of some debugging. We don't really need it.

But still, I would prefer senza traffic to fail politely in such a situation rather than go totally bananas on the routes :)

hjacobs commented 7 years ago

@valgog @mo-gr the second ELB should not be a problem, i.e. multiple domains or LBs should be handled by Senza individually and traffic switching should be done the same for each domain or LB.

We have another use case for this in Plan B where we deploy different LBs with different SSL certs. I would simply expect Senza to do the same operations on both LBs.
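As a rough illustration of "the same operations on both LBs", the sketch below applies one weight change to each domain independently. This is an assumption-laden example, not Senza code: `set_target_weight`, `switch_traffic_per_domain` and the nested-dict layout are hypothetical, and the weights for the second domain are made up to show that each domain is rebalanced from its own current state.

```python
def set_target_weight(weights, target, percentage):
    """Hypothetical helper: set `target` to `percentage` and take the
    difference proportionally from the other versions of this domain."""
    delta = percentage - weights[target]
    others_total = sum(w for v, w in weights.items() if v != target)
    if others_total == 0 and delta != 0:
        raise ValueError("no other version carries traffic")
    return {
        version: percentage if version == target
        else (weight - delta * weight / others_total if others_total else weight)
        for version, weight in weights.items()
    }


def switch_traffic_per_domain(weights_by_domain, target, percentage):
    """Apply the same change to every domain (or LB) on its own,
    without mixing weights from different domains."""
    return {
        domain: set_target_weight(weights, target, percentage)
        for domain, weights in weights_by_domain.items()
    }


# Domain names are from the output above; the weights are illustrative.
before = {
    "skipper.pathfinder-staging.zalan.do": {"r103": 0.0, "r106": 100.0},
    "skipper-debug.pathfinder-staging.zalan.do": {
        "r103": 0.0, "r106": 50.0, "r99": 50.0,
    },
}
after = switch_traffic_per_domain(before, "r103", 1.0)
```

Keeping the weights keyed by domain avoids the situation described earlier in the thread, where versions behind two LBs end up in a single calculation.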