Last week we updated a core service in our production environment (built with DC/OS) and accidentally made a mistake when changing the health check configuration. External access returned 503 the whole time, until we corrected the health check configuration and restarted the service. The old instance's state stayed healthy on the Marathon page the entire time, so we think something happened when marathon-lb reloaded.
Why does the old healthy instance stop serving traffic after we push a bad health check? As far as we know, nothing changes on the old healthy instance when we launch a new, unhealthy instance in the same application.
Test and Verification (marathon-lb version 1.12.1)
1. Launch a new nginx test application listening on port 80, with its health check on port 80.
2. Change the health check port to 81 (Marathon launches a new instance whose state never becomes healthy; at this point the nginx backend in haproxy.cfg contains two different servers). The change is roughly the fragment sketched after this list.
3. Test external access.
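For reference, the change in step 2 was made in the application's healthChecks definition. A minimal sketch, assuming an HTTP health check bound to a fixed port (the path and timing values are illustrative, not copied from our real service); the only edit is "port": 80 -> "port": 81:

"healthChecks": [
  {
    "protocol": "HTTP",
    "path": "/",
    "port": 81,
    "intervalSeconds": 5,
    "maxConsecutiveFailures": 4
  }
]

The intervalSeconds and maxConsecutiveFailures values appear to be what marathon-lb renders as "check inter 5s fall 4" in the backends below.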
haproxy.cfg
before reload
backend nginx-lbl-test_10278
balance roundrobin
mode http
option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
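# the single old instance; its health check targets its own serving port 80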
server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 80
after reload
backend nginx-lbl-test_10278
balance roundrobin
mode http
option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
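# after the reload, both servers (including the old instance, which still serves on port 80) are checked against port 81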
server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 81
server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81
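What we expected after the reload was that only the new server would pick up the port 81 check, with the old, already-healthy server keeping its original check port, roughly like this (this is the expectation, not what actually happens):

backend nginx-lbl-test_10278
balance roundrobin
mode http
option forwardfor
http-request set-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto https if { ssl_fc }
server 10_168_0_82_9_0_5_7_80 9.0.5.7:80 check inter 5s fall 4 port 80
server 10_168_0_82_9_0_5_12_80 9.0.5.12:80 check inter 5s fall 4 port 81

With both servers actually checked against port 81, the old instance fails the HAProxy check as well, which matches the constant 503s we saw from external access even though Marathon still reported the old task as healthy.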
So why has the old instance's health check configuration also been updated?
This is scary when updating applications in a production environment: HAProxy failover stops protecting you when you push a bad health check, even though the old healthy instance is still alive.