Closed JamesJiang1024 closed 8 years ago
Hi, we are seeing roughly 3% errors (HTTP 503 plus SSL handshake errors) on our OSE routers. We have more than 500 pods deployed and are still troubleshooting.
This fix https://bugzilla.redhat.com/show_bug.cgi?id=1320233 drastically reduces the number of reloads. Before it would reload periodically, even if there were no changes. Now it only reloads when there are changes.
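To illustrate the difference between periodic reloads and change-driven reloads, here is a minimal sketch. It is not the router's actual code: the `ReloadGate` class and its method names are hypothetical, and it only assumes that the generated haproxy config is available as a string.

```python
import hashlib


class ReloadGate:
    """Hypothetical helper: allow a reload only when the config text changed."""

    def __init__(self):
        self._last_digest = None  # digest of the last config we reloaded with
        self.reloads = 0          # how many reloads were actually triggered

    def maybe_reload(self, config_text: str) -> bool:
        digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            # Nothing changed since the last reload, so skip it.
            return False
        self._last_digest = digest
        self.reloads += 1  # a real router would trigger the haproxy reload here
        return True


gate = ReloadGate()
gate.maybe_reload("backend a\n")   # first config: reload
gate.maybe_reload("backend a\n")   # unchanged: skipped
gate.maybe_reload("backend a\nbackend b\n")  # changed: reload
```

With a gate like this, a resync that produces an identical config costs nothing, so fewer reloads means fewer windows in which connections can be dropped.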
The drops happen because haproxy uses the SO_REUSEPORT socket option to do the reload: the new process binds to the same port while the old one is still listening. There is a kernel bug where packets can sometimes get dropped if they are sent to the old process but not consumed before it terminates. Eventually that will get fixed.
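For anyone unfamiliar with the mechanism, here is a rough sketch of two listeners sharing one port via SO_REUSEPORT, the way an old and a new haproxy process do during a reload. It assumes Linux 3.9 or later; the function names are made up for illustration and this is not how haproxy itself is implemented.

```python
import socket


def listen_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket with SO_REUSEPORT set (Linux 3.9+)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def simulate_reload() -> bool:
    """Mimic a reload: old listener, new listener on the same port, old exits."""
    old = listen_reuseport(0)            # port 0 lets the kernel pick a free port
    port = old.getsockname()[1]
    new = listen_reuseport(port)         # succeeds only because of SO_REUSEPORT
    # During this window both sockets have their own accept queue and the
    # kernel spreads incoming connections across them. Connections queued
    # on `old` but never accepted are lost when it closes.
    old.close()
    same_port = new.getsockname()[1] == port
    new.close()
    return same_port
```

The overlap between the two `listen_reuseport` calls and `old.close()` is the window where the drops described above can occur.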
There is a workaround to install an iptables rule, if needed. https://github.com/openshift/openshift-docs/pull/1987
But that is somewhat involved, and probably not necessary. The first fix usually resolves the problem.
I am an OpenShift Origin user running v1.1.6 for a proof of concept, and I found a problem: when I use LoadRunner to put load on openshift-router, every few minutes I get 200-300 request errors. I changed two parameters, resync_interval and reload_interval, but that does not seem to help. It is hard for me to find the code that decides when to resync and reload haproxy, which seems to have a bad effect on request flow.
Version
v1.1.6
Steps To Reproduce
Current Result
Expected Result
Additional Information [The router log when the error occurred]
ha-router-logs.txt