When load-testing openshift-router with LoadRunner, 200-300 requests fail every few minutes

JamesJiang1024 commented 8 years ago

I am an OpenShift Origin user running v1.1.6 for a proof of concept, and I found a problem: when I use LoadRunner to put load on openshift-router, 200-300 requests fail every few minutes. I changed two parameters, resync_interval and reload_interval, but that does not seem to help. It is hard for me to find the code that decides when to restart and resync haproxy, which seems to disrupt the request flow.

Version

v1.1.6

Steps To Reproduce

1. Run a simple web app, such as hello world, in the cluster.
2. Use LoadRunner to test that app.

Current Result

Roughly every 10 minutes, some requests fail with errors.

Expected Result

No errors; every response is 200.

Additional Information [The router log when the error occurred]

ha-router-logs.txt

roldancer commented 8 years ago

Hi, we see roughly 3% errors (HTTP 503 plus SSL handshake errors) on our OSE routers, with more than 500 pods deployed. We are doing some troubleshooting.

knobunc commented 8 years ago

This fix https://bugzilla.redhat.com/show_bug.cgi?id=1320233 drastically reduces the number of reloads. Previously the router reloaded haproxy periodically, even if there were no changes. Now it only reloads when there are changes.
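
The "reload only when there are changes" idea can be sketched as follows. This is a minimal illustration, not the router's actual code: the `ReloadGate` class and its `should_reload` method are hypothetical names, and comparing a hash of the rendered configuration is just one assumed way to detect a change.

```python
import hashlib


class ReloadGate:
    """Hypothetical helper: remembers the last configuration that was applied."""

    def __init__(self) -> None:
        self._last_digest = None  # digest of the last config we reloaded with

    def should_reload(self, rendered_config: str) -> bool:
        """Return True only if the config differs from the last one seen."""
        digest = hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return False  # nothing changed, so skip the reload
        self._last_digest = digest
        return True


gate = ReloadGate()
for config in ["backend a", "backend a", "backend a\nbackend b"]:
    if gate.should_reload(config):
        print("reload haproxy")  # placeholder for the real reload step
    else:
        print("skip reload")
```

With a gate like this, a periodic resync that produces an identical configuration no longer triggers a reload, so the reload-related drops only happen when routes actually change.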

The drops happen because haproxy uses the SO_REUSEPORT socket option to do the reload: the new process binds to the same port while the old one is still listening. There is a kernel bug where packets can sometimes be dropped if they are sent to the old process but not consumed before it terminates. Eventually that will get fixed.
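
To make the mechanism concrete, here is a small Linux-only sketch of two listeners sharing one port through the socket option the comment refers to (`SO_REUSEPORT`). The helper name `bind_reuseport` and the loopback address are assumptions for illustration; this is not haproxy's code.

```python
import socket


def bind_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Hypothetical helper: open a listening TCP socket with SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every listener sharing the port must set the option before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


# The "old" process: port 0 lets the kernel pick a free port.
old_listener = bind_reuseport(0)
port = old_listener.getsockname()[1]

# The "new" process binds the same port while the old listener is still open.
new_listener = bind_reuseport(port)
print("both listeners share port", port)

# During this overlap the kernel spreads incoming connections across both
# listeners. A connection queued on the old listener but never accepted is
# lost when that listener closes, which is the window described above.
old_listener.close()
new_listener.close()
```

Fewer reloads mean fewer of these overlap windows, which is why reducing the reload frequency lowers the error rate without removing the underlying race.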

If needed, there is a workaround that installs an iptables rule: https://github.com/openshift/openshift-docs/pull/1987

That workaround is somewhat involved, though, and probably not necessary. The first fix usually resolves the problem.

openshift / origin

When load-testing openshift-router with LoadRunner, 200-300 requests fail every few minutes #8773
