Closed r0bj closed 8 years ago
@r0bj for high performance, I recommend setting a higher value for MaxIdleConnsPerHost:
# If non-zero, controls the maximum number of idle (keep-alive) connections to keep per-host. If zero, DefaultMaxIdleConnsPerHost is used.
# If you encounter 'too many open files' errors, you can either change this value, or change `ulimit` value.
#
# Optional
# Default: http.DefaultMaxIdleConnsPerHost
#
# MaxIdleConnsPerHost = 200
Can you try with this change (for example 1000)?
Changing the MaxIdleConnsPerHost value to 1000 didn't fix the issue. The daemon stops handling traffic.
Just to be clear, traefik stays frozen even after the wrk benchmark is over.
Hum, this is really weird... There must be a regression here. I will investigate this.
I cannot reproduce this issue. I used your toml file and launched wrk -t30 -c400 -d30 -H "Host:test-nginx.example.net" http://localhost:8000/ several times, but I still don't get any error... However, I used the traefik master version; I will try with v1.0.2 later. Maybe you could try disabling retry in the toml file?
I've disabled retry, but the result was the same.
@r0bj I still cannot reproduce this issue on my laptop, even using v1.0.2 :confused: Traefik continues to respond after benchmarks. Which architecture are you using? amd64? Are you on bare metal or on a VM? @containous/traefik can anyone try to reproduce this at home?
I hit this issue on a bare-metal amd64 host. I was then also able to reproduce it on a Vagrant/VirtualBox VM (vanilla ubuntu/trusty64).
I am having the same issue. I ended up writing a script that checks the Traefik log files and reloads Traefik if the log file is older than 3 minutes.
@abhishekamralkar Which version are you using? Is it something new? As I still cannot reproduce this behavior, I need as much information as possible to investigate :)
I can confirm this with the latest release of traefik using wrk -t40 -c400 -d30 http://...
Using e.g. wrk -t10 -c40 -d30 http://... still works fine.
I was able to reproduce this on 2 physical machines with Ubuntu 16.04 running Traefik in Docker 1.12.1
As soon as the error occurs, Traefik stops reacting, even after wrk has finished. The following appears in the traefik logs:
...
time="2016-09-19T20:50:14Z" level=warning msg="Error forwarding to http://httpd:80, err: EOF"
time="2016-09-19T20:50:14Z" level=warning msg="Error forwarding to http://httpd:80, err: EOF"
time="2016-09-19T20:50:14Z" level=warning msg="Error forwarding to http://httpd:80, err: EOF"
time="2016-09-19T20:50:14Z" level=warning msg="Error forwarding to http://httpd:80, err: EOF"
time="2016-09-19T20:50:14Z" level=warning msg="Error forwarding to http://httpd:80, err: EOF"
http://httpd:80 is the backend that I have configured. Despite the error message, the backend is still there and reacts fine if I reach it manually:
# curl httpd
<html><body><h1>It works!</h1></body></html>
When I shutdown the traefik container I see:
time="2016-09-19T20:51:06Z" level=info msg="I have to go... terminated"
time="2016-09-19T20:51:06Z" level=info msg="Stopping server"
When I restart traefik, everything works fine again (I don't need to restart the httpd backend server).
Do you need any additional information?
traefik version
Version: v1.0.2
Codename: reblochon
Go version: go1.6.2
Built: 2016-08-02_05:29:50PM
OS/Arch: linux/amd64
This is the version I am using. Not sure what's going on, but it just hangs and nothing works.
I think I found the issue. This seems due to a race condition in https://github.com/thoas/stats. It occurs when accessing the /health endpoint of traefik's web UI while, at the same time, making requests to the traefik reverse proxy.
Could you confirm that you are accessing /health during your tests (with a healthcheck, or if you have the web UI open)? A workaround is to avoid accessing the web UI during tests and to change your healthcheck to the /api endpoint.
I'm investigating if the issue is still present in the master branch.
@emilevauge Yes, indeed that has been the case for me. I had the health check page of the web UI open during the tests the whole time.
@emilevauge Yes, it seems that was the case. I was accessing /health during the wrk test - this was actually my way of determining whether traefik was still working. I had also pointed the marathon healthcheck at traefik's /health.
We are experiencing similar behavior, despite our healthcheck using the /api endpoint.
@ryanleary then could you give as many details as possible?
@emilevauge I did some load tests and I can confirm that the health UI was indeed the problem - at least in my case. I can reproduce the problem many times when the health UI is open, but when it's closed I cannot reproduce it anymore - no matter what I try.
@emilevauge Thanks a lot. Do you know when the next traefik release, which includes this fix, will be out?
Already released: https://github.com/containous/traefik/releases/tag/v1.0.3 :)
Just waiting for docker to merge: https://github.com/docker-library/official-images/pull/2169.
You can use containous/traefik:v1.0.3 in the meantime.
Perfect, thanks a lot. :-)
Thanks. I can confirm that this issue is no longer present with v1.0.3.
traefik version: 1.0.2
Set open files limit to 1000000:
ulimit -n 1000000
traefik config:
After starting, everything works:
Then let's test it with wrk: wrk -t30 -c400 -d30s -H "Host: test-nginx.example.net" http://localhost:8000
After some time (one or a few attempts) traefik becomes unresponsive on port 8000:
So traffic is no longer processed.
Health API is also unresponsive:
What's interesting, the dashboard is still responsive (but without data):
Traefik access logs simply stop being written:
Logs (severity DEBUG) keep appearing in the log file the whole time, even for requests that get no reply:
strace during an attempt to send a request to traefik (curl -svo /dev/null -H "Host:test-nginx.example.net" http://localhost:8000): https://gist.github.com/r0bj/b618c74b1bc0db5c11f78db08c34fc15
So it seems that the request hits the backend, but the response isn't sent back to the original sender.
There are many connections in CLOSE_WAIT status: https://gist.github.com/r0bj/c647c76fe65a562ffd2e024e11a260cd
Restarting the traefik daemon fixes this issue.
It's easy to replicate this issue:
- ubuntu/trusty64 image with default settings
- run the wrk benchmark one or more times
One can also replicate this issue by sending wrk requests to a non-existing backend (resulting in 404s): wrk -t30 -c400 -d30 http://localhost:8000/