Closed ankitarya10 closed 7 years ago
I also enabled the proxy's debug feature from the recent release to look at the logs.
2017/03/09 21:53:47 HAPRoxy: services framework_naip-be5000/framework_naip 0/0/0/2/2 200 163 - - ---- 2/2/0/1/0 0/0 {-,"",""} "GET /naip/api/1/naip/healthcheck HTTP/1.1"
2017/03/09 21:53:49 HAPRoxy: services services/
The first one is the health check from the ELB; I'm not sure what the others are.
To replicate the issue, I created a 3 node swarm cluster.
docker network create -d overlay proxy

curl -o proxy.yml \
  https://raw.githubusercontent.com/vfarcic/docker-flow-proxy/master/docker-compose-stack.yml
docker stack deploy -c proxy.yml proxy
docker pull ankitarya/webserver

# Get the YAML file from here: http://pastebin.com/2jjXxLAU
docker stack deploy -c docker-compose-webserver.yml app
Ran two Tests:
Test1: Webserver -> Reverse Proxy https://a.blazemeter.com/app/?public-token=bUwxqKh45yfb5oTWe32322u5MZMVaMn0lw32DsrVPTXH3BDWYw#/accounts/130040/workspaces/123459/projects/167833/masters/16021634/summary
Test2: Webserver -> Reverse Proxy -> ELB https://a.blazemeter.com/app/?public-token=pKQMeZ3CACwl77zfk9Bw3rVg9DBaNOTb3TnhWPQnRdkkSmusXA#/accounts/130040/workspaces/123459/projects/168089/masters/16021693/summary
Both tests ran fine, with a 0.28% error rate on the proxy, which I think is acceptable. However, now I am really confused about what is actually wrong. Maybe I should deploy a test cluster under a VPC and evaluate further.
Can you run a test of "web service deployed in swarm + Reverse Proxy" (without ELB)? The result should be the same as with it since, according to the results, the ELB adds almost no overhead.
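One quick way to run the without-ELB comparison is to time the same request against a node's public IP directly and against the ELB DNS name; a sketch, assuming hypothetical addresses and the /demo path:

```shell
# Hypothetical values; replace with your node's public IP and ELB DNS name
NODE_IP=54.0.0.10
ELB_DNS=my-elb-123456789.us-east-1.elb.amazonaws.com

# Bypass the ELB: Swarm's ingress network listens on every node
curl -o /dev/null -s -w "direct: %{time_total}s\n" "http://$NODE_IP/demo"

# Same request through the ELB
curl -o /dev/null -s -w "elb:    %{time_total}s\n" "http://$ELB_DNS/demo"
```

If the two numbers are close, the ELB is not the bottleneck.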
As a side note, I don't think I'll be able to dive into this before the weekend. I hope that's OK.
@ankitarya10 find the reports below. These are the reports generated by JMeter.
With Proxy, the avg is 286 ms
With Proxy+ELB, the avg is 306 ms
With Proxy+ELB+DNS, the avg is 307 ms
Closing due to inactivity. Feel free to reopen if the problem persists.
Hello,
I am also facing this same issue. I have 2 manager nodes attached to the ELB, both running the swarm-listener and proxy services. But the response time is way too high; just removing one manager from the ELB makes the page load within seconds instead of timing out.
Any help would be appreciated.
Thanks !
If removing a manager speeds it up, it seems that the issue is not related to DFP. As an additional test, you can open a port directly on your service and then compare response times with and without DFP. Also, it would be useful to collect metrics both from ELB and from DFP and compare them.
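The direct-port comparison can be done by publishing a port on the service itself, so requests reach it through Swarm's ingress network without passing through DFP; a sketch with hypothetical service name, ports, and path:

```shell
# Publish port 8081 of the service directly on 8080 (hypothetical names/ports)
docker service update --publish-add 8080:8081 myapp

# Compare: through DFP (port 80) vs. directly (port 8080)
curl -o /dev/null -s -w "via DFP: %{time_total}s\n" http://<node-ip>/demo
curl -o /dev/null -s -w "direct:  %{time_total}s\n" http://<node-ip>:8080/demo
```

If the direct path is fast and the DFP path is slow, the proxy (or its network) is implicated; if both are slow, the problem is elsewhere.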
Do you use scripts (e.g. Terraform, CloudFormation) to set up your cluster? If you do, I could reproduce it on my account and try to figure out what's wrong.
I don't have a CloudFormation template, but it's a very simple configuration: 2 managers and 2 workers. The 2 managers are behind the ELB, running Docker Flow Proxy.
It seems like an issue with the classic ELB to me as well. The load balancer works fine with just one manager node, but as soon as I add the 2nd manager and it comes InService, the application just stops responding.
I am running a few tests to get the response times and as soon as I am done I will post it here.
Thank you for your prompt reply.
I think I finally figured it out: when both managers are in the same AZ, it works like a charm and latency is very low, but as soon as I add a manager in another AZ behind the ELB (Manager1 in AZ1 and Manager2 in AZ2), it stops working. Maybe you can help me out here: is it that DFP is not able to communicate with the containers in the 2nd AZ because some ports are not open? Or could there be some other reason?
Thank you for all your help.! Appreciate it!!
In that case, the problem is almost certainly not related to DFP but to Docker networking between AZs. DFP or, to be more precise, HAProxy, only forwards requests to one of the services. That forwarding goes through Docker networking, which handles everything else (load balancing, service discovery, and so on).
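If the suspicion is blocked ports between AZs, Docker Swarm's overlay networking needs 2377/tcp (cluster management), 7946/tcp and 7946/udp (node discovery), and 4789/udp (the VXLAN data plane); a blocked 4789 produces exactly this symptom, since requests routed to a task in the other AZ simply hang. A rough reachability check from one node, assuming a hypothetical peer IP:

```shell
PEER=10.0.2.15  # hypothetical private IP of the node in the other AZ

# TCP ports can be probed directly
nc -zv -w 3 "$PEER" 2377
nc -zv -w 3 "$PEER" 7946

# UDP probes only surface obvious failures, but are still worth running
nc -zvu -w 3 "$PEER" 7946
nc -zvu -w 3 "$PEER" 4789
```

Security groups and NACLs must allow these ports between all nodes in both AZs.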
How did you create your cluster? Do you have Terraform or CloudFormation configs that I could use to reproduce it in my account?
No actually I dont have a CloudFormation Template yet, still running tests to make sure everything works. Should have the template created by the end of this week. I will definitely post it here for you to check it out.
Great. That way I can replicate the same setup and try to pinpoint the cause of the problem. In the meantime, can you repeat the tests with some public service and send me the commands you executed (both to create the services and to test them)? That should be quick, and I can rerun the same on my cluster. If the results are different, we'll know for certain that there's something wrong with the way you set up your cluster.
I followed these steps:

docker network create --driver overlay proxy

docker network create --driver overlay myapp
Created the proxy service with replicas=5 across all nodes (2 managers and 3 workers):

docker service create --name proxy \
  -p 80:80 \
  -p 443:443 \
  --network proxy \
  --replicas=5 \
  -e MODE=swarm \
  -e LISTENER_ADDRESS=swarm-listener \
  vfarcic/docker-flow-proxy

(Not sure if 5 replicas are required.)
Created one swarm-listener:

docker service create --name swarm-listener \
  --network proxy \
  --mount "type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock" \
  -e DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure \
  -e DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove \
  --constraint 'node.role==manager' \
  vfarcic/docker-flow-swarm-listener
Started my service with:

docker service create --name myapp \
  -e DB=go-demo-db \
  --network myapp \
  --network proxy \
  --label com.df.notify=true \
  --label com.df.distribute=true \
  --label com.df.servicePath=/demo \
  --label com.df.port=8081 \
  solarwinds/whd-embedded:latest
That's all I am doing.
While doing this, try having one of the workers or managers in a different AZ; the app times out and the webpage doesn't load.
Can you please let me know the path your service should be accessible from? I set it up on my cluster but I'm not sure how to open it. I guess it's not /demo. You can see it from http://dockeredg-external-1uzwfo1thaojq-1958892365.us-east-1.elb.amazonaws.com/demo .
Oh, sorry, it should be --label com.df.servicePath=/ without the /demo. Sorry!!
Can you confirm that the following log is "normal" or something failed?
-------------------------------------------
Running Entrypoint : true
-------------------------------------------
2017-06-14 19:46:18,334 CRIT Supervisor running as root (no user in config file)
2017-06-14 19:46:18,343 INFO RPC interface 'supervisor' initialized
2017-06-14 19:46:18,343 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2017-06-14 19:46:18,343 INFO supervisord started with pid 5
2017-06-14 19:46:19,345 INFO spawned: 'whd' with pid 8
2017-06-14 19:46:19,351 INFO success: whd entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2017-06-14 19:46:40,039 INFO exited: whd (exit status 0; expected)
This is correct! Nothing failed.
I appreciate you doing this, thank you so much!
I created the same service in my cluster (created with "Docker for AWS"). It consists of three nodes, each in a different AZ. I could not find any problem. The page is loading decently fast.
I'll leave it up and running on http://dockeredg-external-1uzwfo1thaojq-1958892365.us-east-1.elb.amazonaws.com/ .
I'm not sure what the problem is but it doesn't seem to be related with DFP. My best guess is that there's something wrong with your cluster setup. Maybe you can try setting it up with "Docker for AWS" and check whether the problem persists. If it does, you probably have some restriction on your AWS account. If it doesn't, you'll know that it is related with your current setup.
Thank you so much for going through and testing this out. I am trying it with "Docker for AWS" now to see if it works. It could be my security groups or NACLs, but I tried tweaking those to allow basically all inbound and outbound traffic and it still wouldn't work. If "Docker for AWS" also shows the issue, I will contact AWS to check whether there are any restrictions on my account.
Last question: do I need to have the proxy running on all nodes, and how many replicas of the swarm listener should I have if I have 3 managers?
There's no need to run DFP on all nodes. Docker's Ingress network will forward requests from any node to DFP. Normally, I run two or three instances of DFP only for high-availability. One should be enough but, in case it fails, you want to have one more until Swarm brings it back up again.
As for DFSL, run only one replica. There's no reason to have more. Actually, having more than one DFSL would only do you harm since you'd have duplicated requests to the proxy.
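The replica advice above can be applied to the already-created services; a sketch using the service names from the earlier commands:

```shell
# Two proxy replicas are enough for high availability;
# one listener avoids duplicated reconfigure requests
docker service scale proxy=2
docker service scale swarm-listener=1
```

Swarm will reschedule a failed replica automatically; the second proxy instance only covers the window until that happens.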
I'll close this issue since it does not seem to be related to DFP. Feel free to reopen it if you disagree or if you come with new info.
I have a swarm cluster of 6 nodes (AWS instances) deployed under a VPC. While testing the webserver I noticed highly varying response times. My setup is as follows: Route53 -> ELB -> Docker Flow Proxy -> CherryPy Webserver. To isolate the source of the problem, I created four tests:
Test 1: web service deployed via compose + ELB, Avg Response time: 138.22ms https://a.blazemeter.com/app/?public-token=64RNazaF4zbWDb8BfGTPRgxp013v3bTpZaenOsp6yvsNu0uZEv#/accounts/130040/workspaces/123459/projects/167833/masters/16018548/summary
Test 2: web service deployed in swarm + ELB Avg Response time: 132.23ms https://a.blazemeter.com/app/?public-token=SdB7jgxXEdq3kuaAzqJ8NSpqiSKdlZmjKUkhWV17o3FzRikRmX#/accounts/130040/workspaces/123459/projects/167833/masters/16018597/summary
Test 3: web service deployed in swarm + Reverse Proxy + ELB Avg Response time: 6.11s https://a.blazemeter.com/app/?public-token=eQbj7VZRx20SOFZCXIYCGlLBLPqm6MwyM2zPrmJ4ssuuC1krA5#/accounts/130040/workspaces/123459/projects/167833/masters/16018611/summary
Test 4: web service deployed in swarm + Reverse Proxy + ELB + Route53. Avg Response time: 8.01s (13.03% error) https://a.blazemeter.com/app/?public-token=XfwzlVrq8wUee503Aeh1pkWYl1k6ZDeCz6k0e65Q6NpTA0Qgb7#/accounts/130040/workspaces/123459/projects/167833/masters/16018604/errorsreport
My conclusion from the above tests is that the problem can be in one of two places: a) the connection between the ELB and docker-flow-proxy, or b) the routing from docker-flow-proxy to the webserver.
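One way to distinguish (a) from (b) is to break a single request's latency into phases with curl: time_connect covers the handshake up to the ELB/proxy, while the gap between time_starttransfer and time_connect is dominated by proxy-to-backend routing and backend processing. A sketch against a hypothetical endpoint:

```shell
# Replace <elb-dns-name> with the ELB's DNS name and /demo with your path
curl -o /dev/null -s \
  -w "connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://<elb-dns-name>/demo
```

A fast connect with a slow time-to-first-byte points at (b); a slow connect points at (a).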