First a bit of background...
Docker Swarm creates an entry in its service discovery as soon as a service is created, even if the service will not be operational until minutes later (once the image is pulled and the application inside the container has started). There is no easy way to find out the status of that service from the Docker API. Even if there were, it would fluctuate a lot. So, Swarm Listener picks up information about the new service and sends the reconfigure request to the proxy. Since the service might not be running at that time, it retries until it receives an OK from the proxy or it exceeds the maximum number of retries.
I think you're looking for the DF_RETRY and DF_RETRY_INTERVAL environment variables of the Swarm Listener config. They control how many times it will retry the reconfigure request and the interval between retries.
Please try it out and let me know if that's what you're looking for.
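For reference, here's a minimal sketch of what that looks like in the listener's stack file (the service name, network, constraints, and values are illustrative, and it omits the notification URLs and other settings you would normally configure):

```yaml
  swarm-listener:
    image: dockerflow/docker-flow-swarm-listener
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # Retry the reconfigure request up to 100 times...
      DF_RETRY: "100"
      # ...waiting 5 seconds between attempts (roughly 8 minutes in total).
      DF_RETRY_INTERVAL: "5"
    deploy:
      placement:
        constraints:
          - node.role == manager
```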
The latest DFP release changed the way validations are done. Please pull the latest release and try it out. It should remove this problem without the need to increase retry attempts and/or interval.
I'll close the issue. Feel free to reopen it if the problem persists.
@vfarcic, let's reopen this one. It does not work with gitlab, which could take 5+ minutes before being ready to provide service. Well, 5 minutes on my system :)
https://docs.gitlab.com/omnibus/docker/README.html
Docker Compose file for GitLab below:

```yaml
version: '3.1'

networks:
  qwerty_prod_reverse_proxy:
    external: true
  dvorak_prod_gitlab:
    external: true

volumes:
  gitlab_gitlab_etc_gitlab:
  gitlab_gitlab_log_gitlab:
  gitlab_gitlab_opt_gitlab:

services:
  gitlab_gitlab:
    image: "gitlab/gitlab-ee:9.0.5-ee.0"
    ports:
      - '22'
      - '80'
      - '443'
    networks:
      qwerty_prod_reverse_proxy:
      dvorak_prod_gitlab:
    volumes:
      - gitlab_gitlab_etc_gitlab:/etc/gitlab
      - gitlab_gitlab_log_gitlab:/var/log/gitlab
      - gitlab_gitlab_opt_gitlab:/var/opt/gitlab
    deploy:
      labels:
        com.df.notify: "true"
        com.df.distribute: "true"
        com.df.serviceDomain: "gitlab.abc.def.com"
        com.df.servicePath: "/"
        com.df.port: 80
        com.df.setHeader: "X-Forwarded-Port %[dst_port]"
        com.df.addHeader: "X-Forwarded-Ssl on if { ssl_fc }, X-Forwarded-Proto https if { ssl_fc }, X-Forwarded-Protocol https if { ssl_fc }, X-Url-Scheme https if { ssl_fc }"
      resources:
        limits:
          cpus: "0.000"
          memory: "16g"
        reservations:
          cpus: "0.000"
          memory: "16g"
      mode: "replicated"
      replicas: 1
      update_config:
        parallelism: 1
        delay: "60s"
      placement:
        constraints:
          - node.labels.dvorak_prod_gitlab_gitlab == yes
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://gitlab.abc.def.com'
        # Add any other gitlab.rb configuration here, each on its own line
        #nginx['redirect_http_to_https'] = false
        #nginx['ssl_certificate'] = "/etc/gitlab/ssl/notinuse.com.crt"
        #nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/notinuse.com.key"
        nginx['listen_port'] = 80
        nginx['listen_https'] = false
        nginx['proxy_set_headers'] = { "X-Forwarded-Proto" => "https", "X-Forwarded-Ssl" => "on" }
        gitlab_rails['lfs_enabled'] = true
        gitlab_rails['gitlab_email_from'] = "gitlab@gitlab.abc.def.com"
      DOCKER_SERVICE_NAME: "{{.Service.Name}}"
      DOCKER_SERVICE_ID: "{{.Service.ID}}"
      DOCKER_SERVICE_LABELS: "{{.Service.Labels}}"
      DOCKER_NODE_ID: "{{.Node.ID}}"
      DOCKER_TASK_ID: "{{.Task.ID}}"
      DOCKER_TASK_NAME: "{{.Task.Name}}"
      DOCKER_TASK_SLOT: "{{.Task.Slot}}"
```
Sorry for not responding earlier. DockerCon finished and I'm about to go back home. I'll take a look at this issue on Monday. I hope that's not too late.
@patran Can you confirm that DF_RETRY and DF_RETRY_INTERVAL (when multiplied) are longer than the time it takes to pull and initialize GitLab?
@vfarcic, confirmed.
DF_RETRY_INTERVAL - tested with 5s and 7s; it worked as expected. DF_RETRY - set to 400. I did not count the retries, but the desired end effect -- supporting apps that take 5+ minutes to be ready -- was certainly achieved.
Tested with GitLab getting pulled and initialized.
Btw, I had interpreted one of your comments to mean that apps such as GitLab would get detected by the proxy and work properly even without specifying DF_RETRY/DF_RETRY_INTERVAL. Just to confirm: without the retry settings, I could not get the proxy to detect GitLab reliably and reconfigure HAProxy properly.
The proxy has defaults that work correctly in most (though not all) cases. Normally, it should not take more than a couple of seconds to pull an image and create containers. By default, Swarm Listener will repeat a request fifty times with a five-second pause between each attempt. That makes it a little over four minutes, which is more than enough in most cases.
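To put numbers on it: 50 retries × 5 seconds = 250 seconds, or roughly 4 minutes and 10 seconds. If a service needs more than that to pull and initialize, the product has to grow; for example, DF_RETRY=100 with DF_RETRY_INTERVAL=5 would cover a bit over 8 minutes (those values are just an illustration).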
The reason there is a maximum number of retries is to avoid a never-ending loop. One might create a service that never initializes. In such a case, without a maximum number of retries, Swarm Listener would loop forever.
It's not that the proxy could not detect GitLab. The problem is that GitLab (combined with your probably slow bandwidth) takes too long to pull, so the endpoint (DNS) created by the Docker overlay network was delayed quite a lot. As a result, the proxy thought that the service did not exist.
I'm not sure whether I managed to explain the logic behind it. Please let me know if I didn't and I'll try to be more descriptive.
I'd be more than happy to improve the code if you have a suggestion.
The DF_RETRY settings and the associated algorithm make perfect sense. Along with the ability to instruct the proxy to reload, I think situations such as slow bandwidth, temporary network partitions, longer periods of communication impairment, traffic overload, etc. are well covered.
Btw, for a given instance of the proxy, could you help me understand the design behind how haproxy.cfg gets updated? I am primarily interested in whether there could ever be a situation where the haproxy.cfg of one given proxy gets updated simultaneously by multiple "threads". Thanks...
haproxy.cfg gets updated on every reconfigure or remove request. Each request is handled in a separate goroutine as a way to avoid bottlenecks. However, the function that writes the file is synced so that only one write can happen at any given moment, which avoids potential corruption if multiple writes happen at the same time. In other words, request handling is done in multiple goroutines, but writing the config is synchronized.
Please let me know if I explained it well. If not, I'll get back to you with a more detailed description and/or relevant parts of the code.
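In the meantime, here's a rough sketch of that pattern (illustrative only, not the actual project code): requests are handled concurrently, but the file write is guarded by a mutex.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"sync"
)

// cfgMutex serializes writes to haproxy.cfg so concurrent requests
// can never corrupt the file by writing at the same time.
var cfgMutex sync.Mutex

func writeConfig(content []byte) error {
	cfgMutex.Lock()
	defer cfgMutex.Unlock()
	return os.WriteFile("/cfg/haproxy.cfg", content, 0644)
}

// renderConfig is a hypothetical helper; the real proxy merges the new
// service's parameters into the existing frontend/backend definitions.
func renderConfig(params map[string][]string) []byte {
	return []byte("# haproxy.cfg rendered from service parameters\n")
}

// reconfigure handles a single reconfigure request. Go's HTTP server runs
// each handler in its own goroutine, so request handling stays concurrent
// while writeConfig above remains synchronized.
func reconfigure(w http.ResponseWriter, r *http.Request) {
	cfg := renderConfig(r.URL.Query())
	if err := writeConfig(cfg); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/v1/docker-flow-proxy/reconfigure", reconfigure)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```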
I think this ticket can be closed. Feel free to reopen it if you disagree.
I'm facing this issue when I deploy a "bad" service. By the time I fix it and try to push the new build through the pipeline, the proxy listener gives me this error:
"Max retries exceeded with url"
In my case, it is the nib0r/docker-flow-proxy-letsencrypt service that gets this error...
@drozzy Can you confirm that you're using dockerflow/docker-flow-proxy and not the old image? We moved it from vfarcic to dockerflow a while ago.
@vfarcic this issue has gone away. In general, I found the proxy to be working correctly, so ignore my earlier report.
Yes, I am using the new Docker Flow Proxy now.
I cannot fully reproduce it, but at times, if a service takes a long time (several minutes) to start up, the proxy does not always re-configure/re-generate haproxy.cfg. Restarting the proxy (docker restart proxy) does produce a correct haproxy.cfg.
I have observed 2 situations:
Is there a default timeout somewhere? If there is a timeout or max retries, would it be possible to have a default but allow user override?