Closed: adamgraves-choices closed this issue 7 years ago
I am indeed noticing this behavior. I have noticed I have some containers set with really long health checks, and when those are in play I think that tends to exacerbate the problem.
I have the same problem. When I upgrade a server and the IP address changes, it does not get reflected in the Traefik config. Is there a way to manually regenerate the rules and traefik.toml? Currently I restart the Traefik container and the config is correct again, but this is not suitable for production.
@rawmind0 any recommendations on how to fix this in situ? I have tried restarting either rancher-traefik or alpine-traefik, or both, with curious results. One result was being banned from Let's Encrypt by rate limiting :(
I'd like to know if there is a better method, perhaps a command I can run inside one of the containers to force it to reload its configuration without dropping all the certs.
Another thought is that maybe we could have a version of this that keeps all its configs in a convoy-nfs mount.
I know all of this might be moot as well once traefik begins to natively support rancher.
Hi guys,
sorry about the issues you have been suffering. Could you please provide some more details?
BTW, inside the alpine-traefik container you can restart traefik or confd without needing to restart the container:
monit restart traefik
#or
monit restart confd
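From the host, the same thing can be done with docker exec; a minimal sketch, where the container name is a placeholder for your actual alpine-traefik container:
# substitute the real container name from docker ps
docker exec -it traefik-traefik-1 monit restart traefik
docker exec -it traefik-traefik-1 monit restart confd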
At the beginning everything worked fine, but after some time rancher-traefik did not update to the new IP after an upgrade of a container (and the resulting IP change); it still had the old IP address for the backend. I am not sure, but it could be related to an update to Rancher version 1.5.3. Currently I am testing the new native Traefik Rancher backend and it looks promising.
Hello
I have the same problem with Rancher 1.5.3 and Traefik rawmind/alpine-traefik:1.2.3-1
EDIT:
Maybe it's because confd does not refresh the metadata:
bash-4.3$ curl http://rancher-metadata
curl: (6) Couldn't resolve host 'rancher-metadata'
BTW, DNS is working in other containers and metadata works.
Due to https://github.com/rancher/rancher/issues/5041
I tried to add the search domain in the Rancher UI, and after the upgrade DNS is now working, but the confd output is still empty :(
Hi @snahelou ...
This is not the cause of the problem.... confd is able to resolve the rancher URI and connect... This problem is with alpine curl, not confd.... If you do curl http://rancher-metadata.rancher.internal it should work.....
Please publish the confd logs: /opt/tools/confd/log/confd.log inside the alpine-traefik containers....
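For example, a quick way to pull those logs from the host (the container name here is a placeholder):
docker exec -it traefik-traefik-1 tail -n 100 /opt/tools/confd/log/confd.log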
Do your services have healthchecks configured?
Hello
Yes, sorry, DNS was not the problem.
I had the following error:
2017-05-02T12:42:28Z traefik-traefik-1 /opt/tools/confd/bin/confd[159]: ERROR template: rules.toml.tmpl:41:34: executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist
{{- $back_status := getv (printf "/stacks/%s/services/%s/containers/%s/health_state" $stack_name $service_name $container) -}}
I removed 2 stacks and the service came back available. It's strange because the stacks were green.
It seems you didn't have healthchecks configured.... Health checks are mandatory; only healthy backends are added to traefik.
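As a quick way to check this, you can browse the Rancher metadata tree that confd reads from inside the container; a rough sketch, where the container name is a placeholder and the deeper path segments simply mirror the keys used in rules.toml.tmpl:
# placeholder container name; adjust for your environment
docker exec -it traefik-traefik-1 curl -s http://rancher-metadata.rancher.internal/latest/stacks
# then drill down with /<stack>/services/<service>/containers/... as in the template;
# a service without a healthcheck has no health_state entry, which is what makes
# getv fail with "key does not exist"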
OK, strange, healthchecks were configured, because I used a Jenkins multibranch pipeline and other branches work well.
Thanks for your support.
Regards
Hi! I've got an intermittent issue very similar to this one, where Traefik isn't updating the frontend and backend configurations in our Rancher environment on every host (some hosts are updated).
New stacks and changes to stacks sometimes don't get reflected in every host config.
About our configuration:
One note: the confd log of traefik1 shows the error "executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist", but traefik1 is the one configured OK; traefik2 is the one that is not configured OK (not refreshed). I've also checked every traefik label on the servers and they are exactly the same as the ones attached.
Anyone else with the same? Thanks! Juan
healthchecks
traefik 2 dashboard where test-portal1-14-06 service is not discovered
traefik 1 dashboard where test-portal1-14-06 service is discovered
nginx labels
traefik-1-confd.txt traefik-2.txt traefik-1.txt traefik-2-confd.txt
Some more information: I've checked the file /opt/traefik/etc/rules.toml on traefik-1 and traefik-2, and on both of them the "test-portal1-14-06" service configuration is present. I don't know why traefik does not reload; perhaps related to this?
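A quick way to confirm both instances really have the same rendered rules is to compare checksums from the host; a sketch, with placeholder container names:
docker exec traefik-traefik-1 md5sum /opt/traefik/etc/rules.toml
docker exec traefik-traefik-2 md5sum /opt/traefik/etc/rules.toml
# if the checksums match but only one dashboard shows the service, the problem is
# likely traefik not reloading the file rather than confd not writing it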
@rawmind0 any help on this? Any suggestions? Can you please check my post in this issue?
Check if all of your stacks are green, even if they have no traefik tags. When I have errors on a stack, that makes my confd unstable. In your case, it's very strange that one server works and not the other.
Regards
When a container crashes and restarts itself, Traefik correctly removes the container from the pool but doesn't re-add it once it has restarted. I have to manually scale the stack up and down to get Traefik to pick it up. Any ideas?
Considering abandoning this image and going for the native Rancher support in Traefik 1.3 to see if that resolves it.
@dbsanfte, no idea. I've tried to evacuate a host and Traefik updates correctly when new containers are created on other hosts. @snahelou thanks for the response! I have all stacks on green.
Some tests I've done, not sure if they are the ones that make it work now... (just in case it helps someone):
With no more red stacks and using Ubuntu 16.04, Traefik seems to have been working OK for at least 24 hours.
@jjscarafia , your case is so strange....
In your confd log files, the last update should have set the rules.toml file to the same content... It's very strange that it works on just one server... Are the infrastructure services working well on both?
traefik-2-confd.txt
2017-06-14T12:43:59Z traefik-traefik-2 /opt/tools/confd/bin/confd[143]: INFO /opt/traefik/etc/rules.toml has md5sum bf6b2298be0acf958ad37fac08f7180d should be 73983e979b367f06346659a41726824f
2017-06-14T12:43:59Z traefik-traefik-2 /opt/tools/confd/bin/confd[143]: INFO Target config /opt/traefik/etc/rules.toml out of sync
2017-06-14T12:43:59Z traefik-traefik-2 /opt/tools/confd/bin/confd[143]: INFO Target config /opt/traefik/etc/rules.toml has been updated
traefik-1-confd.txt
2017-06-14T12:44:09Z traefik-traefik-1 /opt/tools/confd/bin/confd[24]: INFO /opt/traefik/etc/rules.toml has md5sum bf6b2298be0acf958ad37fac08f7180d should be 73983e979b367f06346659a41726824f
2017-06-14T12:44:09Z traefik-traefik-1 /opt/tools/confd/bin/confd[24]: INFO Target config /opt/traefik/etc/rules.toml out of sync
2017-06-14T12:44:09Z traefik-traefik-1 /opt/tools/confd/bin/confd[24]: INFO Target config /opt/traefik/etc/rules.toml has been updated
Is it working well with Ubuntu and Docker 1.12.6?
Hi @rawmind0 and thanks for the comments!
@rawmind0 just in case you are available and willing, I can give you access to the Rancher environment; just send me an email to jjs@adhoc.com.ar
Hi @jjscarafia ...
Best regards....
I've been playing for a while and I can see that:
2017-06-22T21:05:46Z adhoc-traefik-traefik-3 /opt/tools/confd/bin/confd[23]: ERROR template: rules.toml.tmpl:41:34: executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist
2017-06-22T21:06:01Z adhoc-traefik-traefik-3 /opt/tools/confd/bin/confd[23]: ERROR template: rules.toml.tmpl:41:34: executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist
2017-06-22T21:06:16Z adhoc-traefik-traefik-3 /opt/tools/confd/bin/confd[23]: ERROR template: rules.toml.tmpl:41:34: executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist
2017-06-22T21:06:31Z adhoc-traefik-traefik-3 /opt/tools/confd/bin/confd[23]: ERROR template: rules.toml.tmpl:41:34: executing "rules.toml.tmpl" at <getv (printf "/stack...>: error calling getv: key does not exist
Moving over to the native Traefik Rancher support resolved my issue with my crashed/auto-restarted Node.js containers not being picked up by this image.
@dbsanfte good to know that and thanks for sharing. Are you also using acme support with native rancher support?
No we're just defining a plain old SSL cert/key, no ACME.
I just hit this one too. In my case, a host went down which caused some stacks to migrate to another host.
There were some other stacks that were simply stopped because I didn't want them alive at the moment. Traefik did not start updating until I started those stacks as well, which I could then stop at my leisure.
@lasley moving to Traefik's native Rancher support made it work OK for me. If it helps, this is my very ugly rancher-catalog template
@jjscarafia I've built something similar using the native rancher templates: https://github.com/nhsuk/traefik-rancher
Unfortunately I've come across a critical bug which stops us using Traefik for now: https://github.com/containous/traefik/issues/1927
@adamgraves-choices thanks for the feedback. It seems that was the issue I faced yesterday...
Honestly I thought I was just screwing up somehow so I wasn't even going to say anything 😆
I am having a similar issue. I was able to get past the error in the log message by setting the environment variable CONF_PREFIX to /latest, which seems to have triggered confd to look at the latest route in the rancher metadata service instead of the default of /2015-12-19. However, I am still having an issue with the correct rules being written.
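For reference, a sketch of how that variable could be set when launching the container manually; the image name and tag are illustrative only, and in Rancher you would set it in the service's environment instead:
docker run -d --name traefik \
  -e CONF_PREFIX=/latest \
  rawmind/alpine-traefik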
When confd completes its interval I do in fact see a new /opt/traefik/etc/rules.toml file, but it is missing the URL and backends params shown in the template.
I believe it is skipping over the following block in the template because rancher-metadata has not yet registered that the container is healthy by the time confd finishes writing the new rules.toml.
{{- if eq $back_status "healthy" }}
[backends.{{$service_name}}__{{$stack_name}}.servers.{{getv (printf "/stacks/%s/services/%s/containers/%s/name" $stack_name $service_name $container)}}]
{{- if eq $traefik_protocol "https"}}
url = "{{$traefik_protocol}}://{{getv (printf "/stacks/%s/services/%s/containers/%s/primary_ip" $stack_name $service_name $container) -}}:
{{- else}}
url = "http://{{getv (printf "/stacks/%s/services/%s/containers/%s/primary_ip" $stack_name $service_name $container) -}}:
{{- end -}}
{{- if exists (printf "/stacks/%s/services/%s/labels/traefik.port" $stack_name $service_name) -}}
{{getv (printf "/stacks/%s/services/%s/labels/traefik.port" $stack_name $service_name)}}
{{- else -}}
80
{{- end}}"
weight = 0
{{- end -}}
{{- end -}}
It seems that when confd is triggered to run, it detects a change in the number of stacks in "latest", but if the container is not "healthy" by the time it writes the new rules file, it will skip over that part of the template.
My suspicion is that since the number of stacks doesn't change by the next interval, the rules.toml doesn't get updated until the number of stacks changes in Rancher, which could be a long time or even never.
If my suspicion is correct, is there a better methodology for updating the rules.toml other than counting the number of stacks in Rancher?
I do have health checks configured on all my stacks so I am not sure how to move forward.
Once again, assuming that confd is only looking for a change in the number of stacks in the environment, I see 3 possible solutions. One would be a rules.toml.toml file that somehow dynamically checks the individual health of each container before executing rules.toml.tmpl. This also seems like it could break down, similar to option one, if some containers in the environment are never healthy.
@alexisaperez - Regarding confd - I think that it's a dumb implementation & simply rewrites the rules every X units of time.
The reasoning behind this assertion is that when I make the comma change in #51, it's just a few seconds until the rule is updated in Traefik. I'm definitely no confd expert though, so it's possible it's noticing the change in the rules file itself and triggering the update.
@lasley I thought that at first as well, but in my testing it seems that the rules.toml only gets updated when the number of stacks in the environment changes. I am also not an expert in confd; it is just what I observed. I think one way that might solve the issue, for my environment at least, would be to change the key in the rules.toml.toml from /stacks to /containers, but I will have to report back on whether that's feasible.
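If anyone wants to try that, the keys live in the confd template resource config; a rough way to find it (the conf.d path below is an assumption based on the /opt/tools/confd paths seen in the logs):
# path is an assumption; adjust if the image lays things out differently
docker exec traefik-traefik-1 sh -c 'grep -r "keys" /opt/tools/confd/etc/conf.d/'
# a confd resource typically declares something like: keys = [ "/stacks" ]
# changing that key to /containers is the experiment described above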
I'm also having the same problem with frontends/backends not getting updated although everything is green and healthy; confd.log shows plenty of:
2017-10-12T12:55:53Z traefik-traefik-1 /opt/tools/confd/bin/confd[24]: ERROR template: traefik.crt.tmpl:1:20: executing "traefik.crt.tmpl" at <getv "/traefik/ssl_c...>: error calling getv: key does not exist
Hi all,
From alpine-traefik release 1.4.0-3, Traefik's built-in Rancher integration is supported, both metadata and API. Also, the community-catalog is already updated. Now 3 Rancher integrations are available: metadata, API (Traefik built-in) or external (rancher-traefik).
Take into account that labels are different with the Traefik built-in integration; see https://docs.traefik.io/configuration/backends/rancher/#labels-overriding-default-behaviour
Metadata with long polling is the preferred integration; it's working very well. :)
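As an illustration only (the label names below are taken from the linked docs, but the values are made-up examples), with the built-in integration the traefik.* labels on your services follow Traefik's own conventions, e.g.:
# example values only; see the linked Traefik docs for the full label list
docker run -d \
  --label traefik.enable=true \
  --label traefik.port=8080 \
  --label traefik.frontend.rule=Host:app.example.com \
  my-app-image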
Also, I made a PR with a refactor of the Rancher integration; it is already merged and will be included in the next Traefik release. https://github.com/containous/traefik/pull/2291
Best regards...
Great news, great work! Thanks for the update!
Hi all,
rancher-traefik has been updated to use rancher-template instead of confd, to get immediate updates from metadata. The Traefik external integration uses it.
Best regards...
Hi,
We've got an intermittent issue where Traefik isn't updating the frontend and backend configurations in our Rancher environment.
New stacks and changes to stacks sometimes don't get reflected in the config, sometimes it resolves itself within approx. 10-60 minutes, but on some occasions we have to restart the Traefik stack. Sometimes that doesn't help, and we have ended up destroying the environment and rebuilding it from scratch to resolve the issue.
Last time it occurred I tested the rancher-metadata service to ensure that was working, and everything looked fine from there.
Anyone else encountering this?