tiangolo / dockerswarm.rocks

Docker Swarm mode rocks! Ideas, tools and recipes. Get a production-ready, distributed, HTTPS served, cluster in minutes, not weeks.
https://dockerswarm.rocks/

[update] traefik 2.0 is out ! #28

ebreton closed this issue 3 years ago

ebreton commented 5 years ago

Some information for those who will need it before me 😄

Midnighter commented 5 years ago

Anyone got a working setup for 2.0 yet?

suchwerk commented 5 years ago

HA is no longer available in the CE.

Midnighter commented 5 years ago

Too many acronyms :smiley:. What does HA mean? I suppose CE is Community Edition?

baskinsy commented 5 years ago

> HA is no longer available in the CE.

Can you elaborate on this? HA stands for high availability, and when you deploy on Swarm (with at least 3 manager nodes) and with Consul (as in this guide), you get HA. Can you point to a documentation link or article about that?

suchwerk commented 5 years ago

Just have a look at this issue.

baskinsy commented 5 years ago

> Just have a look at this issue.

So they have dropped KV store support (e.g. Consul) for now, and because of that we cannot have a replicated Traefik accessing Consul to store certs? Did I understand correctly?

suchwerk commented 5 years ago

Yes. You should still be able to use static certs, but in the age of Let's Encrypt this is a real drawback.
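For reference, a minimal sketch of the static-cert route in Traefik v2. It assumes the file provider is enabled with `--providers.file.filename=/etc/traefik/dynamic.yml` and that the certificate and key (hypothetical paths below) are mounted into every Traefik replica. Since no ACME state is involved, each replica reads the same files and no shared store is needed:

```yaml
# /etc/traefik/dynamic.yml: dynamic configuration read by the file provider
# (certificate paths are placeholders for illustration)
tls:
  certificates:
    - certFile: /certs/example.com.crt
      keyFile: /certs/example.com.key
```

The drawback still applies: these certificates have to be renewed and redistributed outside Traefik.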

luizjr commented 4 years ago

Is there a date for upgrading to Traefik 2?

pascalandy commented 4 years ago

> So they have dropped KV store support (e.g. Consul) for now, and because of that we cannot have a replicated Traefik accessing Consul to store certs? Did I understand correctly?

It's back since Traefik v2.1 :)

baskinsy commented 4 years ago

> It's back since Traefik v2.1 :)

@pascalandy I was looking at the docs yesterday by chance and didn't notice that. Any link? Maybe I missed it.

codeagencybe commented 4 years ago

When will the docs be updated to the newer version 2.1?

pascalandy commented 4 years ago

2.1 is the latest. So you are all good :) https://docs.traefik.io/

codeagencybe commented 4 years ago

@pascalandy I know that 2.1 is the latest version and is available. I'm referring to the documentation and articles on the website. They still show config files for v1.7, which are outdated.

It's a waste of time to start on v1.7 and then go through the pain of upgrading to v2.x. I prefer to deploy with 2.1 right away if possible.

So my question still stands: when can we get an update so the guides reflect the newer v2.x version?

suchwerk commented 4 years ago

> It's back since Traefik v2.1 :)

What exactly? Let's Encrypt support is (still) dropped in CE. It's only available in EE.

codeagencybe commented 4 years ago

@suchwerk Let's Encrypt was never dropped in any version. In the official docs, LE is still documented and supported. https://docs.traefik.io/https/acme/

suchwerk commented 4 years ago

> @suchwerk Let's Encrypt was never dropped in any version. In the official docs, LE is still documented and supported. https://docs.traefik.io/https/acme/

In an HA context, it has been dropped since 2.0.

pascalandy commented 4 years ago

> I'm referring to the documentation and articles on the website. They still show config files for v1.7, which are outdated.

The URL you posted originally looked very close but it's not the same at all.

Here is the post (and new blog) you are looking for: https://containo.us/blog/traefik-2-0-docker-101-fc2893944b9d/

pascalandy commented 4 years ago

> Let's Encrypt support is (still) dropped in CE. It's only available in EE.

Factually false.

> In an HA context, it has been dropped since 2.0.

It's back in v2.1.

codeagencybe commented 4 years ago

@suchwerk But 2.1 brought back Consul to handle dynamic config and certs. It's a pity that HA was removed in CE, but it seems Consul brings back a small workaround.

suchwerk commented 4 years ago

> @suchwerk But 2.1 brought back Consul to handle dynamic config and certs. It's a pity that HA was removed in CE, but it seems Consul brings back a small workaround.

Can you point out how to get a reliable HA config running with Traefik 2.0 (CE), Let's Encrypt, and Consul?

suchwerk commented 4 years ago

> > Let's Encrypt support is (still) dropped in CE. It's only available in EE.
>
> Factually false.
>
> > In an HA context, it has been dropped since 2.0.
>
> It's back in v2.1.

Can you point out how to get a reliable HA config running with Traefik 2.0 (CE), Let's Encrypt, and Consul?

pascalandy commented 4 years ago

I haven't looked into this use case yet. I would ask on the forum: https://community.containo.us/c/traefik/traefik-v2/10

Cheers!

luizjr commented 4 years ago

I just found out that Consul KV will be back in Traefik v2.2. Can anyone create a docker-compose.yml with an example?

Version 2.2 is still a release candidate, but 2.1 is already the latest, so 2.2 may come out soon.

https://docs.traefik.io/master/providers/consul/

revant commented 4 years ago

I tried on a single node with TRAEFIK_REPLICAS=1 and CONSUL_REPLICAS=0.

I used v2.2, which is currently v2.2.0-rc4:

https://hub.docker.com/_/traefik Tags: v2.2.0-rc4, 2.2.0-rc4, v2.2, 2.2, chevrotin

docker-compose.yml:

```yaml
version: '3.3'

services:
  consul-leader:
    image: consul:1.7.2
    command: agent -server -client=0.0.0.0 -bootstrap -ui
    volumes:
      - consul-data-leader:/consul/data
    environment:
      - CONSUL_BIND_INTERFACE=eth0
      - 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
    networks:
      - default
      - traefik-public
    deploy:
      labels:
        # Migrate v1.7 to v2.2
        - traefik.enable=true
        - traefik.http.routers.consul.rule=Host(`consul.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.consul.tls=true
        - traefik.http.routers.consul.tls.certresolver=myresolver
        - traefik.http.routers.consul.middlewares=auth
        - traefik.http.services.consul-svc.loadbalancer.server.port=8500
        - traefik.http.middlewares.auth.basicauth.users=${USERNAME?Variable USERNAME not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.tags=${TRAEFIK_PUBLIC_TAG:-traefik-public}
        - traefik.docker.network=traefik-public
        # Https Redirect
        - traefik.http.routers.http_catchall.rule=HostRegexp(`{any:.+}`)
        - traefik.http.routers.http_catchall.entrypoints=web
        - traefik.http.routers.http_catchall.middlewares=https_redirect
        - traefik.http.middlewares.https_redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https_redirect.redirectscheme.permanent=true

  consul-replica:
    image: consul:1.7.2
    command: agent -server -client=0.0.0.0 -retry-join="consul-leader"
    volumes:
      - consul-data-replica:/consul/data
    environment:
      - CONSUL_BIND_INTERFACE=eth0
      - 'CONSUL_LOCAL_CONFIG={"leave_on_terminate": true}'
    networks:
      - default
      - traefik-public
    deploy:
      replicas: ${CONSUL_REPLICAS:-3}
      placement:
        preferences:
          - spread: node.id

  traefik:
    image: traefik:v2.2
    ports:
      - 80:80
      - 443:443
    deploy:
      replicas: ${TRAEFIK_REPLICAS:-3}
      placement:
        constraints:
          - node.role == manager
        preferences:
          - spread: node.id
      labels:
        - traefik.enable=true
        - traefik.http.routers.traefik.rule=Host(`traefik.${DOMAIN?Variable DOMAIN not set}`)
        - traefik.http.routers.traefik.tls=true
        - traefik.http.routers.traefik.tls.certresolver=myresolver
        - traefik.http.routers.traefik.middlewares=auth
        - traefik.http.services.traefik-svc.loadbalancer.server.port=8080
        - traefik.tags=traefik-public
        - traefik.http.middlewares.auth.basicauth.users=${USERNAME?Variable USERNAME not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.docker.network=traefik-public
        # Https Redirect
        - traefik.http.routers.http_catchall.rule=HostRegexp(`{any:.+}`)
        - traefik.http.routers.http_catchall.entrypoints=web
        - traefik.http.routers.http_catchall.middlewares=https_redirect
        - traefik.http.middlewares.https_redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https_redirect.redirectscheme.permanent=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command:
      - --api=true
      # Logging
      - --log.level=DEBUG
      - --accesslog=true
      # Docker
      - --providers.docker=true
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedByDefault=false
      - --providers.docker.constraints=Label(`tag`,`traefik-public`)
      # Consul
      - --providers.consul=true
      - --providers.consul.endpoints=consul-leader:8500
      # Letsencrypt
      - --entryPoints.web.address=:80
      - --entryPoints.websecure.address=:443
      - --certificatesResolvers.myresolver.acme.email=${EMAIL?Variable EMAIL not set}
      - --certificatesResolvers.myresolver.acme.storage="traefik/acme/account"
      - --certificatesResolvers.myresolver.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
      - --certificatesResolvers.myresolver.acme.httpChallenge=true
      - --certificatesResolvers.myresolver.acme.httpChallenge.entryPoint=web
    networks:
      - default
      - traefik-public
    # note: depends_on is ignored by docker stack deploy
    depends_on:
      - consul-leader

volumes:
  consul-data-leader:
  consul-data-replica:

networks:
  traefik-public:
    external: true
```

The traefik and consul-leader containers start.

Traefik container logs:

```
time="2020-03-22T08:18:38Z" level=info msg="Configuration loaded from flags."
time="2020-03-22T08:18:38Z" level=info msg="Traefik version 2.2.0-rc4 built on 2020-03-19T17:31:45Z"
time="2020-03-22T08:18:38Z" level=debug msg="Static configuration loaded {\"global\":{\"checkNewVersion\":true},\"serversTransport\":{\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"web\":{\"address\":\":80\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}},\"websecure\":{\"address\":\":443\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}}},\"providers\":{\"providersThrottleDuration\":2000000000,\"docker\":{\"constraints\":\"Label(`tag`,`traefik-public`)\",\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"swarmMode\":true,\"swarmModeRefreshSeconds\":15000000000},\"consul\":{\"rootKey\":\"traefik\",\"endpoints\":[\"consul-leader:8500\"]}},\"api\":{\"dashboard\":true},\"log\":{\"level\":\"DEBUG\",\"format\":\"common\"},\"accessLog\":{\"format\":\"common\",\"filters\":{},\"fields\":{\"defaultMode\":\"keep\",\"headers\":{\"defaultMode\":\"drop\"}}},\"certificatesResolvers\":{\"myresolver\":{\"acme\":{\"email\":\"support@castlecraft.in\",\"caServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"storage\":\"\\\"traefik/acme/account\\\"\",\"keyType\":\"RSA4096\",\"httpChallenge\":{\"entryPoint\":\"web\"}}}}}"
time="2020-03-22T08:18:38Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/contributing/data-collection/\n"
time="2020-03-22T08:18:38Z" level=error msg="The ACME resolver \"myresolver\" is skipped from the resolvers list because: unable to get ACME account: open \"traefik/acme/account\": no such file or directory"
time="2020-03-22T08:18:38Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
time="2020-03-22T08:18:38Z" level=debug msg="Start TCP Server" entryPointName=web
time="2020-03-22T08:18:38Z" level=debug msg="Start TCP Server" entryPointName=websecure
time="2020-03-22T08:18:38Z" level=info msg="Starting provider *traefik.Provider {}"
time="2020-03-22T08:18:38Z" level=info msg="Starting provider *docker.Provider {\"constraints\":\"Label(`tag`,`traefik-public`)\",\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"swarmMode\":true,\"swarmModeRefreshSeconds\":15000000000}"
time="2020-03-22T08:18:38Z" level=info msg="Starting provider *consul.Provider {\"rootKey\":\"traefik\",\"endpoints\":[\"consul-leader:8500\"]}"
time="2020-03-22T08:18:38Z" level=debug msg="Exists: traefik/qmslkjdfmqlskdjfmqlksjazçueznbvbwzlkajzebvkwjdcqmlsfj"
time="2020-03-22T08:18:38Z" level=debug msg="Configuration received from provider internal: {\"http\":{\"services\":{\"api\":{},\"dashboard\":{},\"noop\":{}}},\"tcp\":{},\"tls\":{}}" providerName=internal
time="2020-03-22T08:18:38Z" level=debug msg="No default certificate, generating one"
time="2020-03-22T08:18:38Z" level=debug msg="Provider connection established with docker 19.03.8 (API 1.40)" providerName=docker
time="2020-03-22T08:18:38Z" level=debug msg="Container pruned by constraint expression: \"Label(`tag`,`traefik-public`)\"" container=traefik-consul-consul-leader-2w4m5uk6e37jxtta917wo2dh8 providerName=docker
time="2020-03-22T08:18:38Z" level=debug msg="Configuration received from provider docker: {\"http\":{},\"tcp\":{},\"udp\":{}}" providerName=docker
time="2020-03-22T08:18:38Z" level=debug msg="No default certificate, generating one"
time="2020-03-22T08:18:43Z" level=debug msg="List: traefik"
time="2020-03-22T08:18:43Z" level=error msg="Cannot build the configuration: Key not found in store" providerName=consul
time="2020-03-22T08:18:43Z" level=debug msg="WatchTree: traefik"
time="2020-03-22T08:18:43Z" level=debug msg="List: traefik"
time="2020-03-22T08:18:43Z" level=error msg="KV connection error: Key not found in store, retrying in 521.905894ms" providerName=consul
time="2020-03-22T08:18:43Z" level=debug msg="WatchTree: traefik"
time="2020-03-22T08:18:43Z" level=debug msg="List: traefik"
time="2020-03-22T08:18:43Z" level=error msg="KV connection error: Key not found in store, retrying in 430.870483ms" providerName=consul
```

In short:

```
time="2020-03-22T08:18:43Z" level=error msg="KV connection error: Key not found in store, retrying in 430.870483ms" providerName=consul
```

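One more thing the log hints at: `Container pruned by constraint expression` appears to mean the Docker provider dropped `consul-leader`, because the constraint looks for a label named `tag` while the compose file sets `traefik.tags`. A sketch of keeping the two in sync, assuming `traefik.tags` is the label name to keep (only the relevant keys are shown):

```yaml
services:
  traefik:
    command:
      # Label(`key`,`value`) must name the exact label key set on the services
      - --providers.docker.constraints=Label(`traefik.tags`,`traefik-public`)
    deploy:
      labels:
        - traefik.tags=traefik-public
```

Every service Traefik should route to needs the same label; this does not address the `Key not found in store` error.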
codeagencybe commented 4 years ago

@revant Is this working for you, or do you have any problems? I've been trying to get 2.1 working for days, but I only get problems. If your version works with v2.2 RC, I might give it a try.

revant commented 4 years ago

> Is this working for you, or do you have any problems?

It is not working for me. I shared the container log in my previous message.

I am trying a fresh install. I modified the existing traefik-consul docker-compose.yml provided on dockerswarm.rocks following the official v1 to v2 migration guide (https://docs.traefik.io/v2.2/migration/v1-to-v2/). I may have missed something.

codeagencybe commented 4 years ago

@revant Can you explain what is not working for you? Maybe we can help each other out on this.

I've also been trying for weeks to get this guide upgraded from v1.7 to v2.1.x, but nothing works for me.

I copied your entire version to try it, because I read somewhere that v2.2 brings back the KV store and hoped this could solve some issues. But nothing loads for me: with your docker compose I can't access the Consul UI or the Traefik dashboard, and the SSL certs are giving problems. I only get 404 page not found errors. Is this also what you see?

I hope someone with more hands-on experience can step in and help us a bit with the basics, because man, this v2.x is driving me crazy.

codeagencybe commented 4 years ago

@revant

Could it be that the docker compose file you are actually running is slightly different from the one you posted?

I made some tweaks and ended up with the error log below. Notice the specific lines with `KV store connection error: Get \"http://consul-leader:8500/`, whereas your log shows `Cannot build the configuration: Key not found in store` and `KV connection error: Key not found in store`.

It's just plain weird: I'm using an exact copy of your docker compose, yet I get a different error there. Any idea?

```
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/contributing/data-collection/\n"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=error msg="The ACME resolver \"myresolver\" is skipped from the resolvers list because: unable to get ACME account: open \"traefik/acme/account\": no such file or directory"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Start TCP Server" entryPointName=web
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Start TCP Server" entryPointName=websecure
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=info msg="Starting provider *traefik.Provider {}"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Configuration received from provider internal: {\"http\":{\"services\":{\"api\":{},\"dashboard\":{},\"noop\":{}}},\"tcp\":{},\"tls\":{}}" providerName=internal
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=info msg="Starting provider *docker.Provider {\"constraints\":\"Label(tag,traefik-public)\",\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"defaultRule\":\"Host({{ normalize .Name }})\",\"swarmMode\":true,\"swarmModeRefreshSeconds\":15000000000}"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=info msg="Starting provider *consul.Provider {\"rootKey\":\"traefik\",\"endpoints\":[\"consul-leader:8500\"]}"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Exists: traefik/qmslkjdfmqlskdjfmqlksjazçueznbvbwzlkajzebvkwjdcqmlsfj"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="No default certificate, generating one"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Provider connection established with docker 19.03.7 (API 1.40)" providerName=docker
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=error msg="KV connection error: KV store connection error: Get \"http://consul-leader:8500/v1/kv/traefik/qmslkjdfmqlskdjfmqlksjaz%C3%A7ueznbvbwzlkajzebvkwjdcqmlsfj?consistent=&wait=3000ms\": dial tcp: lookup consul-leader on 127.0.0.11:53: no such host, retrying in 253.235418ms" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Configuration received from provider docker: {\"http\":{},\"tcp\":{},\"udp\":{}}" providerName=docker
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=debug msg="Exists: traefik/qmslkjdfmqlskdjfmqlksjazçueznbvbwzlkajzebvkwjdcqmlsfj"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:16Z" level=error msg="KV connection error: KV store connection error: Get \"http://consul-leader:8500/v1/kv/traefik/qmslkjdfmqlskdjfmqlksjaz%C3%A7ueznbvbwzlkajzebvkwjdcqmlsfj?consistent=&wait=3000ms\": dial tcp: lookup consul-leader on 127.0.0.11:53: no such host, retrying in 674.066438ms" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:17Z" level=debug msg="No default certificate, generating one"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:17Z" level=debug msg="Exists: traefik/qmslkjdfmqlskdjfmqlksjazçueznbvbwzlkajzebvkwjdcqmlsfj"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:17Z" level=error msg="KV connection error: KV store connection error: Get \"http://consul-leader:8500/v1/kv/traefik/qmslkjdfmqlskdjfmqlksjaz%C3%A7ueznbvbwzlkajzebvkwjdcqmlsfj?consistent=&wait=3000ms\": dial tcp: lookup consul-leader on 127.0.0.11:53: no such host, retrying in 1.147320289s" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:18Z" level=debug msg="Exists: traefik/qmslkjdfmqlskdjfmqlksjazçueznbvbwzlkajzebvkwjdcqmlsfj"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:22Z" level=debug msg="List: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:22Z" level=error msg="Cannot build the configuration: Key not found in store" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:22Z" level=debug msg="WatchTree: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:22Z" level=debug msg="List: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:22Z" level=error msg="KV connection error: Key not found in store, retrying in 676.798455ms" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:23Z" level=debug msg="WatchTree: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:23Z" level=debug msg="List: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:23Z" level=error msg="KV connection error: Key not found in store, retrying in 1.059143386s" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:24Z" level=debug msg="WatchTree: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:24Z" level=debug msg="List: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:24Z" level=error msg="KV connection error: Key not found in store, retrying in 941.356083ms" providerName=consul
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:25Z" level=debug msg="WatchTree: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:25Z" level=debug msg="List: traefik"
codeagencycloud_traefik.1.mblzdzh75yjx@portainer-master-1 | time="2020-03-22T14:57:25Z" level=error msg="KV connection error: Key not found in store, retrying in 1.4470032s" providerName=consul
```

revant commented 4 years ago

The changelog for 2.2.0-rc1 says:

> [consul,etcd,kv,redis,zk] Add KV store providers (dynamic configuration only)

https://github.com/containous/traefik/releases/tag/v2.2.0-rc1

I don't know whether that means the CLI options passed when starting the container will work or not.

https://docs.traefik.io/getting-started/configuration-overview/
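As far as I can tell from that changelog entry, the KV providers only supply dynamic configuration (routers, services, middlewares), read from keys under a root key (`traefik` by default, matching the `rootKey` in the logs above). Static options such as entry points and certificate resolvers, including `acme.storage`, still come from CLI flags or a static config file, so the CLI flags should keep working. The `Key not found in store` errors are consistent with nothing having been written under `traefik/` yet. A sketch of one router stored in Consul, written as a YAML map of key to value purely for readability (the `whoami` names and URL are hypothetical):

```yaml
# Consul KV entries (key: value); Traefik does not read this as a file
traefik/http/routers/whoami/rule: "Host(`whoami.example.com`)"
traefik/http/routers/whoami/entryPoints/0: web
traefik/http/routers/whoami/service: whoami
traefik/http/services/whoami/loadBalancer/servers/0/url: "http://whoami:80"
```

Each entry would be written with something like `consul kv put traefik/http/routers/whoami/service whoami`.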

luizjr commented 4 years ago

In: https://github.com/containous/traefik/releases/tag/v2.2.0-rc1

I can see:

> [consul,etcd,kv,redis,zk] Add KV store providers (dynamic configuration only) (#5899 by ldez)

Do we have to make a different docker-compose (dynamic configuration)?

suchwerk commented 4 years ago

Nothing changed with v2.2. HA configuration is still an enterprise feature. Read this: https://github.com/containous/traefik/issues/5426#issuecomment-533598163

revant commented 4 years ago

I removed Consul and it works.

Certs are stored on the manager node (/data/traefik/certs), so a multi-manager setup is not possible.

Search sent me here: https://adminsecurity.guru/traefik-migrating-v1-to-v2

The following traefik.yml works (no Consul KV):

```yaml
version: "3.3"

services:
  traefik:
    image: traefik:v2.2
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
    command:
      - --api
      - --log.level=INFO
      - --accesslog=true
      - --metrics.prometheus=true
      - --providers.docker=true
      - --providers.docker.endpoint=unix:///var/run/docker.sock
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedbydefault=false
      - --providers.docker.network=traefik-public
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443
      - --certificatesResolvers.certbot=true
      - --certificatesResolvers.certbot.acme.httpChallenge=true
      - --certificatesResolvers.certbot.acme.httpChallenge.entrypoint=http
      - --certificatesResolvers.certbot.acme.email=${EMAIL?Variable EMAIL not set}
      - --certificatesResolvers.certbot.acme.storage=/certs/acme-v2.json
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /data/traefik/certs:/certs
    networks:
      - traefik-public
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
      labels:
        # v2.2
        - "traefik.docker.network=traefik-public"
        - "traefik.enable=true"
        - "traefik.http.services.traefik.loadbalancer.server.port=8080"
        # Http
        - "traefik.http.routers.traefik.rule=Host(`${DOMAIN?Variable DOMAIN not set}`)"
        - "traefik.http.routers.traefik.entrypoints=http,https"
        # Enable Let's encrypt auto certificate creation
        - "traefik.http.routers.traefik.tls.certresolver=certbot"
        # Enable authentication
        - "traefik.http.routers.traefik.middlewares=traefik-auth"
        - "traefik.http.middlewares.traefik-auth.basicauth.users=admin:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}"
        # Redirect All hosts to HTTPS
        - "traefik.http.routers.http-catchall.rule=hostregexp(`{host:.+}`)"
        - "traefik.http.routers.http-catchall.entrypoints=http"
        - "traefik.http.routers.http-catchall.middlewares=redirect-to-https@docker"
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
        - "traefik.http.routers.traefik.service=api@internal"
        - "traefik.http.routers.traefik.tls"

networks:
  traefik-public:
    name: traefik-public
    driver: overlay
```
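Because the certificates live in a bind mount on one manager's local disk, a swarm with several managers could reschedule this task onto a manager where `/data/traefik/certs` is empty, and new certificates would be requested. One way to keep it in place, borrowing the node-label pattern the Portainer stack uses, is an extra placement constraint. The `traefik-certs` label name is hypothetical; it would be set once with `docker node update --label-add traefik-certs=true <node-name>`:

```yaml
services:
  traefik:
    deploy:
      placement:
        constraints:
          - node.role == manager
          # hypothetical label marking the node that holds /data/traefik/certs
          - node.labels.traefik-certs == true
```

This still runs a single Traefik replica, so it does not provide HA; it only makes the certificate location predictable.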

The labels change like this, e.g. in portainer.yml:

```yaml
version: "3.3"

services:
  agent:
    image: portainer/agent:1.5.1
    environment:
      AGENT_CLUSTER_ADDR: tasks.agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    networks:
      - agent-network
    deploy:
      mode: global
      placement:
        constraints:
          - node.platform.os == linux

  portainer:
    image: portainer/portainer:1.23.2
    command: -H tcp://tasks.agent:9001 --tlsskipverify
    volumes:
      - portainer-data:/data
    networks:
      - agent-network
      - traefik-public
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.labels.portainer.portainer-data == true
      labels:
        - "traefik.docker.network=traefik-public"
        - "traefik.enable=true"
        - "traefik.http.services.portainer.loadbalancer.server.port=9000"
        # Http
        - "traefik.http.routers.portainer.rule=Host(`portainer.${DOMAIN?Variable DOMAIN not set}`)"
        - "traefik.http.routers.portainer.entrypoints=http,https"
        # Enable Let's encrypt auto certificate creation
        - "traefik.http.routers.portainer.tls.certresolver=certbot"
networks:
  agent-network:
    attachable: true
  traefik-public:
    external: true

volumes:
  portainer-data:
```

codeagencybe commented 4 years ago

@revant

Thanks for sharing! So it seems the bottleneck is Consul... I have read several more GitHub issues from Traefik about HA and Consul, and from what I understood, Traefik is still HA and always will be; that concept/feature was never removed. The only thing they removed is distributed storage of SSL certificates, because that feature was unstable in the CE version, while Traefik EE is designed differently and is already distributed by design, so distributed LE certificate storage is just a "plus" in EE. For CE, it has been removed completely.

Official quote from Traefik:

> TL;DR: We didn't removed HA from Traefik, we dropped a super specific (and buggy) synchronisation feature around Let's Encrypt. This decision was not business-driven but led by the engineering team to keep Traefik clean.

Instead, they recently brought back the K/V providers, so technically we should be able to achieve HA with Traefik, but we need a different solution to handle the cert/config storage. I have also read that Redis (or something similar) might be a good alternative to Consul, so I'm going to see if I can find or create a working solution with that; perhaps it's easier than Consul. For Redis, I'm going with KeyDB, as this fork of Redis is better and supports clustering too. LINK: https://keydb.dev/
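A minimal sketch of what pointing Traefik 2.2's Redis KV provider at KeyDB could look like. It assumes KeyDB answers the Redis protocol on port 6379 and uses `eqalpha/keydb` as the image name (an assumption; check KeyDB's documentation). Per the 2.2.0-rc1 changelog, this provider carries dynamic configuration only, so on its own it would not share ACME certificates between replicas:

```yaml
version: "3.3"

services:
  keydb:
    image: eqalpha/keydb   # image name is an assumption
    networks:
      - traefik-public
  traefik:
    image: traefik:v2.2
    command:
      # enable the Redis KV provider and point it at the KeyDB service
      - --providers.redis=true
      - --providers.redis.endpoints=keydb:6379
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true
```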

Alternatively, they also suggest not using Traefik's SSL feature and using a third-party certificate manager like lego (https://github.com/go-acme/lego) instead. The only remaining issue is, again, distributed storage. We could also move the ACME JSON file to a shared bind mount backed by e.g. Ceph or Gluster. In that case, you can still deploy multiple Traefik instances that all use the same ACME file on a shared volume. I haven't tested this, but in theory I think it could also work fine.
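The shared-volume idea could look roughly like this. It is an untested sketch that assumes a GlusterFS or CephFS volume is already mounted at the same hypothetical path `/mnt/shared/traefik` on every manager node (only the relevant keys are shown):

```yaml
services:
  traefik:
    image: traefik:v2.2
    command:
      # all replicas read and write the same ACME file on the shared mount
      - --certificatesResolvers.letsencrypt.acme.storage=/certs/acme.json
    volumes:
      - /mnt/shared/traefik:/certs   # hypothetical shared mount point
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == manager
```

Nothing here coordinates the replicas, so several of them may try to obtain the same certificate at the same time.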

codeagencybe commented 4 years ago

> In: https://github.com/containous/traefik/releases/tag/v2.2.0-rc1
>
> I can see:
>
> > [consul,etcd,kv,redis,zk] Add KV store providers (dynamic configuration only) (#5899 by ldez)
>
> Do we have to make a different docker-compose (dynamic configuration)?

@luizjr Yes, dynamic config is required as mentioned in the docs.

amarCosmospace commented 4 years ago

Any news?

codeagencybe commented 4 years ago

@amarCosmospace I'm also eagerly waiting for an update of the Traefik guide to v2.1 or 2.2. So far, nothing I have tried works.

@tiangolo Could you help with this, please? Any chance you can update your guide or create a variant for the newer v2.1+ versions? It's a real head-scratcher if you're not familiar with Traefik to begin with.

Thanks!

oliverlj commented 4 years ago

I just tried with `--certificatesResolvers.letsencrypt.acme.storage="traefik/acme/account"`:

time="2020-04-27T11:24:27Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: open traefik/acme/account: no such file or directory"

This option doesn't seem to accept a KV key the way Traefik 1.7 did; the value is opened as a file path.
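In v2, `acme.storage` is the path of a JSON file inside the container rather than a KV key, which matches the `open ...: no such file or directory` error. revant's log earlier in the thread also shows the value wrapped in literal quotes (`open \"traefik/acme/account\"`): in the list form of `command`, the double quotes are passed to Traefik as part of the value. A sketch with a named volume (the `traefik-certs` volume name is an assumption):

```yaml
services:
  traefik:
    image: traefik:v2.2
    command:
      # a file path, written without quotes in the list form
      - --certificatesResolvers.letsencrypt.acme.storage=/certs/acme.json
    volumes:
      - traefik-certs:/certs

volumes:
  traefik-certs:
```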

oliverlj commented 4 years ago

Moving to a file on Gluster is not the way to go:

> "Though, when using Let's Encrypt for automatic certificate generation, the certificate negotiation cannot be consistently achieved because there is no guarantee that the initiator of the negotiation gets the subsequent calls."

Except with the DNS challenge, since Let's Encrypt queries DNS rather than your server, where several Traefik instances may be running.
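A sketch of a DNS-challenge resolver for Cloudflare, the provider that appears in the log further down. `CF_API_EMAIL` and `CF_API_KEY` are, as far as I know, variables read by lego's Cloudflare provider; check the lego documentation for the names supported by the Traefik version in use:

```yaml
services:
  traefik:
    image: traefik:v2.2
    environment:
      # credentials for lego's Cloudflare DNS provider
      - CF_API_EMAIL=${CF_API_EMAIL?Variable CF_API_EMAIL not set}
      - CF_API_KEY=${CF_API_KEY?Variable CF_API_KEY not set}
    command:
      - --certificatesResolvers.letsencrypt.acme.email=${EMAIL?Variable EMAIL not set}
      - --certificatesResolvers.letsencrypt.acme.storage=/certs/acme.json
      # validate via a TXT record instead of an HTTP request to this node
      - --certificatesResolvers.letsencrypt.acme.dnsChallenge=true
      - --certificatesResolvers.letsencrypt.acme.dnsChallenge.provider=cloudflare
```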

Midnighter commented 4 years ago

@revant Do I read your tests correctly, though, that it should be fine for a Docker Swarm with one manager node, storing certs on the local filesystem?

I use Docker Swarm exclusively for quickly putting up prototypes and K8s for everything else so doing it that way is very much acceptable to me.

oliverlj commented 4 years ago

It seems to be working with the DNS challenge. I set up 3 Swarm managers with Gluster.

I got these errors in the log:

```
infra_traefik.0.yotsre768r0m@fz-manager-3    | time="2020-05-12T21:16:05Z" level=info msg="Starting provider *acme.Provider {\"email\":\"x@gmail.com\",\"caServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"storage\":\"/certs/acme.json\",\"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"cloudflare\"},\"ResolverName\":\"letsencrypt\",\"store\":{},\"ChallengeStore\":{}}"

infra_traefik.0.yotsre768r0m@fz-manager-3    | time="2020-05-12T21:16:16Z" level=error msg="Unable to obtain ACME certificate for domains \"x.xyz\": unable to generate a certificate for the domains [x.xyz]: error: one or more domains had a problem:\n[x.xyz] failed to initiate challenge: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/chall-v3/4544459301/TT7eow :: urn:ietf:params:acme:error:malformed :: Unable to update challenge :: authorization must be pending, url: \n" routerName=api@docker rule="Host(`x.xyz`)" providerName=letsencrypt.acme
infra_traefik.0.yotsre768r0m@fz-manager-3    | time="2020-05-12T21:16:28Z" level=error msg="Unable to obtain ACME certificate for domains \"x.xyz\": unable to generate a certificate for the domains [x.xyz]: error: one or more domains had a problem:\n[x.xyz] failed to initiate challenge: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/chall-v3/4544463234/deNZGA :: urn:ietf:params:acme:error:malformed :: Unable to update challenge :: authorization must be pending, url: \n" providerName=letsencrypt.acme rule="Host(`x.xyz`)" routerName=api@docker

infra_traefik.0.niz2v4vm7mdb@fz-manager-1    | time="2020-05-12T21:16:12Z" level=error msg="Unable to obtain ACME certificate for domains \"x.xyz\": unable to generate a certificate for the domains [x.xyz]: error: one or more domains had a problem:\n[x.xyz] [x.xyz] acme: error presenting token: cloudflare: failed to create TXT record: error from makeRequest: HTTP status 400: content \"{\\n  \\\"result\\\": null,\\n  \\\"success\\\": false,\\n  \\\"errors\\\": [\\n    {\\n      \\\"code\\\": 81057,\\n      \\\"message\\\": \\\"The record already exists.\\\"\\n    }\\n  ],\\n  \\\"messages\\\": []\\n}\\n\"\n" providerName=letsencrypt.acme routerName=api@docker rule="Host(`x.xyz`)"
infra_traefik.0.niz2v4vm7mdb@fz-manager-1    | time="2020-05-12T21:16:24Z" level=error msg="Unable to obtain ACME certificate for domains \"x.xyz\": unable to generate a certificate for the domains [x.xyz]: error: one or more domains had a problem:\n[x.xyz] [x.xyz] acme: error presenting token: cloudflare: failed to create TXT record: error from makeRequest: HTTP status 400: content \"{\\n  \\\"result\\\": null,\\n  \\\"success\\\": false,\\n  \\\"errors\\\": [\\n    {\\n      \\\"code\\\": 81057,\\n      \\\"message\\\": \\\"The record already exists.\\\"\\n    }\\n  ],\\n  \\\"messages\\\": []\\n}\\n\"\n" rule="Host(`x.xyz`)" providerName=letsencrypt.acme routerName=api@docker
```

These errors are expected: three DNS challenges are launched (one per manager), but only one certificate is stored in Gluster. In my case, manager 2 is the one that succeeded with the DNS challenge.

codeagencybe commented 4 years ago

@oliverlj

Can you share the Docker Compose file showing how you did this with GlusterFS? Or can we talk about it on Slack/MS Teams/Skype/Google Hangouts? I'm looking for some help (or to hire somebody) to get past this issue. I haven't been able to get it working for months, and it seems like you figured it out.

Thanks

oliverlj commented 4 years ago

Hi, here is my Docker Compose file:

version: '3.7'

services:
  traefik:
    image: traefik:v2.2
    ports:
      - published: 80
        target: 80
        mode: host
      - published: 443
        target: 443
        mode: host
    environment:
      - CF_API_EMAIL=${EMAIL?Variable EMAIL not set}
      - CF_API_KEY=${CF_API_KEY?Variable CF_API_KEY not set}
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
        preferences:
          - spread: node.id
      labels:
        - traefik.enable=true
        - traefik.http.routers.api.rule=Host(`${TRAEFIK_DOMAIN?Variable TRAEFIK_DOMAIN not set}`)
        # - traefik.frontend.auth.basic.users=${USERNAME?Variable USERNAME not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        - traefik.http.routers.api.service=api@internal
        - traefik.http.routers.api.tls=true
        - traefik.http.routers.api.tls.certresolver=letsencrypt
        # Dummy service for Swarm port detection. The port can be any valid integer value.
        - traefik.http.services.dummy-svc.loadbalancer.server.port=8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /mnt/ha/traefik/certs:/certs
    command: >
      --accesslog=true
      --api.dashboard=true
      --api.insecure=true
      --certificatesResolvers.letsencrypt.acme.email=${EMAIL?Variable EMAIL not set}
      --certificatesResolvers.letsencrypt.acme.storage=/certs/acme.json
      --certificatesResolvers.letsencrypt.acme.dnsChallenge=true
      --certificatesResolvers.letsencrypt.acme.dnsChallenge.provider=cloudflare
      --certificatesresolvers.letsencrypt.acme.httpChallenge=false
      --certificatesResolvers.letsencrypt.acme.tlsChallenge=false
      --entrypoints.web.address=:80
      --entryPoints.websecure.address=:443
      --log.level=INFO
      --providers.docker=true
      --providers.docker.endpoint=unix:///var/run/docker.sock
      --providers.docker.exposedByDefault=false
      --providers.docker.swarmMode=true
      --providers.docker.watch=true
    networks:
      - infra-public

networks:
  infra-public:
    external: true

Traefik runs in global mode with its ports published in host mode in order to get the client's real IP. The HTTP and TLS challenges are deactivated; I use the DNS challenge with the Cloudflare provider. Consul is not needed because the certs are stored in a bind-mounted volume, /mnt/ha/traefik, where /mnt/ha is a Gluster mount shared between the manager hosts.

Because Traefik publishes its ports in host mode, you will need a load balancer in front of it to reach the Traefik instances.
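The commented-out traefik.frontend.auth.basic.users label in the file above is Traefik v1 syntax and has no effect in v2. In v2, basic auth is declared as a middleware and attached to a router. A minimal sketch of the equivalent labels, assuming the same USERNAME and HASHED_PASSWORD variables; the middleware name admin-auth is hypothetical. Add these two lines to the existing deploy.labels list:

```yaml
services:
  traefik:
    deploy:
      labels:
        # Declare a basicAuth middleware (hypothetical name: admin-auth).
        # HASHED_PASSWORD is an htpasswd-style hash, e.g. from: openssl passwd -apr1
        - traefik.http.middlewares.admin-auth.basicauth.users=${USERNAME?Variable USERNAME not set}:${HASHED_PASSWORD?Variable HASHED_PASSWORD not set}
        # Attach the middleware to the "api" router that serves the dashboard.
        - traefik.http.routers.api.middlewares=admin-auth
```

Note that --api.insecure=true additionally serves the dashboard on Traefik's internal port 8080, and that route is not covered by this middleware.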

codeagencybe commented 4 years ago

@oliverlj

I have sent you an email. Can you check please?

yuri-karpovich commented 4 years ago

Hello everyone! Is there any hope?

oliverlj commented 4 years ago

@yuri-karpovich Hi, you can use what I posted above, with Gluster and a DNS challenge.

tiangolo commented 4 years ago

Hey everyone! Thanks for the discussion here.

I just finished updating all the guides to use Traefik v2. :tada:


About distributed Let's Encrypt: the previous technique with Consul seemed to work at first, but it frequently led to issues, including complete loss of the certificates due to Consul errors, so it was actually even worse than having the single file. It was never very clear when it was happening or why. So, sadly, it was never as robust as it originally looked.

Here's the specific comment explaining everything, by Emile himself: https://github.com/containous/traefik/issues/5426#issuecomment-539715461

About distributed Traefik and distributed configs: that's actually a different subject from distributed Let's Encrypt, and it is still supported by Traefik. But of course, we all want HTTPS, and distributed Let's Encrypt is what is still not supported using only open source tools, although the enterprise edition supports it on top of other things.

There's the option to build a complex stack with a custom certificate resolver as they describe it, but there's no simple and bulletproof way to do it yet.


But for now, we can enjoy the updated DockerSwarm.rocks with Traefik v2. :man_shrugging: :cake: :tada:

nixmomo commented 4 years ago

Hi, one question: why can I use the new version only on Docker Swarm managers? I would like to run it on Swarm workers too, just for replication. If a Swarm cluster has only one manager, it's a stupid single point of failure (SPOF).

oliverlj commented 4 years ago

@nixmomo Because Traefik needs to query Swarm services, which is only possible on manager nodes.

You can expose the Docker socket from the managers only:

version: '3.8'

services:
  dockersocket:
    image: alpine/socat
    command: tcp-listen:2375,fork,reuseaddr unix-connect:/var/run/docker.sock
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
          - node.platform.os == linux
    networks:
      - docker-socket
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

networks:
  docker-socket:
    external: true

and have Traefik connect to it with --providers.docker.endpoint=tcp://dockersocket:2375 (the Traefik service must also be attached to the docker-socket network to resolve that name).

Doing this, host mode will not be possible and the real client IP cannot be fetched.
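On the Traefik side, a minimal sketch of a service that can run on every node, workers included, assuming the external docker-socket network from the file above was created as an overlay network and that infra-public is the shared network from the earlier file. The certificate resolver flags are left out for brevity; the shared-storage caveats discussed earlier in this thread still apply to acme.json.

```yaml
version: '3.8'

services:
  traefik:
    image: traefik:v2.2
    ports:
      # Default ingress (routing mesh) publishing: any node accepts the request.
      - published: 80
        target: 80
      - published: 443
        target: 443
    deploy:
      # No node.role == manager constraint, so workers run Traefik too.
      mode: global
    # The endpoint points at the socat proxy instead of a local socket.
    # providers.docker.network picks which network Traefik uses to reach
    # backends, since the service is now attached to two networks.
    command: >
      --entrypoints.web.address=:80
      --entrypoints.websecure.address=:443
      --providers.docker.swarmMode=true
      --providers.docker.exposedByDefault=false
      --providers.docker.endpoint=tcp://dockersocket:2375
      --providers.docker.network=infra-public
    networks:
      # Needed to resolve the dockersocket service name.
      - docker-socket
      # Network shared with the routed services.
      - infra-public

networks:
  docker-socket:
    external: true
  infra-public:
    external: true
```

Keep in mind that anything attached to docker-socket gets unauthenticated access to the Docker API on the managers, so that network should only hold the proxy and Traefik.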

nixmomo commented 4 years ago

Hi @oliverlj, thanks. The real IP is not important for me because I use trace IDs in my stack for debugging; if I need the IP, I can get it via tracking and custom headers too :) More important is eliminating the single point of failure in a 3-node cluster and being able to deploy Traefik on every node. I'm not a Docker pro, but can you explain why host mode would not work? I don't see any hint that I can't use host mode if I connect to the Docker daemon over TCP.

oliverlj commented 4 years ago

In host mode, requests have to arrive on the nodes where Traefik is running. That is no problem in global mode (one Traefik per node).

For my setup, I configured Traefik in host mode only on the managers (needed for the PROXY protocol to work), and I have HAProxy in front to send traffic only to the nodes that run Traefik.
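For an HAProxy-in-front setup like this, Traefik's entrypoints have to be told to accept PROXY protocol headers from the load balancer. A minimal sketch of the extra command flags, assuming HAProxy connects from the hypothetical address 10.0.0.10 and sends PROXY protocol headers (for example via send-proxy on its server lines). Merge the trustedIPs flags into the existing command:

```yaml
services:
  traefik:
    # Only the proxyProtocol flags are new compared to the earlier file.
    # 10.0.0.10/32 is a hypothetical HAProxy address: replace it with your load balancer's IP(s).
    command: >
      --entrypoints.web.address=:80
      --entrypoints.web.proxyProtocol.trustedIPs=10.0.0.10/32
      --entryPoints.websecure.address=:443
      --entrypoints.websecure.proxyProtocol.trustedIPs=10.0.0.10/32
```

With ingress-mode publishing, the routing mesh rewrites the source address, so connections would not appear to come from the HAProxy IP. That is presumably why host mode is needed for this to work.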

This is my setup: [image]