jittering / traefik-kop

A dynamic docker->redis->traefik discovery agent
MIT License

Troubleshooting Setup #5

Closed jarrah31 closed 2 years ago

jarrah31 commented 2 years ago

Hi Chetan,

Thank you for creating this agent!

I've attempted to install and use Traefik-kop, but so far I've been unable to make Traefik detect the remote service. I suspect a configuration issue somewhere, so I've copied my setup below in case you can spot the problem. Could you suggest any methods to test connectivity between the hosts, please?

So far I have two physical Docker hosts - dnuca with Traefik and dnucb with a Firefox container.

dnuca (host IP: 192.168.1.107) has the following Traefik and Redis container config (I use labels for Traefik configuration):

version: "3.8"

networks:
  t2_proxy:
    external:
      name: t2_proxy
  default:
    driver: bridge
  socket_proxy:
    external:
      name: socket_proxy

services:
  traefik:
    container_name: traefik
    image: traefik
    restart: unless-stopped
    command: # CLI arguments
      - --global.checkNewVersion=true
      - --global.sendAnonymousUsage=true
      - --entryPoints.http.address=:80
      - --entryPoints.https.address=:443
        # Allow these IPs to set the X-Forwarded-* headers - Cloudflare IPs: https://www.cloudflare.com/ips/
      - --entrypoints.https.forwardedHeaders.trustedIPs=173.245.48.0/20,103.21.244.0/22,103.22.200.0/22,103.31.4.0/22,141.101.64.0/18,108.162.192.0/18,190.93.240.0/20,188.114.96.0/20,197.234.240.0/22,198.41.128.0/17,162.158.0.0/15,104.16.0.0/13,104.24.0.0/14,172.64.0.0/13,131.0.72.0/22
      - --entryPoints.traefik.address=:8080
      - --api=true
      - --api.dashboard=true
      - --log=true
      - --log.level=INFO # (Default: error) DEBUG, INFO, WARN, ERROR, FATAL, PANIC
      - --accessLog=true
      - --accessLog.filePath=/traefik.log
      - --accessLog.bufferingSize=100 # Configuring a buffer of 100 lines
      - --accessLog.filters.statusCodes=400-499
      - --providers.docker=true
      - --providers.docker.endpoint=tcp://socket-proxy:2375
      - --providers.docker.exposedByDefault=false
      - --providers.docker.network=t2_proxy
      - --providers.docker.swarmMode=false
      - --providers.file.directory=/rules # Load dynamic configuration from one or more .toml or .yml files in a directory
      - --providers.file.watch=true # Only works on top level files in the rules folder
      - --providers.providersThrottleDuration=2s
      - --providers.redis.endpoints=redis:6379
      - --entrypoints.https.http.tls.options=tls-opts@file
      - --entrypoints.https.http.tls.certresolver=dns-cloudflare
      - --entrypoints.https.http.tls.domains[0].main=$DOMAINNAME # Pulls main cert for second domain
      - --entrypoints.https.http.tls.domains[0].sans=*.$DOMAINNAME # Pulls wildcard cert for second domain
      # - --certificatesResolvers.dns-cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory # LetsEncrypt Staging Server - uncomment when testing
      - --certificatesResolvers.dns-cloudflare.acme.email=$CLOUDFLARE_EMAIL
      - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
      - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.delayBeforeCheck=90 # To delay DNS check and reduce LE hitrate
    networks:
      t2_proxy:
        ipv4_address: 192.168.90.254 # You can specify a static IP
      socket_proxy:
    depends_on:
      - socket-proxy
      - oauth
    security_opt:
      - no-new-privileges:true
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    volumes:
      - $DOCKERDIR/traefik2/rules:/rules # file provider directory
      # - /var/run/docker.sock:/var/run/docker.sock:ro # Use Docker Socket Proxy instead for improved security
      - $DOCKERDIR/traefik2/acme/acme.json:/acme.json # cert location - you must touch this file and change permissions to 600
      - $DOCKERDIR/traefik2/traefik.log:/traefik.log # for fail2ban - make sure to touch file before starting container
      - $DOCKERDIR/shared:/shared
    environment:
      - CF_API_EMAIL_FILE=/run/secrets/cloudflare_email
      - CF_API_KEY_FILE=/run/secrets/cloudflare_api_key
    secrets:
      - cloudflare_email
      - cloudflare_api_key
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
      - "traefik.enable=true"
      # HTTP-to-HTTPS Redirect
      - "traefik.http.routers.http-catchall.entrypoints=http"
      - "traefik.http.routers.http-catchall.rule=HostRegexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
      # HTTP Routers
      - "traefik.http.routers.traefik-rtr.entrypoints=https"
      - "traefik.http.routers.traefik-rtr.rule=Host(`traefik.$DOMAINNAME`)"
      - "traefik.http.routers.traefik-rtr.tls=true"
      ## Services - API
      - "traefik.http.routers.traefik-rtr.service=api@internal"
      ## Middlewares
      - "traefik.http.routers.traefik-rtr.middlewares=chain-authelia@file"

  redis:
    image: redis
    container_name: redis
    restart: always
    ports:
      - "6379:6379"
    environment:
      - REDIS_REPLICATION_MODE=master
    volumes:
      - $DOCKERDIR/redis:/data
    networks:
      t2_proxy:

dnucb host (Host IP 192.168.1.32):

version: "3.8"

networks:
  t2_proxy:
    external:
      name: t2_proxy
  default:
    driver: bridge

services:
  firefox:
    image: jlesage/firefox:latest
    container_name: firefox
    restart: unless-stopped
    networks:
      - t2_proxy
    security_opt:
      - no-new-privileges:true
      - seccomp:unconfined 
    ports:
      - "9007:5800"
    volumes:
      - $DOCKERDIR/firefox:/config
      - /dev/shm:/dev/shm
      - $DOCKERDIR/shared:/shared
    environment:
      USER_ID: $PUID
      GROUP_ID: $PGID
      TZ: $TZ
      UMASK: 002
      KEEP_APP_RUNNING: 1
      CLEAN_TMP_DIR: 1
      DISPLAY_WIDTH: 1440
      DISPLAY_HEIGHT: 900
      VNC_PASSWD: $FIREFOX_VNC_PASSWD
    labels:
      - "com.centurylinklabs.watchtower.enable=true"
      - "traefik.enable=true"
      ## HTTP Routers
      - "traefik.http.routers.firefox-rtr.entrypoints=https"
      - "traefik.http.routers.firefox-rtr.rule=Host(`firefox.$DOMAINNAME`)"
      - "traefik.http.routers.firefox-rtr.tls=true"
      ## Middlewares
      - "traefik.http.routers.firefox-rtr.middlewares=chain-authelia@file"
      ## HTTP Services
      - "traefik.http.routers.firefox-rtr.service=firefox-svc"
      - "traefik.http.services.firefox-svc.loadbalancer.server.port=9007"

  traefik-kop:
    image: "ghcr.io/jittering/traefik-kop:latest"
    container_name: traefik-kop
    restart: unless-stopped
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - "REDIS_ADDR=192.168.1.107:6379"
      - "BIND_IP=192.168.1.32"

Verbose logs from Traefik-kop:

today at 19:23:50time="2022-01-23T19:23:50Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
today at 19:23:50time="2022-01-23T19:23:50Z" level=info msg="Starting provider *docker.Provider {\"watch\":true,\"endpoint\":\"unix:///var/run/docker.sock\",\"swarmModeRefreshSeconds\":\"15s\"}"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="Provider connection established with docker 20.10.12 (API 1.41)" providerName=docker
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="Filtering disabled container" container=traefik-kop-docker-c51df9724828ba18f2add6b70644cf8f5781f6da7272c0e5eb459a0bde83032b providerName=docker
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="Filtering disabled container" providerName=docker container=dozzle-docker-422c096b8822c83fbb62785969680a64c31a071dd1a6f701e66b83bb26182b44
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="Configuration received from provider docker: {\"http\":{\"routers\":{\"firefox-rtr\":{\"entryPoints\":[\"https\"],\"middlewares\":[\"chain-authelia@file\"],\"service\":\"firefox-svc\",\"rule\":\"Host(`firefox.mydomain.com`)\",\"tls\":{}}},\"services\":{\"firefox-svc\":{\"loadBalancer\":{\"servers\":[{\"url\":\"http://172.18.0.2:9007\"}],\"passHostHeader\":true}}}},\"tcp\":{},\"udp\":{}}" providerName=docker
today at 19:23:50time="2022-01-23T19:23:50Z" level=info msg="refreshing configuration"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="found http service: firefox-svc@docker"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="found router firefox-rtr for service firefox-svc"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="found container '/firefoxdsrv2' (d8f4504b5cb862c596eb54a0004d2cc4a8174b2c303f15f38812969f61e11fa1) for service 'firefox-svc'"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="using explicitly set port 9007 for firefox-svc"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/routers/firefox-rtr/service = firefox-svc"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/routers/firefox-rtr/rule = Host(`firefox.mydomain.com`)"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/services/firefox-svc/loadBalancer/servers/0/url = http://192.168.1.32:9007"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/services/firefox-svc/loadBalancer/passHostHeader = true"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/routers/firefox-rtr/entryPoints/0 = https"
today at 19:23:50time="2022-01-23T19:23:50Z" level=debug msg="writing traefik/http/routers/firefox-rtr/middlewares/0 = chain-authelia@file"
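
The log above shows two things: the container-internal URL (http://172.18.0.2:9007) is rewritten to the BIND_IP address (http://192.168.1.32:9007), and the configuration is flattened into slash-separated keys under the traefik root key. As a rough sketch, the hypothetical helper below (expected_keys is not part of traefik-kop) prints the key names seen in this log for a given router and service, so they can be compared with what is actually in Redis. The key list is inferred only from the log lines above and may differ for other label sets.

```shell
#!/bin/sh
# Hypothetical helper: print the Redis keys that the log above shows
# being written for one router/service pair.
# Arguments: router name, service name, optional root key (default: traefik).
expected_keys() {
  router="$1"
  service="$2"
  root="${3:-traefik}"
  printf '%s\n' \
    "$root/http/routers/$router/rule" \
    "$root/http/routers/$router/service" \
    "$root/http/routers/$router/entryPoints/0" \
    "$root/http/routers/$router/middlewares/0" \
    "$root/http/services/$service/loadBalancer/servers/0/url" \
    "$root/http/services/$service/loadBalancer/passHostHeader"
}

# Print the six key names for the firefox router and service in this thread.
expected_keys firefox-rtr firefox-svc

# Against a live Redis (not run here), each key could be checked with:
# expected_keys firefox-rtr firefox-svc | while read -r key; do
#   printf '%s -> ' "$key"
#   redis-cli -h 192.168.1.107 -p 6379 exists "$key" < /dev/null
# done
```

If any of these keys is missing on the Traefik host's Redis, the write from dnucb probably never arrived, which points at connectivity or the REDIS_ADDR setting rather than at the labels.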

Logs from Redis:

yesterday at 19:28:591:C 22 Jan 2022 19:28:59.763 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
yesterday at 19:28:591:C 22 Jan 2022 19:28:59.763 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
yesterday at 19:28:591:C 22 Jan 2022 19:28:59.763 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
yesterday at 19:28:591:M 22 Jan 2022 19:28:59.763 * monotonic clock: POSIX clock_gettime
yesterday at 19:28:591:M 22 Jan 2022 19:28:59.764 * Running mode=standalone, port=6379.
yesterday at 19:28:591:M 22 Jan 2022 19:28:59.764 # Server initialized
yesterday at 19:28:591:M 22 Jan 2022 19:28:59.764 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
yesterday at 19:28:591:M 22 Jan 2022 19:28:59.764 * Ready to accept connections
yesterday at 20:29:001:M 22 Jan 2022 20:29:00.029 * 1 changes in 3600 seconds. Saving...
yesterday at 20:29:001:M 22 Jan 2022 20:29:00.029 * Background saving started by pid 20
yesterday at 20:29:0020:C 22 Jan 2022 20:29:00.034 * DB saved on disk
yesterday at 20:29:0020:C 22 Jan 2022 20:29:00.034 * RDB: 0 MB of memory used by copy-on-write
yesterday at 20:29:001:M 22 Jan 2022 20:29:00.130 * Background saving terminated with success
today at 18:41:031:M 23 Jan 2022 18:41:03.620 * 1 changes in 3600 seconds. Saving...
today at 18:41:031:M 23 Jan 2022 18:41:03.620 * Background saving started by pid 21
today at 18:41:0321:C 23 Jan 2022 18:41:03.630 * DB saved on disk
today at 18:41:0321:C 23 Jan 2022 18:41:03.631 * RDB: 0 MB of memory used by copy-on-write
today at 18:41:031:M 23 Jan 2022 18:41:03.722 * Background saving terminated with success

Traefik logs gave an error saying "Key not found in store" - is this part of the problem?

yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Configuration loaded from flags."
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Traefik version 2.5.7 built on 2022-01-20T16:16:23Z"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Stats collection is enabled."
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Many thanks for contributing to Traefik's improvement by allowing us to receive anonymous information from your configuration."
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Help us improve Traefik by leaving this feature on :)"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="More details on: https://doc.traefik.io/traefik/contributing/data-collection/"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *file.Provider {\"directory\":\"/rules\",\"watch\":true}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *traefik.Provider {}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *acme.Provider {\"email\":\"my@email.com\",\"caServer\":\"https://acme-v02.api.letsencrypt.org/directory\",\"storage\":\"/acme.json\",\"keyType\":\"RSA4096\",\"dnsChallenge\":{\"provider\":\"cloudflare\",\"delayBeforeCheck\":\"1m30s\",\"resolvers\":[\"1.1.1.1:53\",\"1.0.0.1:53\"]},\"ResolverName\":\"dns-cloudflare\",\"store\":{},\"TLSChallengeProvider\":{\"Timeout\":4000000000},\"HTTPChallengeProvider\":{}}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *docker.Provider {\"watch\":true,\"endpoint\":\"tcp://socket-proxy:2375\",\"defaultRule\":\"Host(`{{ normalize .Name }}`)\",\"network\":\"t2_proxy\",\"swarmModeRefreshSeconds\":\"15s\"}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Testing certificate renew..." providerName=dns-cloudflare.acme
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *redis.Provider {\"rootKey\":\"traefik\",\"endpoints\":[\"redis:6379\"]}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=info msg="Starting provider *acme.ChallengeTLSALPN {\"Timeout\":4000000000}"
yesterday at 19:46:35time="2022-01-22T19:46:35Z" level=error msg="Cannot build the configuration: Key not found in store" providerName=redis
yesterday at 19:46:352022/01/22 19:46:35 redis.go:310: watchLoop in WatchTree err:Key not found in store
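
A later comment in this thread reports that this error stopped appearing once keys existed under the provider's root key (traefik, per the rootKey in the log line above). A minimal sketch for checking that condition follows. count_root_keys is a hypothetical helper that counts key names read from stdin; the redis-cli invocation in the comment assumes the host and port used in this setup.

```shell
#!/bin/sh
# Hypothetical helper: read key names on stdin (one per line) and print
# how many of them live under the given root key (default: traefik).
count_root_keys() {
  root="${1:-traefik}"
  # grep -c prints the number of matching lines; it exits non-zero when
  # the count is 0, so "|| true" keeps the function's exit status at 0.
  grep -c "^${root}/" || true
}

# Self-contained demonstration with two sample key names; prints 1.
printf '%s\n' 'mykey' 'traefik/http/routers/firefox-rtr/rule' | count_root_keys traefik

# Against a live Redis (not run here):
# redis-cli -h 192.168.1.107 -p 6379 --scan --pattern 'traefik/*' | count_root_keys traefik
```

A count of 0 would be consistent with the "Key not found in store" message: Traefik's Redis provider is running but there is nothing under its root key to build a configuration from.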

Thanks.

jarrah31 commented 2 years ago

Strange - managed to get it working somehow!

I started by querying the Redis database from dnucb with these commands:

dnucb:~$ sudo apt install redis-tools
dnucb:~$ redis-cli -h 192.168.1.107 -p 6379 ping
PONG
dnucb:~$ redis-cli -h 192.168.1.107 -p 6379 keys '*'
(empty list or set)

I wrote some test data to check redis was working ok:

dnucb:~$ redis-cli -h 192.168.1.107 -p 6379
192.168.1.107:6379> ping
PONG
192.168.1.107:6379> set mykey somevalue
OK
192.168.1.107:6379> exit
dnucb:~$ redis-cli -h 192.168.1.107 -p 6379 keys '*'
1) "mykey"

Great, I thought, let's enable debug logging in Traefik-kop. I restarted the container, checked the keys, and to my surprise data appeared!

dnucb:~/docker$ redis-cli -h 192.168.1.107 -p 6379 keys '*'
1) "traefik/http/routers/firefox-rtr/rule"
2) "traefik/http/routers/firefox-rtr/middlewares/0"
3) "traefik_http_services@844ce3de795f"
4) "traefik/http/services/firefox-svc/loadBalancer/passHostHeader"
5) "traefik/http/routers/firefox-rtr/service"
6) "traefik/http/services/firefox-svc/loadBalancer/servers/0/url"
7) "traefik/http/routers/firefox-rtr/entryPoints/0"
8) "traefik_http_routers@844ce3de795f"
9) "mykey"

I really don't know why it didn't work yesterday. Perhaps things just needed to be restarted in the correct order. Anyhow, Traefik redirects to dnucb are working well, so thank you again for writing a very useful utility!
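
Whether start order was really the cause is not confirmed anywhere in this thread. If it is a factor, one way to rule it out is to wait until Redis answers before restarting the agent. The sketch below uses a hypothetical retry helper (wait_for, with an assumed WAIT_DELAY variable for the pause between attempts); the host, port, and container name in the usage comment are the ones from this setup.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or the
# given number of attempts is used up. Returns 0 on success, 1 otherwise.
# WAIT_DELAY (assumed name) sets the pause in seconds between attempts.
wait_for() {
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_DELAY:-1}"
  done
  return 1
}

# Usage on dnucb (not run here): restart the agent only once Redis replies.
# wait_for 30 redis-cli -h 192.168.1.107 -p 6379 ping && docker restart traefik-kop
```

The helper takes the probe command as arguments, so the same loop works with any other reachability check.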

Hopefully this issue will help others looking to debug Redis and Traefik-kop in the future. :)

chetan commented 2 years ago

Thanks for reporting back and I'm glad you got things working! I'll see about adding some troubleshooting info to the readme.

jarrah31 commented 2 years ago

Thanks Chetan!

I just want to mention that the "Key not found in store" error from Traefik no longer appears after a restart, now that the Traefik data is in Redis.

Another change I forgot about, which may actually be the fix, is that I switched my Redis container to the image below, so perhaps a Traefik-kop restart the next day was all it then needed.

Would it be worth adding the Redis docker-compose below to the readme? I needed to do some digging to find out what Redis was (I'd never heard of it before) and to figure out the docker-compose details. Thanks.

  redis:
    image: docker.io/bitnami/redis:latest
    container_name: redis
    restart: always
    environment:
      # ALLOW_EMPTY_PASSWORD is recommended only for development.
      - ALLOW_EMPTY_PASSWORD=yes
      - REDIS_DISABLE_COMMANDS=FLUSHDB,FLUSHALL
    ports:
      - '6379:6379'
    volumes:
      - $DOCKERDIR/redis:/bitnami/redis/data
    networks:
      - t2_proxy

dorianim commented 2 years ago

Hi,

I don't think this is related to the image. I'm using redis:alpine and it's working just fine :)