jittering / traefik-kop

A dynamic docker->redis->traefik discovery agent
MIT License

Details from only one of four hosts are appearing in the dashboard #27

Closed: instantdreams closed this issue 11 months ago

instantdreams commented 11 months ago

I have four small form factor servers running services using docker compose, and traefik-kop looks like it will enable me to implement Traefik - and therefore continue my 'configuration as code' GitOps journey.


Environment

I have four servers with a minimal Linux installation and the docker engine and compose plugin, installed the approved way.


Traefik-Kop Configuration

Each of the 4 servers is running traefik-kop using a compose file and an environment file.

compose.yaml

```
services:
  traefik-kop:
    image: ghcr.io/jittering/traefik-kop:latest
    container_name: ${CONTAINER}
    hostname: ${HOSTNAME}
    environment:
      - BIND_IP=${BIND_IP}
      - REDIS_ADDR=${REDIS_ADDR}
      - REDIS_PASS=${REDIS_PASS}
      - VERBOSE=true
      - DEBUG=1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

.env.example

```
# Host specifics
CONTAINER=traefik-kop-[1, 2, 3, 4]
HOSTNAME=[hostname]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[network-name]
NETWORK_IPV4_ADDRESS=[docker-ip-address] # traefik 21 redis 23 traefik-kop 25

# Container specifics
BIND_IP=[host-ip-address]
REDIS_ADDR=[redis-ip-address]:6379
REDIS_PASS=[password]
```


Redis Configuration

The edge server that will manage inbound requests is running the redis service, which has a compose file, an environment file, and a configuration file.

compose.yaml

```
services:
  redis:
    image: redis:latest
    container_name: ${CONTAINER}
    hostname: ${HOSTNAME}
    dns: ${DNS}
    command: redis-server --requirepass ${REDIS_PASSWORD}
    ports:
      - "6379:6379"
    volumes:
      - ${DIRECTORY_CONFIG}:/usr/local/etc/redis/
      - ${DIRECTORY_DATA}:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

.env.example

```
# Host specifics
CONTAINER=redis
HOSTNAME=[hostname]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[network-name]
NETWORK_IPV4_ADDRESS=[docker-ip-address] # traefik 21 redis 23 traefik-kop 25

# Directory locations
DIRECTORY_CONFIG=/srv/redis/config
DIRECTORY_DATA=/srv/redis/data

# Container specifics
REDIS_PASSWORD=[password]
```

redis.conf

```
bind [host-ip-address]
protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
pidfile redis_6379.pid
loglevel verbose
logfile ""
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
locale-collate ""
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data
```


Traefik Configuration

The edge server that will manage inbound requests is running the traefik service, which has a compose file and an environment file. I am trying to use the Traefik CLI for all configuration settings.

compose.yaml

```
services:
  traefik:
    image: traefik:latest
    container_name: ${CONTAINER}
    hostname: ${HOSTNAME}
    command:
      - "--log=true"
      - "--log.level=DEBUG"
      - "--api=true"
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--providers.redis=true"
      - "--providers.redis.rootkey=traefik"
      - "--providers.redis.endpoints=[redis-ip-address]:6379"
      - "--providers.redis.password=${REDIS_PASS}"
      # - "--certificatesresolvers.azuredns.acme.dnschallenge=true"
      # - "--certificatesresolvers.azuredns.acme.dnschallenge.provider=azuredns"
      # - "--certificatesresolvers.azuredns.acme.dnschallenge.delaybeforecheck=90"
      # - "--certificatesresolvers.azuredns.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      # - "--certificatesresolvers.azuredns.acme.email=deanwsmith@outlook.com"
      # - "--certificatesresolvers.azuredns.acme.storage=/acme/acme.json"
    ports:
      - "80:80" # http
      - "443:443" # https
      - "8888:8080" # web ui (enabled by --api=true)
    environment:
      - TRAEFIK_PASS=${TRAEFIK_PASS}
      - AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
      - AZURE_TENANT_ID=${AZURE_TENANT_ID}
      - AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
      - AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
      - AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
    volumes:
      - ${DIRECTORY_ETCTRAEFIK}:/etc/traefik
      - ${DIRECTORY_ACME}:/acme
      - ${DIRECTORY_LETSENCRYPT}:/letsencrypt
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.example.com`)"
      - "traefik.http.routers.traefik.entrypoints=webs"
      - "traefik.http.routers.traefik.service=traefik"
      - "traefik.http.services.traefik.loadbalancer.server.port=8888"
      # - "traefik.http.routers.traefik.tls=true"
      # - "traefik.http.routers.traefik.tls.certresolver=myresolver"
      # - "traefik.http.services.traefik.loadbalancer.server.scheme=http"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

.env.example

```
# Host specifics
CONTAINER=traefik
HOSTNAME=[hostname]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[network-name]
NETWORK_IPV4_ADDRESS=[docker-ip-address] # traefik 21 redis 23 traefik-kop 25

# Directory locations
DIRECTORY_ETCTRAEFIK=/srv/traefik/etc-traefik
DIRECTORY_ACME=/srv/traefik/acme
DIRECTORY_LETSENCRYPT=/srv/traefik/letsencrypt

# Container specifics
TRAEFIK_PASS=[password]
REDIS_PASS=[password]
AZURE_CLIENT_ID=[client_id]
AZURE_TENANT_ID=[tenant_id]
AZURE_CLIENT_SECRET=[client_secret]
AZURE_RESOURCE_GROUP=[resource_group]
AZURE_SUBSCRIPTION_ID=[subscription_id]
```


Service Configurations

As well as the individual services on each server, I also run a number of common services (diun, netdata, promtail, scrutiny-collector). Here are the configuration details for each of the promtail services.

promtail-1 compose.yaml

```
services:
  promtail:
    image: grafana/promtail:latest
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    command: -config.file=/etc/promtail/config.yaml -config.expand-env=true
    user: 0:0
    ports:
      - "9080:9080" # web ui
    environment:
      - TZ=${TZ}
    volumes:
      - ${DIRECTORY_ETCPROMTAIL}:/etc/promtail/
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.promtail-1.rule=Host(`promtail-1.example.com`)"
      # - "traefik.http.routers.promtail-1.tls=true"
      # - "traefik.http.routers.promtail-1.tls.certresolver=myresolver"
      - "traefik.http.services.promtail-1.loadbalancer.server.scheme=http"
      - "traefik.http.services.promtail-1.loadbalancer.server.port=9080"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

promtail-1 .env.example

```
# Host specifics
CONTAINER=promtail-1
HOSTNAME=[server1]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server1]
NETWORK_IPV4_ADDRESS=[docker-ip-address]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Directory locations
DIRECTORY_ETCPROMTAIL=/srv/promtail/etc-promtail

# Container specifics
#none
```

promtail-2 compose.yaml

```
services:
  promtail:
    image: grafana/promtail:latest
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    command: -config.file=/etc/promtail/config.yaml -config.expand-env=true
    user: 0:0
    ports:
      - "9080:9080" # web ui
    environment:
      - TZ=${TZ}
    volumes:
      - ${DIRECTORY_ETCPROMTAIL}:/etc/promtail/
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.promtail-2.rule=Host(`promtail-2.example.com`)"
      # - "traefik.http.routers.promtail-2.tls=true"
      # - "traefik.http.routers.promtail-2.tls.certresolver=myresolver"
      - "traefik.http.services.promtail-2.loadbalancer.server.scheme=http"
      - "traefik.http.services.promtail-2.loadbalancer.server.port=9080"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

promtail-2 .env.example

```
services:
  promtail:
    image: grafana/promtail:latest
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    command: -config.file=/etc/promtail/config.yaml -config.expand-env=true
    user: 0:0
    ports:
      - "9080:9080" # web ui
    environment:
      - TZ=${TZ}
    volumes:
      - ${DIRECTORY_ETCPROMTAIL}:/etc/promtail/
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.promtail-2.rule=Host(`promtail-2.example.com`)"
      - "traefik.http.routers.promtail-2.tls=true"
      # - "traefik.http.routers.promtail-2.tls.certresolver=myresolver"
      - "traefik.http.services.promtail-2.loadbalancer.server.scheme=http"
      - "traefik.http.services.promtail-2.loadbalancer.server.port=9080"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

promtail-2 .env.example

```
# Host specifics
CONTAINER=promtail-2
HOSTNAME=[server2]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server2]
NETWORK_IPV4_ADDRESS=[docker-ip-address]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Directory locations
DIRECTORY_ETCPROMTAIL=/srv/promtail/etc-promtail

# Container specifics
#none
```

promtail-3 compose.yaml

```
services:
  promtail:
    image: grafana/promtail:latest
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    command: -config.file=/etc/promtail/config.yaml -config.expand-env=true
    user: 0:0
    ports:
      - "9080:9080" # web ui
    environment:
      - TZ=${TZ}
    volumes:
      - ${DIRECTORY_ETCPROMTAIL}:/etc/promtail/
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/log:/var/log:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.promtail-3.rule=Host(`promtail-3.example.com`)"
      # - "traefik.http.routers.promtail-3.tls=true"
      # - "traefik.http.routers.promtail-3.tls.certresolver=myresolver"
      - "traefik.http.services.promtail-3.loadbalancer.server.scheme=http"
      - "traefik.http.services.promtail-3.loadbalancer.server.port=9080"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

promtail-3 .env.example

```
# Host specifics
CONTAINER=promtail-3
HOSTNAME=[server3]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server3]
NETWORK_IPV4_ADDRESS=[docker-ip-address]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Directory locations
DIRECTORY_ETCPROMTAIL=/srv/promtail/etc-promtail

# Container specifics
#none
```

promtail-4 compose.yaml

```
services:
  promtail:
    image: grafana/promtail:latest
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    command: -config.file=/etc/promtail/config.yaml -config.expand-env=true
    user: 0:0
    ports:
      - "9080:9080" # web ui
    environment:
      - TZ=${TZ}
    volumes:
      - ${DIRECTORY_ETCPROMTAIL}:/etc/promtail/
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/log:/var/log:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.promtail-4.rule=Host(`promtail-4.example.com`)"
      # - "traefik.http.routers.promtail-4.tls=true"
      # - "traefik.http.routers.promtail-4.tls.certresolver=myresolver"
      - "traefik.http.services.promtail-4.loadbalancer.server.scheme=http"
      - "traefik.http.services.promtail-4.loadbalancer.server.port=9080"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

promtail-4 .env.example

```
# Host specifics
CONTAINER=promtail-4
HOSTNAME=[server4]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server4]
NETWORK_IPV4_ADDRESS=[docker-ip-address]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Directory locations
DIRECTORY_ETCPROMTAIL=/srv/promtail/etc-promtail

# Container specifics
#none
```

Being able to hide sections in markdown is so useful.

Here's an example of an individual service from one of the hosts that isn't picked up:

frigate compose.yaml

```
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    container_name: ${CONTAINER}
    hostname: ${CONTAINER}.${HOSTNAME}
    dns: ${DNS}
    privileged: true
    cap_add:
      - CAP_PERFMON
    shm_size: 512M
    devices:
      - /dev/bus/usb:/dev/bus/usb # USB Coral
      - /dev/dri:/dev/dri # intel hwaccel
    ports:
      - 5000:5000 # web ui
      - 8554:8554 # rtsp
      - 8555:8555/tcp # webrtc
      - 8555:8555/udp # webrtc
    environment:
      - TZ=${TZ}
      - FRIGATE_WYZE_PASSWORD=${FRIGATE_WYZE_PASSWORD}
    volumes:
      - ${DIRECTORY_CONFIG}:/config
      - ${DIRECTORY_DATABASE}:/db
      - ${DIRECTORY_MEDIA}:/media/frigate
      - /etc/localtime:/etc/localtime:ro
      - /dev/bus/usb:/dev/bus/usb # USB Coral
      - /dev/dri:/dev/dri # Intel hwaccel
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1073741824
    restart: unless-stopped
    labels:
      # Traefik
      - "traefik.enable=true"
      - "traefik.http.routers.frigate.rule=Host(`frigate.example.com`)"
      # - "traefik.http.routers.frigate.tls=true"
      # - "traefik.http.routers.frigate.tls.certresolver=myresolver"
      - "traefik.http.services.frigate.loadbalancer.server.scheme=http"
      - "traefik.http.services.frigate.loadbalancer.server.port=5000"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true
```

frigate .env.example

```
# Host specifics
CONTAINER=frigate
HOSTNAME=[server4]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server4]
NETWORK_IPV4_ADDRESS=[docker-ip-address]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Directory locations
DIRECTORY_CONFIG=/srv/frigate/config
DIRECTORY_DATABASE=/srv/frigate/database
DIRECTORY_MEDIA=/mnt/security1/frigate

# Container specifics
FRIGATE_WYZE_PASSWORD=[password]
```

Here's an example of an individual service from the host that is being seen by Traefik:

teslamate compose.yaml

```
services:
  teslamate:
    image: teslamate/teslamate:latest
    container_name: ${CONTAINER}-app
    hostname: ${CONTAINER}-app.${HOSTNAME}
    dns: ${DNS}
    cap_drop:
      - all
    ports:
      - 4000:4000
    environment:
      - ENCRYPTION_KEY=${ENCRYPTION_KEY}
      - DATABASE_USER=${DATABASE_USER}
      - DATABASE_PASS=${DATABASE_PASS}
      - DATABASE_NAME=${DATABASE_NAME}
      - DATABASE_HOST=${DATABASE_HOST}
      - DATABASE_PORT=${DATABASE_PORT}
      - VIRTUAL_HOST=${VIRTUAL_HOST}
      - CHECK_ORIGIN=${CHECK_ORIGIN}
      - MQTT_HOST=${MQTT_HOST}
      - MQTT_PORT=${MQTT_PORT}
      - TZ=${TZ}
    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.teslamate.rule=Host(`teslamate.example.org`)"
      # - "traefik.http.routers.teslamate.tls=true"
      # - "traefik.http.routers.teslamate.tls.certresolver=myresolver"
      - "traefik.http.services.teslamate.loadbalancer.server.scheme=http"
      - "traefik.http.services.teslamate.loadbalancer.server.port=4000"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS_APP}

  database:
    image: postgres:13
    container_name: ${CONTAINER}-db
    hostname: ${CONTAINER}-db.${HOSTNAME}
    dns: ${DNS}
    ports:
      - 5432:5432
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASS}
      - POSTGRES_DB=${POSTGRES_DB}
      - TZ=${TZ}
    volumes:
      - teslamate-db:/var/lib/postgresql/data
    restart: unless-stopped
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS_DB}

  grafana:
    image: teslamate/grafana:latest
    container_name: ${CONTAINER}-dash
    hostname: ${CONTAINER}-dash.${HOSTNAME}
    dns: ${DNS}
    ports:
      - 3021:3021
    environment:
      - DATABASE_USER=${DATABASE_USER}
      - DATABASE_PASS=${DATABASE_PASS}
      - DATABASE_NAME=${DATABASE_NAME}
      - DATABASE_HOST=${DATABASE_HOST}
      - DATABASE_PORT=${DATABASE_PORT}
      - GF_SERVER_HTTP_PORT=${SERVER_HTTP_PORT}
      - GF_SERVER_DOMAIN=${SERVER_DOMAIN}
      - TZ=${TZ}
    volumes:
      - teslamate-grafana-data:/var/lib/grafana
    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.obsidian.rule=Host(`obsidian.example.com`)"
      # - "traefik.http.routers.obsidian.tls=true"
      # - "traefik.http.routers.obsidian.tls.certresolver=myresolver"
      - "traefik.http.services.obsidian.loadbalancer.server.scheme=http"
      - "traefik.http.services.obsidian.loadbalancer.server.port=3021"
    networks:
      my-net:
        ipv4_address: ${NETWORK_IPV4_ADDRESS_DASH}

networks:
  my-net:
    name: ${NETWORK_NAME}
    external: true

volumes:
  teslamate-db:
  teslamate-grafana-data:
```

teslamate .env.example

```
# Host specifics
CONTAINER=teslamate
HOSTNAME=[server3]
DNS=192.168.1.1

# Network specifics
NETWORK_NAME=[server3]
NETWORK_IPV4_ADDRESS_APP=[docker-ip-address-app]
NETWORK_IPV4_ADDRESS_DB=[docker-ip-address-db]
NETWORK_IPV4_ADDRESS_DASH=[docker-ip-address-dash]

# Timezone from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
TZ=America/Denver

# Container specifics
# teslamate
ENCRYPTION_KEY=[encryption_key]
DATABASE_USER=teslamate
DATABASE_PASS=[database_password]
DATABASE_NAME=teslamate
DATABASE_HOST=[server3]
DATABASE_PORT=5432
VIRTUAL_HOST=teslamate.example.com
CHECK_ORIGIN=true
PORT=4000
MQTT_HOST=[mqtt-ip-address]
MQTT_PORT=1883

# postgres
POSTGRES_USER=teslamate
POSTGRES_PASS=[postgres_password]
POSTGRES_DB=teslamate

# grafana
SERVER_HTTP_PORT=3021
SERVER_DOMAIN=obsidian.example.com
```


Order of Operations


Traefik Entries

There are 29 Routers and 30 Services in Traefik.

image

All of the web entrypoints are for server3; none are from the other three servers. There are also two traefik entrypoints, which I think are expected.

image

All of the loadbalancer services are for server3; none are from the other three servers. There are also three internal services, which I think are expected.

image


Problem Definition

Expected Behaviour: Traefik to show services from all 4 servers based on the use of redis as a provider, as the redis cache has entries for services from all servers.

Apparent Behaviour: Traefik only shows services from server3.
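One way to check the claim that redis holds entries from all four servers is to scan the keys under the `traefik` root key and list the distinct router names. This is only a sketch: `router_names` is a hypothetical helper (not part of traefik-kop or Traefik), and it assumes the keys follow the Traefik KV layout `traefik/http/routers/<name>/<field>`.

```shell
# Hypothetical helper: read redis key names on stdin and print the
# distinct router names found under the "traefik" root key.
# Assumes keys of the form traefik/http/routers/<name>/<field>.
router_names() {
  awk -F/ '$1 == "traefik" && $2 == "http" && $3 == "routers" { print $4 }' | sort -u
}

# Example usage (run against the redis instance Traefik is configured to read;
# the placeholders are the same ones used in the .env files above):
#   REDISCLI_AUTH='[password]' redis-cli -h '[redis-ip-address]' \
#     --scan --pattern 'traefik/http/routers/*' | router_names
```

If the list contains routers from only one server, the redis instance being queried is not the one the other traefik-kop agents are writing to.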


Next Steps

Are there any log files I can provide?

Does Traefik store any cache or configuration anywhere that I should flush? I have tried flushing redis (using flushall) to repopulate the cache, but no luck there. I'm wondering if Traefik has temporary stores in docker that I should clear?

Thank you for taking the time to review this!

chetan commented 11 months ago

@instantdreams thanks for the detailed report. It looks like the redis keys for all of your services are correctly being written to redis (at least the various promtail services shown in the output above). I think the next step for you is to check the logs of the traefik service to see if it's rejecting any of the configurations for some reason. Looks like you already have it set to debug level so you should have all the information you need to see why it's not picking up the services from other hosts there.
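A rough sketch of that log check, assuming the logfmt-style lines Traefik prints at debug level (`level=... providerName=redis`). `redis_provider_problems` is a hypothetical helper name, and the level names it matches are an assumption about the log format rather than a guarantee.

```shell
# Hypothetical helper: read Traefik log lines on stdin and keep only
# error/warning lines that mention redis.
redis_provider_problems() {
  grep -i 'redis' | grep -E 'level=(error|warn)'
}

# Example usage, assuming the container is named "traefik":
#   docker logs traefik 2>&1 | redis_provider_problems
```

No output here means Traefik is not reporting problems with the redis data it reads, so the next thing to check is which redis instance it is reading from.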

instantdreams commented 11 months ago

This is a Traefik issue, and it's frustrating. I did the following:

The log file even ended with:

```
traefik  | time="2023-09-23T21:08:30-06:00" level=debug msg="Skipping unchanged configuration." providerName=redis
```

Where is Traefik keeping the stored configuration files or cached details?

The only mounts I have for it are:

```
$ docker container inspect traefik -f '{{range .Mounts}}{{.Type}}:{{.Source}}:{{.Destination}}{{println}}{{ end }}'
bind:/srv/traefik/letsencrypt:/letsencrypt
bind:/srv/acme-sh/srv:/acme-sh
bind:/etc/timezone:/etc/timezone
bind:/etc/localtime:/etc/localtime
bind:/srv/traefik/etc-traefik:/etc/traefik
bind:/srv/traefik/acme:/acme
```

instantdreams commented 11 months ago

Wait wait ignore that, I think I found the issue and it was my fault.

I had a redis service running on server3 and the traefik service was pointing there.

D'oh.
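For anyone hitting the same symptom, a minimal sketch of how the mismatch could be spotted: compare the redis endpoint Traefik was started with against the `REDIS_ADDR` each traefik-kop container uses. Both helper names are hypothetical, and the container names in the usage comments are assumptions based on the compose files above.

```shell
# Hypothetical helper: read container arguments (one per line) on stdin and
# print the value of Traefik's --providers.redis.endpoints flag.
redis_endpoint_from_args() {
  sed -n 's/^--providers\.redis\.endpoints=//p'
}

# Hypothetical helper: read environment entries (one per line) on stdin and
# print the value of REDIS_ADDR as seen by traefik-kop.
redis_addr_from_env() {
  sed -n 's/^REDIS_ADDR=//p'
}

# Example usage on the edge server (container assumed to be named "traefik"):
#   docker inspect traefik -f '{{range .Args}}{{println .}}{{end}}' | redis_endpoint_from_args
# Example usage on each host (container assumed to be named "traefik-kop-1", etc.):
#   docker inspect traefik-kop-1 -f '{{range .Config.Env}}{{println .}}{{end}}' | redis_addr_from_env
```

All five values should name the same redis instance; here, Traefik's endpoint pointed at a separate redis running on server3.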