sablierapp / sablier

Start your containers on demand, shut them down automatically when there's no activity. Docker, Docker Swarm Mode and Kubernetes compatible.
https://sablierapp.dev/
GNU Affero General Public License v3.0

Bad Gateway after Loading Page #120

Closed matt-laird closed 1 year ago

matt-laird commented 1 year ago

Describe the bug

When the target container(s) eventually start, the browser gets redirected to Traefik's Bad Gateway error page. Reloading a few seconds later loads the destination page fine.

Context

Expected behaviour

After waiting for the configured containers to start up, the redirect should take you to your destination page without any errors from Traefik.

Additional context

acouvreur commented 1 year ago

With the Traefik middleware plugin you can configure the refresh frequency. Example:

labels:
  - traefik.http.middlewares.my-sablier.plugin.dynamic.refreshFrequency=5s 

The issue you describe is a known issue with Traefik in certain configurations.

Can you please share in detail how your Traefik configuration integrates with Sablier? Thanks

matt-laird commented 1 year ago

I did increase the refreshFrequency option before raising this issue, but it yielded the same results unless I set it to a very high value, which obviously results in a much longer waiting time.

I'm really glad you mention that the issue is known, I thought I might be going crazy since I did check for other issues and I couldn't find any matching the symptoms.

Sablier Service in Compose:

ondemand-services:
  container_name: 'traefik_ondemand-api'
  image: 'acouvreur/sablier:1.2.0'
  environment:
    - 'PROVIDER_NAME=docker'
    - 'SERVER_PORT=10000'
    - 'SESSIONS_DEFAULT_DURATION=1h'
    - 'SESSIONS_EXPIRATION_INTERVAL=2m'
    - 'STORAGE_FILE=/state/ondemand-state.json'
    - 'STRATEGY_BLOCKING_DEFAULT_TIMEOUT=1m'
    - 'STRATEGY_DYNAMIC_DEFAULT_REFRESH_FREQUENCY=5s'
    - 'STRATEGY_DYNAMIC_DEFAULT_THEME=ghost'
  volumes:
    - '/var/run/docker.sock:/var/run/docker.sock'
    - '/home/owner/config/traefik/internal/ondemand-state.json:/state/ondemand-state.json'
  networks:
    - 'traefik_internal'
  depends_on:
    - 'reverse-proxy'
  restart: 'unless-stopped'
  labels:
    - 'traefik.enable=false'

Traefik Dynamic Config for instance of Outline:

http:
  routers:
    notes:
      entryPoints:
        - 'https'
      rule: 'Host(`my.domain`)'
      service: 'notes'
      middlewares:
        - 'notes-ondemand@file'
        - 'default-secure-headers@file'
  middlewares:
    notes-ondemand:
      plugin:
        sablier:
          dynamic:
            displayName: 'Outline'
          names: 'outline_server'
          sablierUrl: 'http://traefik_ondemand-api:10000'
          sessionDuration: '30m'
  services:
    notes:
      loadBalancer:
        servers:
          - url: 'http://outline_server:3000'

acouvreur commented 1 year ago

Ok, from this configuration, everything looks great.

However, Docker might report that the container is in a "running" state before the server is able to answer requests.

Do you have more information on the container "outline_server"?

You should define a Docker healthcheck to ensure that the container is able to serve requests before incoming requests are routed to it. The error could indeed be caused by this.

And it would behave the same if you'd start the container yourself and immediately tried to reach the route (without the sablier service of course).

If this is indeed the reason why you encountered a 502 Bad Gateway, then I'll add some "Troubleshoot" documentation.

Please let me know if it works well after adding a Docker healthcheck to your container.
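
The gap between "running" and "able to answer requests" mentioned above can be illustrated with a small probe. This is only a sketch, not part of Sablier or Traefik: the function name `wait_until_ready` and its parameters are hypothetical, and it treats any HTTP response (even an error status) as proof that the server is listening.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until the server answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is accepting requests.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the process may be running
            # but is not listening yet. Wait and retry.
            time.sleep(interval)
    return False
```

Run against a freshly started container, a probe like this would typically return `False` for the first moments even though Docker already reports the container as running, which is the window in which a proxy would answer with 502.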

matt-laird commented 1 year ago

Managed to get some time to take another stab at this and after reading over your suggestion around a healthcheck, it clicked. That makes perfect sense, duh.

I added a healthcheck to the outline_server and it works well! Thanks! Maybe it's worth mentioning somewhere in the documentation, as a heads-up to Traefik users. For those that stumble upon this later, here is what I added:

version: '3.8'
services:
  server:
    container_name: 'outline_server'
    image: 'outlinewiki/outline:0.66.3'
    command: 'sh -c "yarn db:migrate --env=production-ssl-disabled && yarn start"'
    networks:
      - 'public'
      - 'outline_internal'
    ## -- Added below -- ##
    healthcheck:
      test: 'sh -c "wget --no-verbose --tries=1 --spider localhost:3000 || exit 1"'
      start_period: '30s'
      interval: '10s'
      timeout: '5s'
    ## -- -- ##
    env_file: '/home/owner/config/outline/config/.env'
    depends_on:
      - 'db'
      - 'cache'
    restart: 'unless-stopped'
    labels:
      - 'traefik.enable=false'
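
To check that a healthcheck like the one above is being picked up, one option is to read the health status Docker reports for the container. The sketch below parses the JSON printed by `docker inspect outline_server`; the helper name `health_status` is hypothetical, and it assumes the usual inspect layout where a container with a healthcheck exposes `State.Health.Status`.

```python
import json
from typing import Optional


def health_status(inspect_output: str) -> Optional[str]:
    """Return the health status from `docker inspect <container>` JSON output.

    Returns None when the container defines no healthcheck.
    Hypothetical helper for illustration only.
    """
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    if not containers:
        return None
    health = containers[0].get("State", {}).get("Health")
    return health["Status"] if health else None
```

The output could be captured with something like `subprocess.run(["docker", "inspect", "outline_server"], capture_output=True, text=True).stdout`. Docker reports `starting` during the start period, then `healthy` or `unhealthy` once checks have run.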