sablierapp / sablier

Start your containers on demand, shut them down automatically when there's no activity. Docker, Docker Swarm Mode and Kubernetes compatible.
https://sablierapp.dev/
GNU Affero General Public License v3.0

Dynamic status page not showing via Traefik #165

Closed dimber-cais closed 1 year ago

dimber-cais commented 1 year ago

Describe the bug

When a group is scaling up and the dynamic strategy is used, I expect to see the themed loading page. Instead, the browser request just stays pending, so I see a blank page.

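Context

  • Sablier version: ghcr.io/acouvreur/sablier:beta
  • Provider: kubernetes 1.24
  • Reverse proxy: Traefik 2.9
  • Sablier running inside a container? Yes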

Expected behavior

When using the following middleware:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: sablier
spec:
  plugin:
    sablier:
      group: test
      sablierUrl: 'http://sablier.sablier.svc.cluster.local:10000'
      sessionDuration: 30s
      dynamic:
        displayName: 'test'
        showDetails: true
        theme: hacker-terminal
        refreshFrequency: 5s

I would expect to see the loading page until pods are ready. However, the request does not complete and remains pending.

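Additional context

  • Interestingly, if I port-forward to the sablier pod, and I issue a request to http://localhost:10000/api/strategies/dynamic?group=test I do see the page.
  • Aside from the loading page not showing up, the scaling up and down is working correctly.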

dimber-cais commented 1 year ago

Looking at the docs here: https://acouvreur.github.io/sablier/#/guides/code-server-traefik-kubernetes

It shows using Traefik plugin version v1.4.0-beta.4 but this doesn't appear to be published here. The latest is v1.4.0-beta.3.

It also shows the Docker image acouvreur/sablier:1.4.0-beta.4, which does not exist either.

acouvreur commented 1 year ago

Yes, you're right about that. The last action did not trigger the releases on tagging. I have to fix that by force-publishing them myself, and I will do it soon.

acouvreur commented 1 year ago

Describe the bug

When a group is scaling up and the dynamic strategy is used, I expect to see the themed loading page. Instead, the browser request just stays pending, so I see a blank page.

Context

  • Sablier version: ghcr.io/acouvreur/sablier:beta
  • Provider: kubernetes 1.24
  • Reverse proxy: Traefik 2.9
  • Sablier running inside a container? Yes

Expected behavior

When using the following middleware:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: sablier
spec:
  plugin:
    sablier:
      group: test
      sablierUrl: 'http://sablier.sablier.svc.cluster.local:10000'
      sessionDuration: 30s
      dynamic:
        displayName: 'test'
        showDetails: true
        theme: hacker-terminal
        refreshFrequency: 5s

I would expect to see the loading page until pods are ready. However, the request does not complete and remains pending.

Additional context

  • Interestingly, if I port-forward to the sablier pod, and I issue a request to http://localhost:10000/api/strategies/dynamic?group=test I do see the page.
  • Aside from the loading page not showing up, the scaling up and down is working correctly.
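
The direct check described in the first bullet can be scripted, which makes it easier to repeat while comparing against the request that goes through Traefik. The following is a minimal sketch, not part of Sablier: the function names are hypothetical, and it assumes a port-forward to the Sablier pod is already listening on local port 10000. The endpoint path and the `group` query parameter are taken from the URL quoted above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dynamic_strategy_url(group: str, base: str = "http://localhost:10000") -> str:
    """Build the dynamic-strategy URL quoted in the report (hypothetical helper)."""
    query = urlencode({"group": group})
    return f"{base.rstrip('/')}/api/strategies/dynamic?{query}"


def fetch_waiting_page(group: str, base: str = "http://localhost:10000",
                       timeout: float = 5.0):
    """Request the waiting page directly from Sablier, bypassing Traefik.

    Returns (status, content_type, body_length). urlopen raises
    urllib.error.HTTPError for error statuses and URLError when nothing
    is listening, for example when the port-forward is not running.
    """
    with urlopen(dynamic_strategy_url(group, base), timeout=timeout) as resp:
        body = resp.read()
        return resp.status, resp.headers.get("Content-Type"), len(body)
```

With the port-forward in place, `print(fetch_waiting_page("test"))` reports whether Sablier itself returns a complete page for the group. If it does while the same group hangs behind Traefik, that points at the path between Traefik and Sablier rather than at page rendering.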

For your issue, I'd like to know more about your Traefik setup. Can you please share the configuration for Sablier, for the service you are trying to scale to zero, and for Traefik itself? Thanks.

dimber-cais commented 1 year ago

Hi,

Here's the Traefik config:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: sablier
  namespace: test
spec:
  plugin:
    sablier:
      group: test
      sablierUrl: 'http://sablier.sablier.svc.cluster.local:10000'
      sessionDuration: 30s
      dynamic:
        displayName: 'test'
        showDetails: true
        theme: hacker-terminal
        refreshFrequency: 5s
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: test
  namespace: test
spec:
  entryPoints:
    - https-external
  tls:
    secretName: external-tls
  routes:
    - kind: Rule
      match: Host(`test.test.com`) && PathPrefix(`/test/v1`)
      middlewares:
        - name: sablier
          namespace: test
      services:
        - kind: Service
          name: test
          namespace: test
          port: 8080

There's nothing special about the deployment, except that it initially starts out with zero replicas and has the Sablier labels added:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: test
  labels:
    app: test
    sablier.enable: "true"
    sablier.group: "test"
spec:
  replicas: 0

As you can see, we use the Traefik IngressRoute CRD rather than the standard Kubernetes Ingress resource. Our Traefik static configuration looks like this:

...
providers:
  kubernetesCRD:
    allowCrossNamespace: true
    allowEmptyServices: true
experimental:
  plugins:
    sablier:
      moduleName: "github.com/acouvreur/sablier"
      version: "v1.4.0-beta.3"

dimber-cais commented 1 year ago

This is the Sablier static configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sablier-config
data:
  sablier.yaml: |
    provider:
      # Provider to use to manage containers (docker, swarm, kubernetes)
      name: kubernetes
    server:
      # The server port to use
      port: 10000
      # The base path for the API
      base-path: /
    storage:
      # File path to save the state (default stateless)
      file:
    sessions:
      # The default session duration (default 5m)
      default-duration: 15m
      # The expiration checking interval.
      # Higher duration gives less stress on CPU.
      # If you only use sessions of 1h, setting this to 5m is a good trade-off.
      expiration-interval: 60s
    logging:
      level: debug
    strategy:
      dynamic:
        # Custom themes folder, will load all .html files recursively (default empty)
        custom-themes-path:
        # Show instances details by default in waiting UI
        show-details-by-default: true
        # Default theme used for dynamic strategy (default "hacker-terminal")
        default-theme: hacker-terminal
        # Default refresh frequency in the HTML page for dynamic strategy
        default-refresh-frequency: 10s
      blocking:
        # Default timeout used for blocking strategy (default 1m)
        default-timeout: 5m

This is mounted at /sablier.yaml, and the start command is:

        - name: sablier
          image: acouvreur/sablier:1.3.0
          args:
            - "start"
            - "--configFile=/sablier.yaml"

acouvreur commented 1 year ago

        - name: sablier
          image: acouvreur/sablier:1.3.0
          args:
            - "start"
            - "--configFile=/sablier.yaml"  

Here you are using version 1.3.0, which does not yet support groups. Can you use the beta tag instead and try it out?

dimber-cais commented 1 year ago

Apologies, that was a copy-paste error. I tried both, and I can confirm the problem exists with the beta-tagged image.

patcher-ms commented 1 year ago

Hello, I am seeing the same issue using acouvreur/sablier:1.4.0-beta.4, and I wonder whether it is because @dimber-cais and I are both using HTTPS.

In case it's useful, this is the output of a curl request:

curl -v https://searxng.homelab.example.com
*   Trying 192.168.4.63:443...
* Connected to searxng.homelab.example.com (192.168.4.63) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=searxng.homelab.example.com
*  start date: Aug 16 23:37:57 2023 GMT
*  expire date: Nov 14 23:37:56 2023 GMT
*  subjectAltName: host "searxng.homelab.example.com" matched cert's "searxng.homelab.example.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* h2 [:method: GET]
* h2 [:scheme: https]
* h2 [:authority: searxng.homelab.example.com]
* h2 [:path: /]
* h2 [user-agent: curl/8.1.2]
* h2 [accept: */*]
* Using Stream ID: 1 (easy handle 0x13000a800)
> GET / HTTP/2
> Host: searxng.homelab.example.com
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/2 200 
* HTTP/2 stream 1 was reset
* Connection #0 to host searxng.homelab.example.com left intact
curl: (56) HTTP/2 stream 1 was reset

If I change the service to use HTTP instead of HTTPS, I get the correct dynamic loading page, followed by the correct redirect.

Given that everything else works, it feels like the issue could be around this call to forward: https://github.com/acouvreur/sablier/blob/98f0f81894b7c724590df19ef89e94c15c4bbb92/plugins/traefik/main.go#L43. However, I am not familiar enough with Go or Traefik to understand what the issue might be.
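
The curl trace above shows a 200 status arriving and the stream then being reset before any body is delivered. A small probe that records the status and the number of body bytes separately can make that symptom easy to compare before and after a configuration change. This is only a sketch under stated assumptions: `probe` is a hypothetical helper, and Python's `http.client` speaks HTTP/1.1 only, so unlike the curl run it will not negotiate h2. Whether the protocol version matters for this bug is not established anywhere in this thread.

```python
import http.client
import ssl
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 10.0) -> dict:
    """Fetch url over HTTP/1.1 and report what actually arrived.

    The status is recorded before the body is read, so a response whose
    headers arrive but whose body is cut off shows up as a status plus an
    error, which mirrors the symptom in the curl trace.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(
            parts.hostname, parts.port, timeout=timeout,
            context=ssl.create_default_context())
    else:
        conn = http.client.HTTPConnection(
            parts.hostname, parts.port, timeout=timeout)

    # Rebuild the request target from path and query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    result = {"url": url, "status": None, "bytes": 0, "error": None}
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        result["status"] = resp.status
        result["bytes"] = len(resp.read())
    except (OSError, http.client.HTTPException) as exc:
        # Covers refused or reset connections, TLS errors, timeouts,
        # and truncated bodies (IncompleteRead).
        result["error"] = repr(exc)
    finally:
        conn.close()
    return result
```

For example, `probe("https://searxng.homelab.example.com/")` could be run once with the service configured for HTTPS and once for HTTP, and the two result dictionaries compared.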

acouvreur commented 1 year ago

:tada: This issue has been resolved in version 1.4.0-beta.7 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

acouvreur commented 1 year ago

:tada: This issue has been resolved in version 1.4.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: