supabase / realtime

Broadcast, Presence, and Postgres Changes via WebSockets
https://supabase.com/realtime
Apache License 2.0

Realtime: Self Hosting - Docker Swarm mode #645

Open lwjameson opened 10 months ago

lwjameson commented 10 months ago

Bug report

Describe the bug

We have a project, destined to be open source and targeted at colleges and universities, that our client requires to be suitable for self-hosting. Although the docker-compose version works, we do not feel it is a viable production platform, and we do not feel that a Kubernetes-based solution is achievable for many institutions. I have been working on a Docker Swarm implementation and am very close; the last remaining issue is Realtime.

The Realtime service is defined by the following compose YAML file:

# Starts the Realtime service for a Supabase install

version: "3.8"

services:
  realtime-dev:
    image: supabase/realtime:v2.10.1
    healthcheck:
      test:
        [
          "CMD",
          "bash",
          "-c",
          "printf \\0 > /dev/tcp/localhost/4000"
        ]
      timeout: 5s
      interval: 5s
      retries: 3
    ulimits:
      nofile:
        soft: 100000
        hard: 200000
    networks:
      - supabase
    environment:
      - PORT=4000
      - DB_HOST=supabase-db_db
      - DB_PORT=5432
      - DB_USER=supabase_admin
      - DB_PASSWORD=<redacted>
      - DB_NAME=postgres
      - DB_ENC_KEY=supabaserealtime
      - API_JWT_SECRET=<redacted>
      - FLY_ALLOC_ID=fly123
      - FLY_APP_NAME=realtime-dev
      - SECRET_KEY_BASE=UpNVntn3cDxHJpq99YMc1T1AQgQpc8kfYTuRgBiYa15BLrx8etQoXz3gZv1/u2oq
      - ERL_AFLAGS=-proto_dist inet_tcp
      - ENABLE_TAILSCALE=false
      - DNS_NODES=''
      - DB_AFTER_CONNECT_QUERY=SET search_path TO _realtime
    command: >
      sh -c "/app/bin/migrate && /app/bin/realtime eval 'Realtime.Release.seeds(Realtime.Repo)' && /app/bin/server"
networks:
  supabase:
    name: supabase-test
    external: true

My kong.yml file is:

<snip>
  ## Secure Realtime routes
  - name: realtime-v1
    _comment: 'Realtime: /realtime/v1/* -> ws://realtime:4000/socket/*'
    url: http://realtime-dev:4000/socket/
    routes:
      - name: realtime-v1-all
        strip_path: true
        paths:
          - /realtime/v1/
    plugins:
      - name: cors
      - name: key-auth
        config:
          hide_credentials: false
      - name: acl
        config:
          hide_groups_header: true
          allow:
            - admin
            - anon

<snip>

When our client (which works against hosted Supabase) connects via wss, we see the following output in the Realtime logs:

20:36:05.994 [debug] QUERY OK source="tenants" db=0.4ms idle=1099.6ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:36:05.994 [debug] QUERY OK source="extensions" db=0.3ms idle=1100.2ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
20:36:36.530 [debug] QUERY OK source="tenants" db=0.5ms idle=1635.7ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:36:36.530 [debug] QUERY OK source="extensions" db=0.4ms idle=636.4ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]
20:37:07.750 [debug] QUERY OK source="tenants" db=0.5ms idle=1856.1ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["realtime-dev"]
20:37:07.751 [debug] QUERY OK source="extensions" db=0.4ms idle=1856.9ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["realtime-dev"]

The output in the Kong logs:

10.0.0.2 - - [22/Aug/2023:20:36:57 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:01 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:07 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [22/Aug/2023:20:37:13 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 426 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"

To Reproduce

Requires a Docker Swarm setup.

Expected behavior

I expect the WebSocket connection to be upgraded, but the upgrade appears to be failing.
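For reference, RFC 6455 defines the WebSocket opening handshake as an HTTP/1.1 GET carrying `Upgrade: websocket`, `Connection: Upgrade`, a random base64 `Sec-WebSocket-Key` and `Sec-WebSocket-Version: 13`; a server that accepts replies `101 Switching Protocols` with a `Sec-WebSocket-Accept` header derived from the key. The `426 Upgrade Required` responses in the Kong log lines above suggest that these headers did not reach the upstream intact. The sketch below (Python; `build_upgrade_request`, `expected_accept` and `probe` are hypothetical helpers, not part of Supabase) sends a bare handshake so each hop can be tested on its own.

```python
import base64
import hashlib
import os
import socket
from typing import Optional, Tuple

# Fixed GUID from RFC 6455, section 1.3
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value a server must send back for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host_header: str, path: str,
                          key: Optional[str] = None) -> Tuple[bytes, str]:
    """Build a minimal WebSocket opening handshake; returns (request, key)."""
    if key is None:
        key = base64.b64encode(os.urandom(16)).decode("ascii")  # 16 random bytes
    lines = [
        f"GET {path} HTTP/1.1",      # the upgrade mechanism needs HTTP/1.1
        f"Host: {host_header}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",                           # blank line ends the header block
    ]
    return "\r\n".join(lines).encode("ascii"), key


def probe(addr: str, port: int, path: str,
          host_header: Optional[str] = None) -> Tuple[str, bool]:
    """Send one handshake; return the status line and whether it upgraded."""
    request, key = build_upgrade_request(host_header or f"{addr}:{port}", path)
    with socket.create_connection((addr, port), timeout=5) as sock:
        sock.sendall(request)
        # Assumes the status line and headers arrive in the first read.
        response = sock.recv(4096).decode("latin-1")
    status_line = response.split("\r\n", 1)[0]
    upgraded = status_line.startswith("HTTP/1.1 101") and expected_accept(key) in response
    return status_line, upgraded
```

Running `probe()` first against Realtime directly (port 4000 inside the Swarm network, path `/socket/websocket?...`) and then against Kong (path `/realtime/v1/websocket?...`) may show which hop drops the upgrade. Addresses, ports and query strings are placeholders; Realtime may also reject a handshake that lacks a valid `apikey`.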

System information

Additional context

Thank you for any help you can provide here. I have been struggling with this all day.

lwjameson commented 10 months ago

OK. I have made it a little farther but am still having issues.

The docker-compose.yml file uses:

container_name: realtime-dev.supabase-realtime

This accommodates Realtime's use of the subdomain to determine the tenant, as discussed here.

container_name is ignored in Swarm mode, and it is not possible to create a Docker network name that Realtime can use in the same way.

To work around this, I have updated my NGINX config as follows:

    server {
        server_name realtime-dev.<domain>.org;
        location / {
            proxy_pass http://127.0.0.1:4000;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 86400;
        }
    }

and my kong.yml:

  - name: realtime-v1
    _comment: 'Realtime: /realtime/v1/* -> ws://realtime:4000/socket/*'
    url: http://realtime-dev.<domain>.org/socket/
    routes:
      - name: realtime-v1-all
        strip_path: true
        paths:
          - /realtime/v1/
    plugins:
      - name: cors
      - name: key-auth
        config:
          hide_credentials: false
      - name: acl
        config:
          hide_groups_header: true
          allow:
            - admin
            - anon

It then started looking for a tenant named '127' (sigh), which I added to the system via the instructions here. I will figure this out later...

I am now reaching the Realtime service and getting these logs:

18:17:14.784 [debug] QUERY OK source="tenants" db=0.5ms idle=1417.3ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:17:14.785 [debug] QUERY OK source="extensions" db=0.3ms idle=1418.1ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]
18:17:47.379 [debug] QUERY OK source="tenants" db=0.5ms idle=1012.2ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:17:47.380 [debug] QUERY OK source="extensions" db=0.6ms idle=1013.0ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]
18:18:19.384 [debug] QUERY OK source="tenants" db=0.6ms idle=16.5ms
SELECT t0."id", t0."name", t0."external_id", t0."jwt_secret", t0."postgres_cdc_default", t0."max_concurrent_users", t0."max_events_per_second", t0."max_bytes_per_second", t0."max_channels_per_client", t0."max_joins_per_second", t0."inserted_at", t0."updated_at" FROM "tenants" AS t0 WHERE (t0."external_id" = $1) ["127"]
18:18:19.384 [debug] QUERY OK source="extensions" db=0.3ms idle=17.4ms
SELECT e0."id", e0."type", e0."settings", e0."tenant_external_id", e0."inserted_at", e0."updated_at", e0."tenant_external_id" FROM "extensions" AS e0 WHERE (e0."tenant_external_id" = $1) ORDER BY e0."tenant_external_id" ["127"]

This results in a 400 error from Kong:

10.0.0.2 - - [23/Aug/2023:18:19:10 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:14 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:20 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:26 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
10.0.0.2 - - [23/Aug/2023:18:19:31 +0000] "GET /realtime/v1/websocket?apikey=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.ewogICAgInJvbGUiOiAiYW5vbiIsCiAgICAiaXNzIjogInN1cGFiYXNlIiwKICAgICJpYXQiOiAxNjg5ODI1NjAwLAogICAgImV4cCI6IDE4NDc2Nzg0MDAKfQ.4owCHkjcaa6-TZh86JXCI2Wxp8SBvyYoBTF9NeHVZ7M&eventsPerSecond=10&vsn=1.0.0 HTTP/1.0" 400 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"

I am not sure what I am doing wrong in the proxy_pass setup that results in a 400.

Again, any help would be greatly appreciated.
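Every Kong access-log line in this report records the incoming request as `HTTP/1.0`, answered with 426 in the first set and 400 in the second. The WebSocket upgrade is only defined for HTTP/1.1, so one possible explanation, not confirmed in the thread, is that a proxy in front of Kong forwards the request as HTTP/1.0 without the `Upgrade` and `Connection` headers (NGINX, for example, proxies with HTTP/1.0 unless `proxy_http_version 1.1` is set). A small sketch, assuming the common/combined log format shown above, that tallies path, protocol and status per request:

```python
import re
from collections import Counter

# Request line and status of a common/combined-format access log entry,
# e.g. "GET /realtime/v1/websocket?... HTTP/1.0" 426 0
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<protocol>HTTP/\d\.\d)" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count (path, protocol, status) triples.

    Query strings are dropped so API keys passed as ?apikey=... do not
    end up in the summary. Lines that do not match are skipped.
    """
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not an access-log line (e.g. Realtime debug output)
        path = match.group("target").split("?", 1)[0]
        counts[(path, match.group("protocol"), int(match.group("status")))] += 1
    return counts
```

Feeding it the Kong service's output (for example from `docker service logs`) makes it easy to see whether requests still arrive as HTTP/1.0 after a proxy change.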

filipecabaco commented 7 months ago

Be aware that we use the subdomain to determine which tenant is being accessed, so I'm not sure whether that NGINX config removes that information.
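This would account for the '127' tenant above. By default, NGINX's `proxy_pass` sets the `Host` header of the proxied request to `$proxy_host`, which for `proxy_pass http://127.0.0.1:4000` is `127.0.0.1:4000`. The hypothetical helper below mimics subdomain-based lookup under the assumption that the tenant ID is the first dot-separated label of the host name, with any port removed; it is an illustration, not Realtime's actual code.

```python
def tenant_from_host(host: str) -> str:
    """Hypothetical mimic of subdomain-based tenant lookup (not Realtime's code).

    Assumes the tenant ID is the first dot-separated label of the host name,
    with any ":port" suffix removed first. IPv6 literals are not handled.
    """
    hostname = host.split(":", 1)[0]
    return hostname.split(".", 1)[0]


# Host values implied by the configurations in this thread:
print(tenant_from_host("127.0.0.1:4000"))                       # prints 127
print(tenant_from_host("realtime-dev.supabase-realtime:4000"))  # prints realtime-dev
print(tenant_from_host("realtime-dev:4000"))                    # prints realtime-dev
```

If this reading is right, adding `proxy_set_header Host $host;` to the NGINX `location` block would forward `realtime-dev.<domain>.org` unchanged, so the tenant would resolve to `realtime-dev` instead of `127`. It is also consistent with the later reports that Kong upstream URLs whose host starts with `realtime-dev` work.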

menasheh commented 6 months ago

@lwjameson did you ever scale past 1 node?

lwjameson commented 6 months ago

@menasheh Yes, I did, but the client changed strategy and went with Kubernetes instead. Docker Swarm was very tricky, but it does work.

menasheh commented 6 months ago

@lwjameson How did you get the Elixir nodes to connect?

lwjameson commented 6 months ago

It has been a while, but I believe I got it to work by using a separate stack for Realtime with the service named 'supabase-realtime'. When creating the stack, I named it 'realtime-dev', and then I updated the Realtime route in kong.yml to:

url: http://realtime-dev.supabase-realtime:4000/socket/

ConProgramming commented 1 month ago

@menasheh If you figured this out, I would love to hear how; I'm running into something similar on AWS.

tanmoysrt commented 1 week ago

> It has been a little bit, but I believe I got it to work by using a separate stack for realtime with the service named 'supabase-realtime'. Then when creating the stack I named it 'realtime-dev', and then updated the kong.yml file route for realtime to:
>
> url: http://realtime-dev.supabase-realtime:4000/socket/

Thanks, this works.

I created a service named realtime-dev, and in kong.yaml I use these URLs: http://realtime-dev:4000/socket for realtime-v1-ws and http://realtime-dev:4000/api for realtime-v1-rest.