supabase-community / supabase-on-aws

Self-hosted Supabase on AWS
Apache License 2.0
417 stars 61 forks source link

Supabase Realtime does not works #1

Closed mats16 closed 2 years ago

mats16 commented 2 years ago

次のようなエラーが出て動作しない。

2022-08-01 05:55:37.522 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"supabase_realtime_rls\" is active for PID 12303", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

この状態でつなぎに行くと

2022-08-01 05:56:11.784 [info] CONNECTED TO RealtimeWeb.UserSocket in 245µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V1.JSONSerializer
  Parameters: %{"apikey" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8", "vsn" => "1.0.0"}

2022-08-01 05:56:14.826 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}

2022-08-01 05:56:14.826 [info] REFUSED JOIN realtime:public:messages in 18ms
  Parameters: %{"user_token" => "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoiYW5vbiIsImlhdCI6MTY1ODk2NzE2NiwiZXhwIjoxOTc0NTQzMTY2LCJpc3MiOiJzdXBhYmFzZSJ9.yjbFON28xWRS-U4GhOtIvkUD0gjEr4NmKb7Kzgw-XI8"}

2022-08-01 05:56:14.845 [error] %Postgrex.Error{connection_id: 16700, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"supabase_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
mats16 commented 2 years ago

https://github.com/supabase/realtime/issues/22#issuecomment-681854208

This is an ongoing problem with, for example, ECS tasks in AWS, when the new one has started before the old one is killed. The old one hasn't released the replication slot id, while the new one is trying to claim it.

ECS のローリングアップデートだと同一スロットで複数接続するので対応が必要そう

mats16 commented 2 years ago

ローリングアップデートの際のエラーはコレっぽい connection_id: 30689 で接続に行って、過去の active for PID 14208 が残ってるっぽい

2022-08-01 21:29:33.460 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"realtime_rls\" is active for PID 14208", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

この直後にコレが出てる(進捗なし)

2022-08-01 21:32:54.929 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}

2022-08-01 21:32:55.058 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
mats16 commented 2 years ago

Supabase 本家の権限も一応確認してみたが、見た感じ supabase_admin が Superuser で、postgres はユーザー用に用意されたメンテ用ユーザーっぽい

postgres=> \du
                                                             List of roles
       Role name        |                         Attributes                         |                    Member of
------------------------+------------------------------------------------------------+--------------------------------------------------
 anon                   | No inheritance, Cannot login                               | {}
 authenticated          | No inheritance, Cannot login                               | {}
 authenticator          | No inheritance                                             | {anon,authenticated,service_role,supabase_admin}
 dashboard_user         | Create role, Create DB, Cannot login, Replication          | {}
 pgbouncer              |                                                            | {}
 postgres               | Create role, Create DB, Replication, Bypass RLS            | {supabase_auth_admin,supabase_storage_admin}
 service_role           | No inheritance, Cannot login, Bypass RLS                   | {}
 supabase_admin         | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 supabase_auth_admin    | No inheritance, Create role                                | {}
 supabase_storage_admin | No inheritance, Create role
mats16 commented 2 years ago

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.html#AuroraPostgreSQL.Replication.Logical.Configure

Ensure that max_worker_processes is at least as high as the combined values of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers. Having a high number of background worker processes might affect application workloads on small DB instance classes, so monitor the performance of your database if you set max_worker_processes higher than the default value.

max_worker_processes: 8

max_logical_replication_workers: 2
autovacuum_max_workers: 3
max_parallel_workers: 8
mats16 commented 2 years ago
mats16 commented 2 years ago

pg_receivewal 繋いでみて状況把握しようとするも別のエラーで出来ず。

$ /usr/pgsql-14/bin/pg_receivewal -D ./pg_receivewal/ --slot=cloud9 --create-slot -d postgresql://supabase_admin:xxxx@xxxx.us-west-2.rds.amazonaws.com/postgres

pg_receivewal: 
error: 
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed: 
FATAL:  no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", SSL encryption
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed: 
FATAL:  no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", no encryption
mats16 commented 2 years ago

Websocker の接続の際に次のようなエラーが出てたので調査

{
  "event":"phx_reply",
  "payload":{
    "response":{
      "reason":"error occurred when joining realtime:public:messages with user token"
    },
    "status":"error"
  },
  "ref":"1",
  "topic":"realtime:public:messages"
}

ローカル(Mac)で自分で作成した jwt_secret に置き換えて起動

問題なく動く。JWT のフォーマット起因では無さそう

Fargate の起動モードを x86 に変更

変わらず

Cloud9 上で docker-compose して DB は Aurora

同じエラー。異なるのは DB だけなので DB 起因??

mats16 commented 2 years ago

https://github.com/supabase/realtime/issues/270

原因特定してパッチ送った