Closed mats16 closed 2 years ago
https://github.com/supabase/realtime/issues/22#issuecomment-681854208
This is an ongoing problem with, for example, ECS tasks in AWS, when the new one has started before the old one is killed. The old one hasn't released the replication slot id, while the new one is trying to claim it.
ECS のローリングアップデートだと同一スロットで複数接続するので対応が必要そう
ローリングアップデートの際のエラーはコレっぽい
connection_id: 30689
で接続に行って、過去の active for PID 14208
が残ってるっぽい
2022-08-01 21:29:33.460 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :object_in_use, file: "slot.c", line: "464", message: "replication slot \"realtime_rls\" is active for PID 14208", pg_code: "55006", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
この直後にコレが出てる(進捗なし)
2022-08-01 21:32:54.929 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
2022-08-01 21:32:55.058 [error] %Postgrex.Error{connection_id: 30689, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
Supabase 本家の権限も一応確認してみたが、見た感じ supabase_admin が Superuser で、postgres はユーザー用に用意されたメンテ用ユーザーっぽい
postgres=> \du
List of roles
Role name | Attributes | Member of
------------------------+------------------------------------------------------------+--------------------------------------------------
anon | No inheritance, Cannot login | {}
authenticated | No inheritance, Cannot login | {}
authenticator | No inheritance | {anon,authenticated,service_role,supabase_admin}
dashboard_user | Create role, Create DB, Cannot login, Replication | {}
pgbouncer | | {}
postgres | Create role, Create DB, Replication, Bypass RLS | {supabase_auth_admin,supabase_storage_admin}
service_role | No inheritance, Cannot login, Bypass RLS | {}
supabase_admin | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
supabase_auth_admin | No inheritance, Create role | {}
supabase_storage_admin | No inheritance, Create role
Ensure that max_worker_processes is at least as high as the combined values of max_logical_replication_workers, autovacuum_max_workers, and max_parallel_workers. Having a high number of background worker processes might affect application workloads on small DB instance classes, so monitor the performance of your database if you set max_worker_processes higher than the default value.
今
max_worker_processes: 8
max_logical_replication_workers: 2
autovacuum_max_workers: 3
max_parallel_workers: 8
pg_receivewal
繋いでみて状況把握しようとするも別のエラーで出来ず。
$ /usr/pgsql-14/bin/pg_receivewal -D ./pg_receivewal/ --slot=cloud9 --create-slot -d postgresql://supabase_admin:xxxx@xxxx.us-west-2.rds.amazonaws.com/postgres
pg_receivewal:
error:
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed:
FATAL: no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", SSL encryption
connection to server at "xxxx.us-west-2.rds.amazonaws.com" (10.0.174.91), port 5432 failed:
FATAL: no pg_hba.conf entry for replication connection from host "10.0.45.110", user "supabase_admin", no encryption
Websocker の接続の際に次のようなエラーが出てたので調査
{
"event":"phx_reply",
"payload":{
"response":{
"reason":"error occurred when joining realtime:public:messages with user token"
},
"status":"error"
},
"ref":"1",
"topic":"realtime:public:messages"
}
問題なく動く。JWT のフォーマット起因では無さそう
変わらず
同じエラー。異なるのは DB だけなので DB 起因??
https://github.com/supabase/realtime/issues/270
原因特定してパッチ送った
次のようなエラーが出て動作しない。
この状態でつなぎに行くと