supabase / cli

Supabase CLI. Manage postgres migrations, run Supabase locally, deploy edge functions. Postgres backups. Generating types from your database schema.
https://supabase.com/docs/reference/cli/about
MIT License
1.07k stars 209 forks source link

Pooler container error on start: `Tenant or user not found` #2440

Closed bombillazo closed 4 months ago

bombillazo commented 4 months ago

Describe the bug A clear and concise description of what the bug is.

To Reproduce Steps to reproduce the behavior:

  1. Run supabase start
  2. pooler image fails
2024-06-18 14:22:00 Setting RLIMIT_NOFILE to 100000
2024-06-18 14:22:00 18:22:00.981 [info] Migrations already up
2024-06-18 14:22:01 18:22:01.293 [notice] [description: 'Partisan node name generated and configured', disterl_enabled: false, name: :"a7f8e05c-2d9f-11ef-9111-b453f023eab8@6f09855866bf"]
2024-06-18 14:22:01 18:22:01.303 [notice] [addr: {0, 0, 0, 0, 0, 0, 0, 1}, description: 'Resolved IP address for host', family: :inet6, host: '6f09855866bf']
2024-06-18 14:22:01 18:22:01.314 [info] [description: 'Not using container orchestration; disabling.']
2024-06-18 14:22:01 18:22:01.385 [info] [description: 'Starting Partisan listener', listen_addrs: [%{ip: {127, 0, 0, 1}, port: 20100}]]
2024-06-18 14:22:01 18:22:01.800 region=local [info] Elixir.Supavisor.SignalHandler is being initialized...
2024-06-18 14:22:01 18:22:01.811 region=local [warning] Proxy started transaction on port 6543, result: {:ok, #PID<0.396.0>}
2024-06-18 14:22:01 18:22:01.811 region=local [warning] Proxy started session on port 5432, result: {:ok, #PID<0.499.0>}
2024-06-18 14:22:01 18:22:01.814 region=local [notice] SYN[nonode@nohost] Adding node to scope <tenants>
2024-06-18 14:22:01 18:22:01.817 region=local [notice] SYN[nonode@nohost] Creating tables for scope <tenants>
2024-06-18 14:22:01 18:22:01.820 region=local [notice] SYN[nonode@nohost|registry<tenants>] Discovering the cluster
2024-06-18 14:22:01 18:22:01.821 region=local [notice] SYN[nonode@nohost|pg<tenants>] Discovering the cluster
2024-06-18 14:22:01 18:22:01.977 region=local [info] Running SupavisorWeb.Endpoint with cowboy 2.10.0 at 0.0.0.0:4000 (http)
2024-06-18 14:22:01 18:22:01.979 region=local [info] Access SupavisorWeb.Endpoint at http://localhost:4000
2024-06-18 14:22:02 ** (MatchError) no match of right hand side value: {:error, #Ecto.Changeset<action: :insert, changes: %{allow_list: ["0.0.0.0/0", "::/0"], auth_query: "SELECT * FROM pgbouncer.get_auth($1)", client_heartbeat_interval: 60, client_idle_timeout: 0, db_database: "postgres", db_host: "supabase_db_app", db_port: 5432, default_max_clients: 1000, default_parameter_status: %{"server_version" => "15.1 (Ubuntu 15.1-1.pgdg20.04+1)"}, default_pool_size: 20, enforce_ssl: false, external_id: "pooler-dev", ip_version: :auto, require_user: false, upstream_ssl: false, users: [#Ecto.Changeset<action: :insert, changes: %{db_password: "postgres", db_user: "pgbouncer", db_user_alias: "pgbouncer", is_manager: true, mode_type: :transaction, pool_size: 20}, errors: [], data: #Supavisor.Tenants.User<>, valid?: true>]}, errors: [external_id: {"has already been taken", [constraint: :unique, constraint_name: "tenants_external_id_index"]}], data: #Supavisor.Tenants.Tenant<>, valid?: false>}
2024-06-18 14:22:02     (stdlib 4.3) erl_eval.erl:496: :erl_eval.expr/6
2024-06-18 14:22:02     nofile:28: (file)
2024-06-18 14:22:02     (elixir 1.14.3) lib/code.ex:425: Code.validated_eval_string/3
2024-06-18 14:22:02 [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
2024-06-18 14:22:02 [os_mon] memory supervisor port (memsup): Erlang has closed

Expected behavior All stack including pooler start up succesfully

System information

bombillazo commented 4 months ago

When I try to connect to the database I get this error in the Pooler image container:

2024-06-19 10:58:07 14:58:07.673 region=local [error] ClientHandler: User not found: "Either external_id or sni_hostname must be provided" {:single, "postgres", nil}
2024-06-19 10:58:07 14:58:07.673 region=local [warning] client_join is called with a mismatched id: nil
2024-06-19 10:58:07 14:58:07.674 region=local [warning] ClientHandler: socket closed with reason {:shutdown, :user_not_found}
bombillazo commented 4 months ago

I had to revert to CLI v1.172.2 to avoid this issue

sweatybridge commented 4 months ago

The original issue is caused by repeatedly adding the same tenant on service restart. I've addressed it in the beta release.

One thing to note is that connecting to pooler is only supported via the connecting string atm. For eg.

psql 'postgresql://postgres.pooler-dev:postgres@localhost:54329/postgres'
bombillazo commented 4 months ago

Hey @sweatybridge , we're still having this issue when we upgrade the CLI to the latest version while working in a Ubuntu environment.

I notice the pgbouncer docker image is not downloaded while in my Mac OS environment we do have it.

sweatybridge commented 4 months ago

Hmm we deprecated the pgbouncer image recently in favour of supavisor. So it sounds like your macOS environment is outdated?

Regarding the Ubuntu issue, I tried reproducing on gitpod.io but it started and restarted without issues. I tried both with and without stop --no-backup flag.

Could you confirm if the error log is identical? If so, is it also reproducible on gitpod/codespaces?

bombillazo commented 4 months ago

@sweatybridge These are the image versions Im running:

image

but I still get this error message from the supavisor contianer:

2024-07-02 12:26:23 16:26:23.575 request_id=F95yHqDXtgXjlj0AAANh region=local [info] Sent 204 in 233µs
2024-07-02 12:26:33 16:26:33.633 request_id=F95yIPhVhtPfxGcAAAOB region=local [info] HEAD /api/health
2024-07-02 12:26:33 16:26:33.633 request_id=F95yIPhVhtPfxGcAAAOB region=local [info] Sent 204 in 266µs
2024-07-02 12:26:39 16:26:39.435 region=local [error] ClientHandler: User not found: "Either external_id or sni_hostname must be provided" {:single, "supabase_admin", nil}
2024-07-02 12:26:39 16:26:39.435 region=local [warning] client_join is called with a mismatched id: nil
2024-07-02 12:26:39 16:26:39.435 region=local [warning] ClientHandler: socket closed with reason {:shutdown, :user_not_found}
2024-07-02 12:26:43 16:26:43.679 request_id=F95yI08fSrHznlkAAAOh region=local [info] HEAD /api/health
2024-07-02 12:26:43 16:26:43.679 request_id=F95yI08fSrHznlkAAAOh region=local [info] Sent 204 in 242µs
bombillazo commented 4 months ago

@sweatybridge should we use port 54329 as your example connection string? we've always used 6543 locally

sweatybridge commented 4 months ago

should we use port 54329 as your example connection string? we've always used 6543 locally

It depends on whether your client is connecting from inside or outside containers. If you are connecting from inside edge runtime container, then you can use pooler:5432 for session mode and pooler:6543 for transaction.

But if you are connecting from outside, like running psql from host OS, then you want to use localhost:54329. I believe it's transaction mode by default but can be changed in config.toml.

sweatybridge commented 4 months ago

Also note the special username of postgres.pooler-dev used in my example connection string. The part after . is the static tenant id required by supavisor to look up in its tenants table.

I believe the new error you reported is due to client connecting with a partial username that's missing the tenant id portion.

2024-07-02 12:26:39 16:26:39.435 region=local [error] ClientHandler: User not found: "Either external_id or sni_hostname must be provided" {:single, "supabase_admin", nil}
bombillazo commented 4 months ago

@sweatybridge Here are the logs from the pooler container as soon as it starts and i try to use the pooler connection:

2024-07-02 13:16:16 Setting RLIMIT_NOFILE to 100000
2024-07-02 13:16:17 17:16:17.294 [info] Migrations already up
2024-07-02 13:16:17 17:16:17.566 [notice]     :alarm_handler: {:set, {:system_memory_high_watermark, []}}
2024-07-02 13:16:17 17:16:17.637 [notice] [description: 'Partisan node name generated and configured', disterl_enabled: false, name: :"cb2745da-3896-11ef-85cc-617ff023eab8@875d488d92d2"]
2024-07-02 13:16:17 17:16:17.643 [notice] [addr: {0, 0, 0, 0, 0, 0, 0, 1}, description: 'Resolved IP address for host', family: :inet6, host: '875d488d92d2']
2024-07-02 13:16:17 17:16:17.654 [info] [description: 'Not using container orchestration; disabling.']
2024-07-02 13:16:17 17:16:17.732 [info] [description: 'Starting Partisan listener', listen_addrs: [%{ip: {127, 0, 0, 1}, port: 20100}]]
2024-07-02 13:16:18 17:16:18.372 region=local [info] Elixir.Supavisor.SignalHandler is being initialized...
2024-07-02 13:16:18 17:16:18.387 region=local [warning] Proxy started transaction on port 6543, result: {:ok, #PID<0.396.0>}
2024-07-02 13:16:18 17:16:18.388 region=local [warning] Proxy started session on port 5432, result: {:ok, #PID<0.499.0>}
2024-07-02 13:16:18 17:16:18.391 region=local [notice] SYN[nonode@nohost] Adding node to scope <tenants>
2024-07-02 13:16:18 17:16:18.398 region=local [notice] SYN[nonode@nohost] Creating tables for scope <tenants>
2024-07-02 13:16:18 17:16:18.404 region=local [notice] SYN[nonode@nohost|registry<tenants>] Discovering the cluster
2024-07-02 13:16:18 17:16:18.406 region=local [notice] SYN[nonode@nohost|pg<tenants>] Discovering the cluster
2024-07-02 13:16:18 17:16:18.595 region=local [info] Running SupavisorWeb.Endpoint with cowboy 2.10.0 at 0.0.0.0:4000 (http)
2024-07-02 13:16:18 17:16:18.598 region=local [info] Access SupavisorWeb.Endpoint at http://localhost:4000
2024-07-02 13:16:19 17:16:19.680 [notice] [description: 'Partisan node name configured', disterl_enabled: false, name: :supavisor@875d488d92d2]
2024-07-02 13:16:19 17:16:19.681 [notice]     :alarm_handler: {:set, {:system_memory_high_watermark, []}}
2024-07-02 13:16:19 17:16:19.682 [notice] [addr: {0, 0, 0, 0, 0, 0, 0, 1}, description: 'Resolved IP address for host', family: :inet6, host: '875d488d92d2']
2024-07-02 13:16:19 17:16:19.683 [info] [description: 'Not using container orchestration; disabling.']
2024-07-02 13:16:19 17:16:19.684 [info] [description: 'Starting Partisan listener', listen_addrs: [%{ip: {127, 0, 0, 1}, port: 20100}]]
2024-07-02 13:16:19 17:16:19.686 region=local [info] Elixir.Supavisor.SignalHandler is being initialized...
2024-07-02 13:16:19 17:16:19.687 region=local [warning] Proxy started transaction on port 6543, result: {:ok, #PID<0.2766.0>}
2024-07-02 13:16:19 17:16:19.688 region=local [warning] Proxy started session on port 5432, result: {:ok, #PID<0.2869.0>}
2024-07-02 13:16:19 17:16:19.688 region=local [notice] SYN[supavisor@875d488d92d2] Adding node to scope <tenants>
2024-07-02 13:16:19 17:16:19.689 region=local [notice] SYN[supavisor@875d488d92d2] Creating tables for scope <tenants>
2024-07-02 13:16:19 17:16:19.689 region=local [notice] SYN[supavisor@875d488d92d2|registry<tenants>] Discovering the cluster
2024-07-02 13:16:19 17:16:19.689 region=local [notice] SYN[supavisor@875d488d92d2|pg<tenants>] Discovering the cluster
2024-07-02 13:16:19 17:16:19.699 region=local [info] Running SupavisorWeb.Endpoint with cowboy 2.10.0 at 0.0.0.0:4000 (http)
2024-07-02 13:16:19 17:16:19.706 region=local [info] Access SupavisorWeb.Endpoint at http://localhost:4000
2024-07-02 13:16:19 17:16:19.739 region=local [info] [libcluster:postgres] Connected to Postgres database
2024-07-02 13:16:18 [os_mon] memory supervisor port (memsup): Erlang has closed
2024-07-02 13:16:18 [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
2024-07-02 13:16:26 17:16:26.843 request_id=F9502eGS1q040REAAAAH region=local [info] HEAD /api/health
2024-07-02 13:16:26 17:16:26.843 request_id=F9502eGS1q040REAAAAH region=local [info] Sent 204 in 125µs
2024-07-02 13:16:30 17:16:30.967 region=local [error] ClientHandler: User not found: "Either external_id or sni_hostname must be provided" {:single, "postgres", nil}
2024-07-02 13:16:30 17:16:30.968 region=local [warning] client_join is called with a mismatched id: nil
2024-07-02 13:16:30 17:16:30.968 region=local [warning] ClientHandler: socket closed with reason {:shutdown, :user_not_found}

I am on the latest CLI version, not sure why I still get the issue. How can I verify I am using the image with the fix you pushed?

bombillazo commented 4 months ago

@sweatybridge Ok got it, so what should the tenant ID be for local?

sweatybridge commented 4 months ago

Tenant id on local is hardcoded as pooler-dev. So the full username to pass to Postgres client is postgres.pooler-dev.

Which library are you using to connect? May be it has problem supporting this syntax.

Your docker container logs look ok, ie. health check is passing with 204. The error probably came from your app client connecting.

bombillazo commented 4 months ago

Thats it! pooler-dev

Im using pg driver to connect to the DB from our backend, but when we upgraded CLIs our env variables did not have the tenant appended to the DB user var. Thanks, now it works!

koljahuang commented 1 month ago

Tenant id on local is hardcoded as pooler-dev. So the full username to pass to Postgres client is postgres.pooler-dev.

Which library are you using to connect? May be it has problem supporting this syntax.

Your docker container logs look ok, ie. health check is passing with 204. The error probably came from your app client connecting.

I tried. It works now. Big thx.