supabase-community / postgres-new

In-browser Postgres sandbox with AI assistance
https://database.build
Apache License 2.0
2.48k stars 195 forks source link

Improve resilience of Live Share to psql -f #123

Open thenonameguy opened 4 weeks ago

thenonameguy commented 4 weeks ago

Bug report

While building a demo using the Live Share feature of database.build, I've seen many intermittent failures of the connection. Usually fixed by stopping the Live Share, reloading the tab and restarting the Live Share.

Describe the bug

While running many statements (via psql -f) against a Live Share'd non-trivial DB the server/proxy regularly loses sync, while seemingly hallucinating gigabyte-sized messsage sizes (1853122928 bytes or 1667196005 bytes as examples).

psql:schemamap_init.sql:462: lost synchronization with server: got message type "e", length 1667196005
psql:schemamap_init.sql:462: error: connection to server was lost

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Open database.build
  2. Run prompt: Create a well-modeled ERP schema, with multi-tenancy, constraints on most columns (especially tenant_id) and mock data.
  3. Enable Live Share, copy URI
  4. Download schemamap_init.txt (sql script)
  5. Run: psql -f ./schemamap_init.txt "$URI"
  6. Expectation: it should succeed, although there are sometimes failures here as well
  7. Try rerunning step 5 a few times, eventually you will a get a "lost synchronization".

Expected behavior

The client does not get out of sync.

System information

Additional context

I'm aware the client is not major 16 version, but for these simple statement executions, it should not matter.

Another invocation failed later:

psql:schemamap_init.txt:1361: lost synchronization with server: got message type "m", length 1701737504
psql:schemamap_init.txt:1361: error: connection to server was lost

I haven't been able to reproduce the bug on Google Chrome 130.0.6723.92 (Official Build) (arm64)

gregnr commented 3 weeks ago

Thanks for reporting @thenonameguy, we'll dig into this. cc @jgoux

gregnr commented 3 weeks ago

@thenonameguy by chance do you have the original migrations create by AI prior to using Live Share to help us closer reproduce this?

thenonameguy commented 3 weeks ago

I made something even better. Followed the repro steps and stopped live sharing after receiving:

Failed to install SDK: psql:<stdin>:17: lost synchronization with server: got message type "o", length 1920229740
psql:<stdin>:17: error: connection to server was lost

Then exported the state of the PGLite: erp-schema-with-multi-tenancy-1730479427654.gz

For completeness, here are the migrations:

-- Migrations will appear here as you chat with AI

create table tenants (
  id bigint primary key generated always as identity,
  name text not null,
  created_at timestamptz not null default now()
);

create table users (
  id bigint primary key generated always as identity,
  tenant_id bigint not null,
  username text not null unique,
  email text not null unique,
  password text not null,
  created_at timestamptz not null default now(),
  foreign key (tenant_id) references tenants (id) on delete cascade
);

create table products (
  id bigint primary key generated always as identity,
  tenant_id bigint not null,
  name text not null,
  description text,
  price numeric(10, 2) not null check (price > 0),
  created_at timestamptz not null default now(),
  foreign key (tenant_id) references tenants (id) on delete cascade
);

create table orders (
  id bigint primary key generated always as identity,
  tenant_id bigint not null,
  user_id bigint not null,
  total numeric(10, 2) not null check (total >= 0),
  status text not null check (status in ('pending', 'completed', 'cancelled')),
  created_at timestamptz not null default now(),
  foreign key (tenant_id) references tenants (id) on delete cascade,
  foreign key (user_id) references users (id) on delete set null
);

create table order_items (
  id bigint primary key generated always as identity,
  order_id bigint not null,
  product_id bigint not null,
  quantity int not null check (quantity > 0),
  price numeric(10, 2) not null check (price > 0),
  foreign key (order_id) references orders (id) on delete cascade,
  foreign key (product_id) references products (id) on delete cascade
);
jgoux commented 3 weeks ago

Hello @thenonameguy and thanks for your detailed report!

Unfortunately I wasn't able to reproduce the issue on my end. I tried in Firefox using your initial migration file then enabling Live Sharing and executing your schemamap_init.sql file using psql v15.8. I did like ~20 runs of it and they all succeeded.

Let's wait to see if there are other reports. 👍

Also, we just released the ability to drop .sql files directly into the chat, so you can now run your script directly against your in-browser database without going through Live Share! 🎊

thenonameguy commented 3 weeks ago

Thanks for looking into it @jgoux! My current hunch is that this might be related to the Websocket proxy.

I've had some friends/ex-colleagues try out the demo flow with Live Share in different countries/web browsers, and the issues popped up the same.

The demo is available here, after clicking "Yes". https://app.schemamap.io/demo

Since I'm using a single VPS to connect to the Websocket proxy, might it be related to hitting some form of rate limits?

I've been also getting portal "p2" does not exist errors as well since.

I'm adding metric & trace collection to connection failures so I can provide a bit more detail on the ticket, please don't close it just yet. Cheers!