stripe / pg-schema-diff

Go library for diffing Postgres schemas and generating SQL migrations
MIT License
278 stars 20 forks source link

Investigate supabase CLI integration #85

Closed Navbryce closed 5 months ago

Navbryce commented 6 months ago

Evaluate if it's ease of integrating with the supabase CLI

sweatybridge commented 5 months ago

Heyy, I came across this repo when it was shared internally at Supabase. I think what you are doing here is really cool. If there's anything I can help with, please feel free to give us a shout :)

bplunkett-stripe commented 5 months ago

Definitely! I think the biggest feature I want to have is support for is non-public schemas before I attempt to integrate this with your folks' CLI. I'm not sure if you have any strong thoughts there?

Overall, any help with integrating this into the supabase CLI would be great. Not sure if you have any strong prefs for how to do it.

-- I also realized I posted this initial issue on the wrong github account, oops!

sweatybridge commented 5 months ago

I want to have is support for is non-public schemas before I attempt to integrate this with your folks' CLI

That's great to hear! I think it's not necessarily a blocker for integration.

Supabase CLI supports --schema flag for diffing specific schemas. We can introduce a new flag --with-pg-schema-diff that is required together with --schema public. This allows users to invoke your tool and provide feedback as you build out support for other schemas.

Overall, any help with integrating this into the supabase CLI would be great. Not sure if you have any strong prefs for how to do it.

We run existing tools like migra and pgadmin-diff in docker containers mainly because they are not written in go. I hope Supabase CLI can import this package natively so we don't need to rely on docker for diffing.

I will probably give it a shot next week but if you see any immediate blocker please let me know.

bplunkett-stripe commented 5 months ago

Ahh that's perfect. As an initial integration, it makes sense to just error out if folks don't have the schema filter set to public.

I can't think of any immediate blockers. I'm excited to see the PR! Let me know if there's anyway I can help.

sweatybridge commented 5 months ago

I briefly explored the integration in my linked PR. There are a few questions that would need your help to clarify.

  1. Is there any plans to support diffing database by their connection string?

Ideally we also want to support users who make schema changes through our dashboard GUI and diff those changes to save as a migration file. Currently I'm implementing the tempdb.Factory interface to create a database from existing schema file but I'm not sure how it would work if we want to use the current schema of the database instead of recreating from an array of DDL statements. Perhaps we can expose schema.go in pkg instead so that users can call GetPublicSchema directly?

  1. I realise some of the important entities, like RLS policies, are not included in the diff. What is the process and timeline for requesting these entities to be added?

  2. A minor issue I faced when calling GeneratePlan method is that planOptions is declared as a private type. Can we make it public so that I can override some of the options?

Overall, I think this tool is very promising. The end goal for Supabase CLI is to converge on a single diff tool that just works. So we are happy to collaborate and find a path forward.

bplunkett-stripe commented 5 months ago
  1. Yes, we can add support for diffing between databases. It should be pretty trivial to implement an initial implementation of that. I don't want to expose GetPublicSchema because the underlying structs of the schema object are volatile, and I don't want folks to build dependencies on them.
  2. I imagine things like RLS policies will be pretty straight-forward to add. What other entities are you thinking? Materialized views will certainly be trickier but doable.
  3. What options are you trying to modify? The options struct is intentionally not-exported, since modification of options is supposed to be constrained to the With{X} options. If you're looking at the schema getter options, those are not supported yet because we still need to update the SQL generation code to support multiple schemas. We can now fetch multiple schemas but the SQL generation will still fall apart with multiple schemas.
bplunkett-stripe commented 5 months ago

As a follow up to the above, some things I think would be good to establish:

Sorry for all the questions! Just want to make sure we're on the same page!

sweatybridge commented 5 months ago

The options struct is intentionally not-exported, since modification of options is supposed to be constrained to the With{X} options.

Thanks for the tip. I was looking for ways to disable validation and WithDoNotValidatePlan is working like a charm.

we can add support for diffing between databases.

Also thanks for the quick turnaround on diffing between databases. I think that was a major blocker for exposing to CLI users because our current convention uses their local database as source of truth.

Do you want all of the DDL to be able to be run in a transaction?

I don't think this is a problem for us because we split each migration file into individual statements and run them in an implicit transaction using pgx.Batch. The implicit transaction is a feature of extended PostgreSQL protocol which doesn't error with concurrent index build. I verified this by running db reset with a diff file containing CREATE UNIQUE INDEX CONCURRENTLY.

But in the worst case, if any DDL statement can't be used in the same transaction as others, users always have the option to move it to a new file independently. This might be one of the benefits of using timestamped migration files.

What schema entities would you like to be diffable before you feel comfortable exposing to users (by priority)? What schema entities would you like to be supported in an ideal world?

In terms of priority, RLS policies and custom enum types are probably the highest on my list.

I haven't had the time to test other pg entities exhaustively but I find the unit tests created by migra/schemainspect to be a good starting place. There are other areas of migra that leaves more to be desired but we can probably discuss those in a separate issue.

bplunkett-stripe commented 5 months ago

Yes, I've discovered migra has poor support of partitions, which is where this library partially comes into play.

A lot of our DDL is non-transactional to optimize minimizing locks, i.e., when a col is switch to NOT NULL, we build a invalid check constraint, validate it, mark the column as not null, and then drop the check constraint.

I think the enums and RLS policies should be pretty straight forward, since it's not as much of a schema entity dependency problem 🤞 .

I should have public schema support up by EoW. It's basically done; I just need to build tests (and confidence). Diffing DB's is now a matter of implementing SchemaSource for a sql.DB, which should be pretty straightforward. I can probably knock out RLS policies next week and maybe enums.

This project is not my primary focus, so I cannot guarantee any timelines here unfortunately 😆 .

sweatybridge commented 5 months ago

I cannot guarantee any timelines here unfortunately

That's totally fine with us. Thank you for all the efforts you've put into this open source project.

bplunkett-stripe commented 5 months ago

Multi-schema support is in. I'll try to get the DatabaseSchemaSource in today, and then I can work on rolling out the enum + RLS policy diffing.

bplunkett-stripe commented 5 months ago

Having a database as a schema source is now supported. This should unblock the actual integration.

In terms of next steps, I'm going to:

  1. Update the CLI to support non-public schemas, schema filters, and comparing two databases directly. I realize this isn't on your desired feature set, but it's important that the CLI doesn't become too out of date from the features the library supports.
  2. Add row-level security support
  3. Add enum support
bplunkett-stripe commented 5 months ago

@sweatybridge , would you say that the lack of RLS and enum support is blocking your PR to add pg-schema-diff support?

sweatybridge commented 5 months ago

Not necessarily blocking. I can add a warning message after generating diff so that users are aware of the unsupported entities.

Let me update my PR and hopefully merge it before end of this week.

bplunkett-stripe commented 5 months ago

@sweatybridge, sounds like a plan! I'm really just trying to assess priority of features for you folks!

If you want your integration to be fancy, you could use pg_dump to check the migration actually matches what your user expects (assuming you have access to that binary).

TempDB -> Apply old schema -> Apply migration -> pg_dump . TempDB -> Apply new schema -> pg_dump. Make sure they match! Could be useful as validation for any of your migration generation integrations (migra, pgadmin)

sweatybridge commented 5 months ago

I've shipped the --use-pg-schema flag to CLI beta channel with a warning message. Your new changes have made my integration work a lot easier. Kudos to you!

bplunkett-stripe commented 5 months ago

Amazing! Thanks @sweatybridge! I'm going to close out this ticket, but feel free to open any tickets for feature requests/etc. I'll cut tickets for enums and RLS policies.

Also feel free to cut any tickets/start any discussions if there's something else you folks are looking for! I think supabase will be a great way to battle-test this library and find its shortcomings/sharp-edges.