kurtbuilds / ormlite

An ORM in Rust for developers that love SQL.
https://crates.io/crates/ormlite
MIT License
216 stars 11 forks

Migrations without ormlite CLI #39

Closed: philipcristiano closed 7 months ago

philipcristiano commented 7 months ago

I'd like to build migrations into my applications to be determined at runtime instead of shipping migration steps. Is it possible to generate the migrations as a user of the library without the ormlite CLI?

There appears to be a schema_from_filepaths function for migrations with the CLI, but for the dynamic case it seems like a schema_from_models function would avoid needing to ship and parse source files.

kurtbuilds commented 7 months ago

Can you clarify your question here, perhaps with examples of what the user would do?

Generally speaking, Rust does not have runtime reflection, so you can't detect what types exist except by source code analysis (or by explicitly registering information using something like the ctor/inventory crates).
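
For concreteness, a minimal sketch of the explicit-registration approach using the inventory crate; the ModelInfo type and its fields are illustrative, not part of ormlite:

// Each model registers its metadata at link time; ModelInfo is illustrative,
// not an ormlite type.
pub struct ModelInfo {
    pub table_name: &'static str,
    pub columns: &'static [&'static str],
}

inventory::collect!(ModelInfo);

// A derive macro (or the model's author) would submit one entry per model:
inventory::submit! {
    ModelInfo { table_name: "users", columns: &["id", "email"] }
}

// At runtime, iterate everything that was registered:
fn registered_models() -> impl Iterator<Item = &'static ModelInfo> {
    inventory::iter::<ModelInfo>.into_iter()
}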

philipcristiano commented 7 months ago

I'd specifically hope to combine the sqlmo example with the models from ormlite. Starting from the sqlmo example (copied):

// Assumes a tokio runtime and a Result-returning main so `?` works.
use sqlmo::{Dialect, Schema};
use sqlx::Connection;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = std::env::var("DATABASE_URL").unwrap();
    let mut conn = sqlx::postgres::PgConnection::connect(&url).await?;
    let schema_name = "public"; // the schema to introspect
    let current = Schema::try_from_postgres(&mut conn, schema_name).await?;
    // Load your end-state by manually defining it, or building it from another source
    let end_state = Schema::default();
    let migration = current.migrate_to(end_state, &sqlmo::Options::default());

    for statement in migration.statements {
        let statement = statement.to_sql(Dialect::Postgres);
        println!("{}", statement);
    }
    Ok(())
}

Replacing

let end_state = Schema::default();

with something like

let end_state = ormlite::schema();

In my CLI I would then have commands migrate and show-migration that would execute or print out the changes required.
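
As a rough sketch of that CLI shape (using clap's derive API; the names are illustrative):

use clap::{Parser, Subcommand};

// Illustrative CLI; the subcommand names match the ones described above.
#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Print the SQL statements required to reach the code's schema.
    ShowMigration,
    /// Execute those statements against the database.
    Migrate,
}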

The lack of runtime reflection complicates this pattern compared to other languages, but the introspection crate is a bit down this path of collecting field information via a macro and then exposing a function to access it.

The overall goal is to move closer to solving divergence in database schemas between dev, staging, and production, where each may have had different changes applied at different times, and where incident-fighting may have left manual changes in production that aren't reflected in the committed migration files.

kurtbuilds commented 7 months ago

ormlite does have a trait TableMeta which exposes various attributes of the table schema at runtime (which I suppose is similar to the introspection crate you mention).

Using it, you might be able to build a sqlmo::Schema at runtime, and then calculate migrations using the example you reference.
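
A hedged sketch of what that bridge might look like; the exact TableMeta methods and sqlmo's Table/Column fields are assumptions here, so check the real definitions in both crates:

use ormlite::TableMeta;

// Hedged sketch: TableMeta method names and sqlmo::Column fields are
// assumptions, not confirmed API.
fn table_from_meta<T: TableMeta>() -> sqlmo::Table {
    let mut table = sqlmo::Table::new(T::table_name());
    for name in T::table_columns() {
        // TableMeta may not expose column types; this sketch defaults
        // everything to TEXT, which a real version would need to fix.
        table.columns.push(sqlmo::Column {
            name: name.to_string(),
            typ: sqlmo::Type::Text,
            nullable: true,
            primary_key: false,
            default: None,
        });
    }
    table
}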

I guess where I'm confused is... if it's at runtime, then it's no longer based on the structs, and you can construct the schema however you want. For example, I have a project that calculates migrations based on OpenAPI specs, which can be provided at runtime.

If you're trying to calculate from the ormlite struct, then you might as well calculate the migrations at compile time.

I might still not understand what you're trying to achieve.

philipcristiano commented 7 months ago

I'll have to take a look at TableMeta then!

I'm trying to achieve a situation where the source of truth for the data representation exists in the source code as structs, but the mechanism to bring a database to a compatible schema happens at runtime. Generating an intermediate representation at compile time makes sense, as would a SQL version of the target schema that can be (re)viewed.

As for why to calculate migrations at runtime: I've run into these problems historically and continually, across current projects and teams:

All of these issues can be solved by "being more careful" but that has not scaled well. I'm exploring how to solve this in a way closer to infrastructure-as-code "convergence", where the desired state is defined and the tooling gets there, like SQL declarativeness!

kurtbuilds commented 7 months ago

Interesting. Ok the use case makes a lot more sense.

If you want to achieve what you're asking for, I think the easiest way with current tools would be to commit, and ship with your deployment, a serde (e.g. JSON) file of the code's representation of the db schema. At runtime, connect to the prod db, extract the schema, calculate the migrations by diffing the two schemas, and execute. As mentioned, I have something similar in a system that writes SQL objects coming from an OpenAPI data source.
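
A minimal sketch of that shipped-snapshot idea, assuming you mirror the schema in your own serde types (whether sqlmo's types derive serde is not confirmed here):

use serde::{Deserialize, Serialize};

// A snapshot of the code's schema, written at build/release time and shipped
// with the deploy; the shape here is illustrative, not sqlmo's actual types.
#[derive(Serialize, Deserialize)]
struct SchemaSnapshot {
    tables: Vec<TableSnapshot>,
}

#[derive(Serialize, Deserialize)]
struct TableSnapshot {
    name: String,
    columns: Vec<String>,
}

// At startup, load the committed snapshot to diff against the live db.
fn load_snapshot(path: &str) -> Result<SchemaSnapshot, Box<dyn std::error::Error>> {
    let json = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&json)?)
}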

That said - I don't think it's feasible to generate migrations in the general case without engineer review.

There are problems that are undecidable just from looking at code. If you add one column and delete another with the same type, is that a rename or an insertion and deletion? If you add a non null column, what is the value for existing rows?

For your situation, I wonder if you could have a pre-merge or pre-deploy step that runs migrations in a transaction to test them, and rolls back, before the actual deploy.
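
For example, a sketch using sqlx (this works on Postgres because its DDL is transactional):

use sqlx::Connection;
use sqlx::PgConnection;

// Run candidate migration statements inside a transaction, then roll back,
// so syntax and constraint errors surface before the real deploy.
async fn dry_run(conn: &mut PgConnection, statements: &[String]) -> Result<(), sqlx::Error> {
    let mut tx = conn.begin().await?;
    for sql in statements {
        sqlx::query(sql).execute(&mut *tx).await?;
    }
    tx.rollback().await?;
    Ok(())
}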

Another idea, which would be pretty easy to achieve with the current building blocks of sqlmo: at runtime, you could compare the prod and dev db, and if they don't line up, refuse to generate a migration.
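
A sketch of that check, reusing only the calls from the example earlier in the thread:

use sqlmo::{Dialect, Schema};
use sqlx::PgConnection;

// Refuse to generate a migration when prod and dev have already diverged.
async fn ensure_no_drift(
    prod: &mut PgConnection,
    dev: &mut PgConnection,
    schema_name: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    // Introspect both live databases and diff them.
    let prod_schema = Schema::try_from_postgres(prod, schema_name).await?;
    let dev_schema = Schema::try_from_postgres(dev, schema_name).await?;
    let drift = prod_schema.migrate_to(dev_schema, &sqlmo::Options::default());
    if drift.statements.is_empty() {
        return Ok(());
    }
    // Print the divergence and refuse to proceed.
    for statement in drift.statements {
        eprintln!("drift: {}", statement.to_sql(Dialect::Postgres));
    }
    Err("prod and dev schemas have diverged; refusing to migrate".into())
}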

philipcristiano commented 7 months ago

I'm glad it makes sense at least!

There are problems that are undecidable just from looking at code. If you add one column and delete another with the same type, is that a rename or an insertion and deletion? If you add a non null column, what is the value for existing rows?

For these cases a single-step migration runs into problems as well, so I'm not particularly concerned that they may not have automatic answers; they require an out-of-band solution for the migration step regardless.

Renames are difficult with multiple versions of the code running, and require multiple deploys: add a compatible version, update the db, then remove the outdated code. We also handle backfilling existing values for a not-null constraint separately from deploy or migration, as relying on the db to fill in defaults for existing rows can generate too heavy a write load.

I don't think it's feasible to generate migrations in the general case without engineer review.

We do require review before applying migrations. For the size of our DB, any migration can be problematic. Our pattern so far is that "apply migrations" always requires manual approval. The logic is roughly "if DB diff exists -> await manual approval". If approved, the change can be applied as-is, or a DBA can solve the problem differently.

The simple cases work well in dev/staging to help with developer velocity. Since DBA involvement can also lead to schema divergence from the app, having the app check helps point out where we need to make changes.

Thank you for taking the time to understand my problem, and for your work on ormlite/sqlmo; it's nice to see a project working on generating migrations for Rust! I'll have to see which tradeoffs make the most sense in our case re: this vs Atlas/sqlx.