rust-db / refinery

Powerful SQL migration toolkit for Rust.
MIT License
1.35k stars 126 forks source link

Migration enums #312

Closed superstator closed 8 months ago

superstator commented 9 months ago

I started on this a while back and then shelved it for a while, so I still have some work to do cleaning up tests etc. Opening as a draft for discussion and if we like the approach I can polish it up a bit. See #303.

Mainly all this does is add a couple new elements to the code emitted by embed_migrations, for example:

    #[repr(i32)]
    pub enum EmbeddedMigration { 
        AddBrandToCarsTable(Migration) = 3i32, 
        AddYearToMotosTable(Migration) = 4i32, 
        AddCarsAndMotosTable(Migration) = 2i32, 
        Initial(Migration) = 1i32 
    }

    impl From<Migration> for EmbeddedMigration {
        fn from(migration: Migration) -> Self {
            match migration.version() as i32 {
                3i32 => Self::AddBrandToCarsTable(migration),
                4i32 => Self::AddYearToMotosTable(migration),
                2i32 => Self::AddCarsAndMotosTable(migration),
                1i32 => Self::Initial(migration),
                v => panic(\"Invalid migration version '{}'\" , v)
            }
        }
    }

I also reorganized the regex used to match migration files so they can be reused from the macro logic.

jxs commented 9 months ago

Hi, and thanks for this! Looking into the example code now I am wondering, have you tested the compile time pre and after that change? It's not uncommon for users to have 1000+ migrations, and looking into experiences people had with a very big number of variants is not encouraging, also because on our case we are alo generating the variants.

superstator commented 9 months ago

Interesting question! I'll play around with it and let you know what I see. Worst case this would be pretty easy to gate behind a feature flag.

superstator commented 9 months ago

Measuring it on my machine at least it's a fairly minor impact. Without enums being generated, my test project with 20 migrations takes about 6s seconds to build after a clean, with 200 it still takes about 6s, with 2,000 it takes about 12s, and with 20,000 it takes over 15 minutes 😬. With enums enabled the 2,000 migration case is maybe 1s slower, and the 20,000 migration case is about 30 seconds slower.

That said I put a feature flag around it anyway, so it's not adding any overhead where it's not needed. By way of documenting I updated the example project and added it to the workspace so it can actually be built and run with or with enums/iteration/etc.

jxs commented 8 months ago

Thanks for running the experiment. You experimented with both with sql and rs migrations? Btw can you share the migrations code?

superstator commented 8 months ago

I just did .sql migrations originally, but trying it now with a mix of both I see similar results. The migrations I used are very minimal, just inserting rows in a table. Here's the script I used to generate them in bulk:

#!/usr/bin/env bash

mapfile -t names < <(grep -E '^[a-z]{3,}$' /usr/share/dict/words | sort -Ru | xargs -n 3 echo | tr ' ' '_')

rs_frac=2

for i in $(seq "$1" "$2"); do
  name=${names[$i]}
  if [ $(($RANDOM % $rs_frac)) -eq 0 ]; then
    cat > "V${i}__t_${name}.rs" <<- HERE
use barrel::{types, Migration};

use crate::Sql;

pub fn migration() -> String {
    let mut m = Migration::new();

    m.inject_custom("insert into migrations (name, id) values ('$name', $i)");

    m.make::<Sql>()
}
HERE
  else
    cat > "V${i}__t_${name}.sql" <<- HERE
insert into migrations (name, id) values ('$name', $i);
HERE
  fi
done

Would be easy enough to make it create a bunch of tables rather than inserting rows, I just didn't want to stress sqlite out with 1,000s of tables if I actually run them. I can send you the whole test project if you're curious, I'm tempted to script it out better so I can actually chart the results and see when it really breaks down, with or without enums.

superstator commented 8 months ago

build_times

I think the takeaway is it starts getting painful as soon as you have more than a few thousand migrations, and there will be a practical limit eventually. Worst case, if you can just use debug builds and only build release in a CI workflow you can manage more, but it might be prudent to keep large sets of migrations in a separate crate. Enums add a little overhead to the whole process, not a ton, but still worth putting behind a flag I think.

Experiment source: https://github.com/superstator/refinedb

jxs commented 8 months ago

Thanks for this! And sorry for not having yet reviewed, will try do so today

jxs commented 8 months ago

Thanks!