ufront / ufront-orm

The Object Relational Mapper, allows easy, type-safe access to your database. Builds on Haxe's SPOD but adds macro powered relationships, validation and client side usage
MIT License
13 stars 4 forks source link

Feature Discussion: Database Migrations #12

Open jasononeil opened 9 years ago

jasononeil commented 9 years ago

The Aim

(Feel free to comment / edit...)

// How do we describe database structure, when the model might not exist at runtime?
// (For example, if we had a model, then rolled back to before it existed, how do we still have a migration that makes sense?)

typedef DBColumn = { name:String, type:sys.db.RecordType };
typedef DBIndex = { fields:Array<String>, unique:Bool };
typedef DBForeignKey = { fields:Array<String>, relatedTableName:String };
typedef DBTable = {
    tableName:String;
    fields:Array<DBColumn>;
    indices:Array<DBIndex>;
    foreignKeys:Array<DBForeignKey>;
};

// What are all the possible actions we'd want to take on our database?

enum MigrationAction {
    CreateTable( name:String, initialFields:Array<DBColumn>, indices:Array<DBIndex> );
    AddField( tableName:String, field:DBColumn );
    ModifyField( tableName:String, before:DBColumn, after:DBColumn );
    AddIndex( tableName:String, index:DBIndex );
    AddForeignKey( tableName:String, index:DBForeignKey );
    CreateJoinTable( modelAName:String, modelBName:String );
    InsertData( tableName:String, id:Null<Int>, properties:{} );
    RunScript( upHScript:String, downHScript:String );
    // And the reverse of all of those...
}
enum MigrationDirection {
    Up;
    Down;
}

// Now, at any given step, we might have several actions we want to do at once, a "migration".
// How do we store these migrations so they can go "up" if code is present, 
// and still go "down" if we're on an old version of the code where it doesn't exist.
// Basically, we need to be able to undo a migration, even if that migration does not exist in the current code base.
// So, when a migration runs "up", we'll store info about it in the database that allows us to run it "down" if we need to.

class Migration {
    // These are the actions for this migration
    var actions:SData<Array<MigrationAction>>;

    // And this is information about when we run the migration, which we save to the DB:
    var direction:MigrationDirection;
    var date:SDateTime;
    var id:String;

    function new( actions ) {
        this.actions = actions;
        this.id = Type.getClassName( Type.getClass(this) );
    }
    function runUp();
    function runDown();
}

// For example

class Migration2014121501_Create_User_Table extends Migration {
    public function new() super([
         CreateTable( "User", [...], [...] ),
         ModifyField( "User", ..., ... ),
         CreateTable( "Group", [...], [...] ),
         CreateJoinTable( "User", "Group" )
    ]);
}

var m = new Migration2014121501_Create_User_Table();
m.runUp();

// So, we have a way of creating a Migration class that has actions that can run up and down.
// How do we tell what needs to run up, and what needs to run down?
// Well, imagine we have this API:

typedef MigrationsToRun:Array<{ migration:Migration, dir:MigrationDirection }>;
class MigrationApi extends UFApi {
    function getMigrationsInCode():List<Migration>; // Get all migration classes that were compiled
    function getMigrationsInDB():List<Migration>; // Get all migrations that have been run "up" on the database
    function diffMigrations( inCode:List<Migration>, inDB:List<Migration> ):{ toRunUp:MigrationsToRun, toRunDown:MigrationsToRun }
    function runMigrations( migrations:Array<{ m:Migration, dir:MigrationDirection}> );
}

// So what if we need to do a migration that involves transforming data?
// Other migration systems you just write it in code, but it makes it difficult to "run down".
// Because, if you're on an old version of the code base, you don't have the migration to know what code to run when you're undoing the migration.
// A possible solution is to use hscript, and expose a limited API.

var splitNamesScript = "
for ( u in BasicManager.getAllObjects(User) ) {
    var names = u.name.split(" ");
    u.firstName = names.shift();
    u.lastName = names.pop();
    u.middleNames = names;
    u.save();
}
";
var joinNamesScript = "
for ( u in BasicManager.getAllObjects(User) ) {
    u.names = '${u.firstName} ${u.middleNames.join(" ")} ${u.lastName()}';
    u.save();
}
";
var actions = [
    AddField( "User", { name:"firstName", type:SString<20> } );
    AddField( "User", { name:"lastName", type:SString<20> } );
    AddField( "User", { name:"middleNames", type:SData<Array<String>> } );
    RunScript( splitNamesScript, joinNamesScript );
    RemoveField( "User", { name:"name", type:SString<255> } );
];

// Cool, that looks like it will mostly work - when running up it will create the new fields, run the "splitNamesScript" script, delete the old field.
// When running down, it will create the old field, run the "joinNamesScript" script, and delete the new fields.

//
//
// So.... how do we make this easy to maintain?
// I don't want to spend my days doing nothing but writing migrations.
//
//

// Let's have a description of the exact state of our DB schema.
typedef Schema = Array<DBTable>;

// And then, add to our MigrationAPI some methods to get a schema:
class MigrationAPI extends UFApi {
    schemaFromModels():Schema; // Based on the models in the code
    schemaFromDatabase():Schema; // Based on the migrations in the DB
    diffSchema( s1:Schema, s2:Schema ):Migration; // Get a migration
    runTempMigrations():Void; // Set up the tables automatically without creating a formal migration.
    undoTempMigrations():Void; // Undo the above step.
}

So in effect, the workflow would be like this:

  1. Create or modify models in your code, and compile.
  2. Use MigrationAPI.runTempMigrations() while you're working on it.
  3. Turn your temporary migrations into a real one. (undoTempMigrations(), diffSchema() to create new migration). If you want to change the specifics of the migration or do any custom data seeding / processing, go for it.
  4. Commit the new migration to version control.
  5. When someone else pulls down a new branch or release and it has new migrations, use runMigrations to get the DB in line...

Some questions:

  1. We'll need a tool to help with this. Does it run as part of the ufront CLI (haxelib run ufront migrate) or does it run through a UFTasks file that is compiled as part of the project. The tasks file makes sense because in compile time you'd have access to the models. It might be more bootstrapping required to get an initial setup though...
  2. Should the migrations be a ".hx" file, or should we make them a JSON file or something? As long as we commit them to source control, and they're available at runtime, it shouldn't matter.
  3. This is a fairly complicated model, but I think it will work for most use cases. Is there anything I'm missing? Will this complexity be hard to maintain in the future? Is their a simpler way to do it?
theRemix commented 9 years ago
// So what if we need to do a migration that involves transforming data?
// Other migration systems you just write it in code, but it makes it difficult to "run down".
// Because, if you're on an old version of the code base, you don't have the migration to know what code to run when you're undoing the migration.

Storing migration scripts/tasks in the database is clever, yet wouldn't be flexible and powerful enough i think.

An idea i had is, each migration that is run will store it's id in a schema_migrations table.
Before running any actions, check the ids in schema_migrations and compare against the Migration classes found in the codebase.
If there are any new Migration classes that the schema_migrations does not have, then it's safe to migrate up.
If there are any schema_migrations tracked that the code base does not have (due to rolling the codebase back a few commits), then like you mentioned, we cannot migrate down since those migration classes do not exist in this past version. So instead, you cannot migrate up or down, and would be advised to:
a) Checkout the commit that matches the latest tracked migration (probably master), then migrate down to the version you need(find a way to make this manageable, then (re)checkout the past commit. or b) haxelib run ufront migrate -resync
which will wipe the db, then migrate up to latest

jasononeil commented 9 years ago

That's a much better option. Just let them know which migrations need to be rolled back, and they can figure it out in version control.

Now to just code it.... ;)