Migrations - Githubissues

abondar commented 6 years ago

(edited by @grigi ) Migrations is currently planned as a post-v1.0 feature.

Some useful resources for getting this working right now:

https://github.com/tortoise/tortoise-orm/issues/8#issuecomment-534946871 (Native python, MySQL)
https://github.com/tortoise/tortoise-orm/issues/8#issuecomment-575982472 (Go, MySQL/PostgreSQL/SQLite)

Forward migrations

Best guess at this time for a complete solution, not a "quick" solution:

[ ] Make the describe_model() contain all the data one needs to generate DDL from.
[ ] Generate a sequence of high-level DDL instructions from describe_model()
[ ] Port generate_schema to use this high-level DDL instructions to generate a schema in the requested dialect. (This would require some decoupling from the driver instance, look at #72)
[ ] Build a persistance model (read/write) for high-level DDL insctuctions much like Django Migrations
[ ] And need a way to run "management" commands. So possibly a standard command-line utility.
[ ] Build a full model (e.g. same format as describe_model())from a series of high-level DDL instructions (as in read the migration persistance model)
[ ] Diff models, and generate a diff DDL instruction set.
[ ] If a Diff requires a default not pre-specified, we need to ask the user for something.
[ ] Have some way of determining the "version" of the models persisted in the DB.
[ ] Put it all together and make it work

I'm not particularily happy about many migration systems storing state in the DB itself, as for some environments that won't work, but it is a very good place from a consistency PoV. We should have a backup for when we generate DDL that one would pass on to the DB team in cases where DDL changes have an established process (e.g. enterprise/consulting)

Data migrations

Proper data migration management makes this very useful:

[ ] Allow custom user scripts as migrations
[ ] Allow data-migrations to get a version of a model as it was at that time. So ability to build a model class from the intermediate describe_model() data
[ ] Handle DB edge cases, like Postgres being able to do DDL changes in a transaction OR data changes in a transaction, but not both.

Backward migrations

There is some cases where backwards migrations cannot be done as a forward migration clobbered/destroyed some data. In cases where we detect that we need mising data, and request a sane default. What do we do about data migrations? Is it safe to just emit a warning?

Marlysson commented 6 years ago

Which models did you think it's necessary or which estructure design this feature need..

abondar commented 6 years ago

Currently I haven't invested much time into this matter, so I haven't came up with design at the moment. I am afraid that in this task we can't get away with just using some existing tool, because there are none appropriate on horizon.

I think taking a look at Django migrations would be good reference for the start. Other existing migrations engines are either not good enough or locked on some certain db.

One of my main concerns is that system should support running python code for data migrations along with schema migrations. May be it shouldn't be part of early implementation, but it should be designed in way that is ready for it.

grigi commented 6 years ago

Considering it took Aeracode essentially three tries before he got Migrations for Django to work nicely, I'd assume that this is a very complex problem. I also think the Django Migrations are one of the more sane implementations of it that I know of.

The core features that I use are:

Comprehensive schema in Migrations
Automatic Migration building
Ability to easily run arb code in a migration
Ability to get a Model for an old version as it is at THAT time (so your custom code doesn't fail weirdly later down the line)
Cross-DB support

A big help to Django Migrations, is that they are tightly coupled to the Django Management Interface, which auto discovers Models. In Tortoise we need an explicit import, and explicit Schema-creation command. I feel we can only attempt this once we have the ability to manipulate state from outside of your python code.

abondar commented 6 years ago

Yea, today I dived a little into django migrations and it seems to be one hell of a task to handle. But that is what makes it exciting :)

Most likely expanding apps conception to have autodiscover and more control over the models outside of python code is good idea by itself, disregarding migrations implementation. At the moment migrations seems to be too faraway dream to really concentrate on them now and we should first make tortoise good enough to use on it's own, hoping that users will be able to handle schema migrations by themselfs.

Summarizing I think we should it take it this way:

Come up with better apps module idea, that will not restrict users to have only certain project structure, so tortoise will stay easily pluggable into any async project, but will give ability to work with models more magically
Make autodiscover of models based on this apps concept.
Implement basic migration tool so users will stay within scope of ORM. I think it should look like system where user can manually create migration which will consist of forward and backward methods, where user can write pure SQL and write some basic python code to migrate it's data (without counting on ORM, because implementing models versioning seems to be hard). It will cover basic needs of users for migrations and give some time to think of something better.

May be there is a way to make dummy implementation of model versioning through pickling models at the moment of creating migration, but this seems to be really strange way to solve the problem :)

P.S. Stumbled across this roadmap on aerocode blog - https://www.aeracode.org/2018/06/04/django-async-roadmap/ - django seems to be taking interesting way :)

grigi commented 6 years ago

Interesting roadmap. It seems that roadmap is 2-3 years out for a fully async implementation, but they plan to do quick wins first.

I have been thinking on a way to do a way to manager apps/modules, and I have an idea which I'll document in a new issue (as a discussion)

imbolc commented 6 years ago

Maybe just write an translator to djangos models and use their migrations?

abondar commented 6 years ago

That thought flashed through my mind, but I really don't like the idea of making django dependency or copy pasting all migrations code. I pretty sure it's not as easy as it sounds, because django is quite world in itself and all modules a little bit coupled together. May be I will investigate into this matter and if it's turns out to be easy deal than may be implementing such transpiler could be done in separate project which will be pluggable into tortoise

grigi commented 6 years ago

I really think this should be Post-1.0 feature. Migrations are not easy, so we should not rush it.

vishalhkd commented 6 years ago

Any thoughts on integrating with Alembic project for migrations? There are many projects that use Alembic for managing migrations and SQLAlchemy as the ORM. Integrating with Alembic would help those projects move to Tortoise ORM for the async io benefits with ease. Just thinking out loud here ...

abondar commented 6 years ago

The thing is, as far as I see - Alembic it tightly coupled with sqlalchemy and giving it to users of Tortoise would be downright confusing. Making some tool that hides Alembic implementation under our own interface would work better, but it's still would be hard to make much progress with such approach, cause we would be limited to what Alembic could offer and it's still under active development.

Migrations is the feature that I really want to address, but I don't want to make fast and bad solution here. May be after releasing 1.0 I will complement code for migrations that I use in my projects and publish in some kind of way. It will be far fetch from real migration system, but give some basic features for people who struggle with it and reference for developing better system

grigi commented 6 years ago

So, after reading the docs of Alembic, I agree that it appears very tightly coupled to SQLAlchemy, and also, not tightly enough. One has to manage a lot of boilerplate.

There is some interesting design decisions that differ quite a bit from how Django does it: 1) Django keeps a log, Alembic does a diff 2) Django migration generation is more thorough as it uses data from the defined models. 3) I like how Alembic allows one to generate SQL for DDL commands, this way we can get separate DBAs to manage that. 4) Alembic has a more complex and capable DAG.

We will definitely look at both it and Django migrations and see what parts we can use?

If you want to get started on this (even if it is requirements gathering) you are welcome to @vishalhkd

Whilst Andrey wants to tackle Migrations after v1, I would instead like to tackle performance after v1.

danieldaeschle commented 5 years ago

Any suggestions how to make migrations at the moment? Manually?

grigi commented 5 years ago

Not as of yet @danieldaeschle I think Manually might be the best option.

Can you think of an intermediate system that would simplify the work of doing a migration manually?

wallneradam commented 5 years ago

I've written a table schema SQL synchronizer. Feel free to use it. Now it only works for MySQL/MariaDB, but in my opinion, it would be not too hard to generalize for other DBs. https://github.com/wallneradam/python_sql_schema_sync

I use it with Tortoise and it works wonderfully. It would be cool to integrate it into.

How I use now...

import schema_sync

# Get MySQL client
client = tortoise.Tortoise.get_connection('default')
# Get current table schemas
tables = await client.execute_query(
    "SELECT DISTINCT TABLE_NAME FROM `information_schema`.`tables` WHERE TABLE_SCHEMA = %s", db)
old_schema = ""
for table in tables:
    old_schema += (await client.execute_query(f"SHOW CREATE TABLE `{table['TABLE_NAME']}`"))[0]['Create Table']
    old_schema += ";\n"
# Get the new schemas
new_schema = tortoise.utils.get_schema_sql(client, safe=False)
# Get queries to update schema
update_queries = schema_sync.sync(old_schema, new_schema)
# Update schema
if update_queries:
    logger.debug("Update queries to be executed:\n%s", update_queries)
    await client.execute_script(update_queries)

Because Tortoise creates separate CREATE INDEX queries, and my script (atm) can only handle KEY lines inside CREATE TABLE, I created a custom schema generator class for MySQL:

from tortoise.backends.mysql.client import MySQLClient
from tortoise.backends.mysql.schema_generator import MySQLSchemaGenerator

class KeyIndexMySQLSchemaGenerator(MySQLSchemaGenerator):
    """
    Changes "CREATE KEY" queries to KEY parameters under CREATE TABLE query (it is needed for schema_sync)
    """
    TABLE_CREATE_TEMPLATE = "CREATE " + "TABLE {exists}`{table_name}` ({fields}__INDEXES__){extra}{comment};"
    INDEX_CREATE_TEMPLATE = "KEY `{index_name}` ({fields})"

    def __init__(self, client) -> None:
        super().__init__(client)
        self._field_indexes = []

    # noinspection PyProtectedMember
    def _get_index_sql(self, model, field_names: List[str], safe: bool) -> str:
        """ Get index SQLs, but keep them for ourselves """
        self._field_indexes.append(self.INDEX_CREATE_TEMPLATE.format(
            exists="IF NOT EXISTS " if safe else "",
            index_name=self._generate_index_name(model, field_names),
            table_name=model._meta.table,
            fields=", ".join(['`' + f + '`' for f in field_names]),
        ))
        return ""

    def _get_table_sql(self, model, safe=True) -> dict:
        res = super()._get_table_sql(model, safe)
        if self._field_indexes:
            self._field_indexes.insert(0, '')
            indexes = ',\n    '.join(self._field_indexes) + "\n"
            self._field_indexes.clear()
        else:
            indexes = '\n'

        res['table_creation_string'] = res['table_creation_string'].replace('\n__INDEXES__', indexes)
        return res

MySQLClient.schema_generator = KeyIndexMySQLSchemaGenerator

grigi commented 5 years ago

There is so much to unpack here. Thank you for all this info.

The schema sync script:

It has a basic SQL parser
It has a diffing system
Based on the diff it can auto add/remove columns/indexes. (but not ALTER them)

The inside/outside create/alter statements.

We definitely needed to handle outside statements, as e.g. PostgreSQL's extensions are always outside statements, whereas MySQL prefers to add it inside.
I was told by our DBA 2 weeks ago that MySQL does alter statements by doing a full table copy. So if I have many small alter statements it is unnecessarily slow. PostgreSQL on the other hand doesn't really care.

Yes, this could be very useful, also for #104.

wallneradam commented 5 years ago

Based on the diff it can auto add/remove columns/indexes. (but not ALTER them)

It can alter fields as well. It creates lines like this to modify fields:

ALTER TABLE `user` MODIFY `full_name` VARCHAR(255) NOT NULL;

grigi commented 5 years ago

Ah, i see. I missed that.

yrik commented 4 years ago

Would it be possible to have a least some basic version for fields adding/removal? That would cover 80% of cases. Or any suggestions on what to use instead of a manual approach?

grigi commented 4 years ago

Due to my current company requiring us to manually submit DDL migrations I have not had the urgency to implement this. I know that @wallneradam has implemented a lightweight migration script (for MySQL specifically).

I am not opposed to adding such functionality (in fact, I loved how well Django's Migrations post v1.7 worked at my previous job).

What do you propose? A simple way to see if columns need to be added/removed when doing a generate_schema()?

For that to work we need a way of fetching the current table format? I think that is more-or-less what @wallneradam is doing.

antonagestam commented 4 years ago

It would be great if generating migrations doesn't require a database connection, but like Django does it. The generation of migrations would instead look at the fields of the previous migration, compare it to the current set of fields on the model and generate a set of commands required to get from one state to the next. The set of commands would be abstract such as AddField and RenameField and can then be converted into SQL at migrate-time.

grigi commented 4 years ago

@antonagestam That is what my ideal would be too. It does require a fair amount of work unfortunately, and as was mentioned here deferred because at that time we had much more maturing to do. To be honest, we are approaching the point where this is becoming important.

I updated the description at the top with my current thoughts on steps. IF anyone wants to help with any of the steps, well, that'll be great!.

wallneradam commented 4 years ago

@antonagestam @grigi This is exactly what my script does. Just I don't have time to make it work with Postgres an SQLite. But works very well with MySQL. And since my post it is more easier than before because of the latest MySQL schema changes of Tortoise. So now there's no need to have a custom SchemaGenerator.

This is how I use it (in brief):

...
from sql_schema_sync import schema_sync
...
async def db_init():
    await tortoise.Tortoise.init(...)

    # Get MySQL client
    client = tortoise.Tortoise.get_connection('default')
    # Get current table schemas
    tables = await client.execute_query(
        "SELECT DISTINCT TABLE_NAME FROM `information_schema`.`tables` WHERE TABLE_SCHEMA = %s", db)
    old_schema = ""
    for table in tables:
        old_schema += (await client.execute_query(f"SHOW CREATE TABLE `{table['TABLE_NAME']}`"))[0]['Create Table'] + \
                      ";\n"
    logger.debug("Old schema:\n%s", old_schema)
    # Get the new schemas
    new_schema = tortoise.utils.get_schema_sql(client, safe=False)
    logger.debug("New schema:\n%s", new_schema)
    # Get queries to update schema
    update_queries = schema_sync.sync(old_schema, new_schema)

    # Update schema
    if update_queries:
        logger.debug("Update queries to be executed:\n%s", update_queries)
        async with in_transaction() as connection:
            await connection.execute_script(update_queries)

    logger.info("Database is ready.")

The schema_sync is offline. So you can use it as a diff tool to create migration scripts between versions. And put these scripts into version control system which is pretty cool. One drawback is that it cannot rename fields. It can only drop the old and create a new one. Though I don't know any reliable solution for automatic renaming without some hints. What I will implement in my project is to create automatic migration scripts, and if needed I can write renames in it by hand (replace the drops and creates).

wallneradam commented 4 years ago

IMO what is needed is a DDL comparison for every database engine. I have one for MySQL, which could be a base for the others.

You already have DDL generation: tortoise.utils.get_schema_sql(). I use that and it is very usable for the migration task. You have started to make the generated schema more like the one the database engine creates. It may need more standardizations for other engines, but it is already usable for MySQL and I don't think it is too complicated for other engines if it needed at all.

I think you can include a simple solution like mine into Tortoise, but IMO any more complex thing should be application specific and should not bloat Tortoise with that. E.g. tortoise should not store any metadata into database or into the disk. It is application specific task.

Benoss commented 4 years ago

I think as a first step having something for a table like this one

class Tournament(Model):
    id = fields.IntField(pk=True)
    name = fields.TextField()
    a_new_field = fields.IntField()
    another_new_field = fields.IntField()
   a_field_that_changed_type = fields.IntField()

    def __str__(self):
        return self.name

add_statement = tortoise.utils.columns_add([Tournament.a_new_field, Tournament.another_new_field])
print(add_statement.get_sql(format='postgres'))
"ALTER TABLE tournament ADD a_new_field integer, ADD a_new_field integer;"
add_statement.execute(client)  # Apply this statement to the client using the correct format
alter_statement = tortoise.utils.columns_alter([Tournament.a_field_that_changed_type])
delete_statement = tortoise.utils.columns_delete([Tournament.a_field_not_here_anymore])
delete_statement = tortoise.utils.columns_rename(...)  # Not sure what to do with this one
delete_table_statement = tortoise.utils.table_delete('Tournament')  # The model will probably not exist anymore so using string
rename_table_statement = tortoise.utils.table_rename('OldTournamentTable', Tournament)  # The model object has the new name

This for me would be enough. I am fine to manually tell what changed and know and manually execute a script (Python or SQL). Having a Diff tool can help for testing to make sure that the schema after migration matches the new schema. What I don't want to do is write the ALTER TABLE manually, to many issues with that and that's where an ORM is useful (Makes it easier to be cross DB as well).

Second step when this is working would be for the diff tool to generate this script automatically. The third step would be to keep somewhere the state of what was executed and generate + apply automatically The fourth step is to manage Forward and Backward migrations and associated data migrations

I really think that by splitting this into different modules / commands we can provide incremental values instead of trying to find the perfect solution straight away.

For reference: another ORM I am using that support migration: https://github.com/Infinidat/infi.clickhouse_orm/blob/develop/docs/schema_migrations.md#schema-migrations

long2ice commented 4 years ago

I write a manage.py script to make migrations for tortoise-orm,depend on dbmate and schemalex.

Requirements

dbmate, just install with brew install dbmate.
schemalex, just install with brew install schemalex.

Usage

just support makemigrations,migrate,rollback,initdb now

python manage.py -h


usage: manage.py [-h] {makemigrations,migrate,rollback,initdb} ...

optional arguments: -h, --help show this help message and exit

subcommands: {makemigrations,migrate,rollback,initdb}



### Link
[https://gist.github.com/long2ice/0c9153d70fa2a08ec177fb2643820f3b](https://gist.github.com/long2ice/0c9153d70fa2a08ec177fb2643820f3b)

Hope to useful.

grigi commented 4 years ago

Hi @long2ice This is an interesting approach (and it looks to work), For now I'm adding it to the issue description as a possible method to do migrations :smile:

imaurer commented 4 years ago

For anyone looking for a solution that support postgres until this is implemented, I recommend looking at migra, which is a Python library:

https://djrobstep.com/docs/migra https://github.com/djrobstep/migra

The negative is you need to have 2 running instances of your database (current, desired) for it to do it's job of generating the migration SQL (i.e. ALTER TABLE, CREATE INDEX).

Digging into the migra Python code a little, there is a secondary project that gets the schema using the live instance:

https://github.com/djrobstep/schemainspect

And then migra calculates the differences.

Since tortoise knows it's model (e.g. Tortoise.describe_models), maybe the difference logic in migra could be used as a starting point. I was thinking that a command line tool that stashed JSON serialized versions of the models could be leveraged.

long2ice commented 4 years ago

Hey guys, I'm excited to announce that now we have a migrate tool written by pure python and just for tortoise-orm.🎉

Which named aerich.😜

~~Now it support MySQL, but is easy to extend postgres and sqlite.💪~~ Now it support MySQL & sqlite & Postgres, but the latter two need more tests.

That is a young project and must be lacking and exists bugs, PR and issues and star are welcome!

repo: https://github.com/long2ice/aerich

abondar commented 4 years ago

Hi @long2ice

Thank you for your work, finally people will be able to have to integrated solution :)

Although, lately I have been working on migrations too, but considered different approach. I am not sure what I should do with it now, in light of your work, so for now I published what is done in this PR https://github.com/tortoise/tortoise-orm/pull/406

I think I need some time to consider if I should continue my work. Right now, as I see, main difference about our approaches is that your's is more SQL-centered, while mine tends to lean to working with python code and more db-agnostic. But mine turns out to be considerably more complex, so not sure if it worth it.

grigi commented 4 years ago

@long2ice @abondar Thanks both of you 😄

I'm thinking about your question Andrey, and remember how Django was earlier on. You had syncdb, which worked until it failed spectacularly (usually in production). South worked better, but it kept on fighting Django because it didn't have abilities to generate transient models. (And fixture/migration conflicts) Finally the current full system, that let you do data migrations came, and it's great.

I still think a ground up solution would be best, but intermediate solutions will be great too.

mojimi commented 4 years ago

Anyone knows an equivalent of SHOW CREATE TABLE for PostgreSQL?

I really enjoyed @wallneradam simple solution, but unfortunately that commands does not exists for PG

KharchenkoDmitriy commented 2 years ago

Is there any data migration tool inside of aerich? If I understand right it allow only SQL migrations generally for updating schema but it is possible to migrate data with raw SQL. If there any plans to support django like data migrations via scripts(Python). I have not found any plans(issues in GitHub) related to data migrations as well as related to scripts support

tortoise / tortoise-orm

Migrations #8

Forward migrations

Data migrations

Backward migrations

The schema sync script:

The inside/outside create/alter statements.

Requirements

Usage