sqlalchemy-bot / test_alembic_1


Rails migrations style schema:load and schema:dump needed #492

Closed sqlalchemy-bot closed 6 years ago

sqlalchemy-bot commented 6 years ago

Migrated issue, originally created by Sandeep Srinivasa (sandys)

reference: http://edgeguides.rubyonrails.org/active_record_migrations.html#schema-dumping-and-you and https://github.com/miguelgrinberg/Flask-Migrate/issues/31

In a lot of cases, it is more useful to load the schema from git than to recreate it by running the full migration history. If you are running continuous integration, this is a godsend, since loading a schema is much faster than running migrations.

Rails makes this absolutely convenient. After every migration, the schema dump file "schema.rb" is automatically generated/updated. It represents the current snapshot of the database, independent of how your models happen to be defined. schema.rb is usually committed to git.

If you want to recreate the db, all you do is run "rake db:schema:load" and it recreates the db from the snapshot. This is super fast.

Alembic, as it stands today, makes it very hard to do this. As mentioned in the comment here - https://github.com/miguelgrinberg/Flask-Migrate/issues/31#issuecomment-44425040 - you have to go through some hacks to make Alembic generate a snapshot.

As mentioned in https://github.com/miguelgrinberg/Flask-Migrate/issues/31#issuecomment-324697866 ,

> This seems like something several people on the internet are searching for: an Alembic migration script that fully configures a database, from scratch

It would be great if this were made the default in Alembic: an autogenerated schema.py file produced after every migration, plus a way to load it directly into an empty database. This would not break any existing semantics and would be super helpful for those who use it.

sqlalchemy-bot commented 6 years ago

Michael Bayer (zzzeek) wrote:

Alembic is a very open-ended tool and can be made to do these things. However, this proposal is disturbing to me, since my impression at the moment is that it implies Alembic needs entirely new and different workflows that co-exist side-by-side with its existing workflow, and I think it would be deeply confusing to developers for Alembic to prominently and "by default" offer two entirely different means of being used. Alembic's tutorial attempts to cover real-world application use from start to finish. So that's the first concern you'd need to address here: what's wrong with Alembic's current workflow, to what extent are you looking to augment or replace that workflow, how would it be documented, and how would I know which workflow I want to use? (Also, please be much more specific as to the format of this "dumpfile"; there are many technical issues in that area that have been previously considered.)

A feature of both Rails and Django is that their ORM tools (noting, they have no equivalent of SQLAlchemy Core) are extremely opinionated and rigid about the kinds of schemas they can generate: the supported schema structures are tightly integrated with hardcoded ORM patterns, and there is very limited support for database-specific constructs and datatypes, as these don't fit into the "one size fits all" mentality of those tools. Because of this, they can make files like "schema.rb" (and whatever South does) and just automatically whip out the narrow range of schemas that they support.

Alembic, building on SQLAlchemy, can't take such a free-wheeling approach, since Alembic supports any schema concept on any database. Hence, while Alembic has very strong support for schema generation and comparison, there is always the need to manually review the Python code that it generates. There is an eternal wave of desire that folks wouldn't have to do this, that it could just "be magic" and get the schema 100% correct the first time, but Alembic can never promise this. Therefore a "schema.py" file is problematic: for very large databases it would be huge (which makes me very skeptical of the request that this be "by default") and would require manual review and editing every time it's generated. Projects can of course carefully tailor Alembic's autogeneration to handle elements that might not render correctly by default; however, when spitting out an entire schema all at once, there's a lot that can go wrong.

I would reiterate my request to please begin with the question of workflow, how Alembic's workflow is not good enough, how this alternate workflow would be presented to developers in a non-confusing way. thanks!

sqlalchemy-bot commented 6 years ago

Michael Bayer (zzzeek) wrote:

Generation of migration files from an existing schema is covered here: http://alembic.zzzcomputing.com/en/latest/cookbook.html#print-python-code-to-generate-particular-database-tables. Using metadata.reflect() will pull in the entire database.

sqlalchemy-bot commented 6 years ago

Donald Stufft (dstufft) wrote:

The only way I could imagine Alembic doing this (given the ability to execute arbitrary Python / SQL in a migration) is to introduce a new snapshot command that would snapshot the current database, effectively doing:

pg_dump --schema-only --format plain > dump.sql && pg_dump --data-only --format plain --table alembic_version >> dump.sql

To get all of the structure plus the current alembic_version into a single SQL file, which could then be used to start the migrations (and then re-apply any additional migrations on top of that which haven't been accounted for in a snapshot yet). The problem I think this would have (which I believe is what @zzzeek is saying above) is that you would (a) need an already-migrated database to pull this out of (to account for the arbitrariness of migrations), and (b) either need to make this very specific to the database you're running on, or have to review it each time, since the SQL -> generic construct mapping in Alembic isn't perfect.
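The pipeline above could be assembled by a hypothetical snapshot command; as a sketch, here is a small helper that only builds the shell command string (the function name is made up, and nothing here actually shells out to pg_dump):

```python
import shlex

def snapshot_command(dbname: str, outfile: str = "dump.sql") -> str:
    """Build the two-step pg_dump pipeline sketched above: dump the
    schema, then append the alembic_version row so the snapshot
    records which migration it corresponds to. Illustrative only."""
    schema = ["pg_dump", "--schema-only", "--format", "plain", dbname]
    version = ["pg_dump", "--data-only", "--format", "plain",
               "--table", "alembic_version", dbname]
    return "{} > {} && {} >> {}".format(
        shlex.join(schema), outfile, shlex.join(version), outfile)

cmd = snapshot_command("appdb")
print(cmd)
```

A real implementation would run this via subprocess (or the database's client library) and, as noted, would be specific to PostgreSQL; other backends would need their own dump tools.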

sqlalchemy-bot commented 6 years ago

Michael Bayer (zzzeek) wrote:

Also note the migra tool, which I saw the other day; it might be getting at some of the "dump the whole schema" use model.

Let me clarify: I do think Alembic needs more guidance / recipes in the area of "dump the whole schema". It's something the tool can do, and the recipe mentioned earlier does it, but it would not be wise to just bolt it on as if we were Rails all of a sudden. If it's going to be more "official" then it has to integrate seamlessly with the existing workflow (or, if we're going to change the workflow, how do we do that without breaking the world, and what would this workflow be?).

sqlalchemy-bot commented 6 years ago

Donald Stufft (dstufft) wrote:

@zzzeek What about offering a command that dumps the schema plus the alembic version of that schema from a running database into a file? Say migrations/schema.{py,sql}, giving the end user the ability to select either a Python-style schema or a pure SQL schema. If they select a Python-style schema, it'll be database-independent but may not be perfect (basically the same caveats as autogenerated migration files); if they select SQL, it's just a format produced by the database itself and should be "lossless" (or as lossless as the database itself is capable of producing). You'd then implement upgrade head as first applying that snapshot (if there isn't already an existing database), and then running through the existing database migrations, which would pick up whatever migrations weren't included in that snapshot.

It wouldn't be as automatic as taking a snapshot after every migration, but it would offer the ability to speed up long migration histories by taking a point-in-time snapshot to act as your initial seed and then applying additional migrations on top of that. Letting the user pick between a Python file and a SQL file lets them choose the trade-off between database independence (but needing review) and database dependence (but more likely to be consistent). Making it a separate command to create a snapshot, but not to use one, also means that behavior for existing users is 100% unchanged, while using the snapshot is completely automatic.
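The upgrade behaviour proposed above can be sketched as a small planning function (all names and the revision-list representation here are hypothetical, purely to illustrate the decision logic):

```python
def plan_upgrade(history, snapshot_rev, current_rev=None):
    """Sketch of the proposed 'upgrade head' behaviour.

    history      -- ordered list of revision ids, oldest first
    snapshot_rev -- the revision the stored snapshot corresponds to
    current_rev  -- the database's current revision, or None if empty
    """
    if current_rev is None:
        # Empty database: load the snapshot first, then replay only
        # the revisions recorded after it in the migration history.
        start = history.index(snapshot_rev) + 1
        return [("load_snapshot", snapshot_rev)] + [
            ("migrate", rev) for rev in history[start:]]
    # Existing database: behaviour is 100% unchanged -- just run
    # whatever comes after the current revision.
    return [("migrate", rev)
            for rev in history[history.index(current_rev) + 1:]]

history = ["a1", "b2", "c3", "d4"]  # hypothetical revision ids
plan = plan_upgrade(history, snapshot_rev="b2")
print(plan)  # [('load_snapshot', 'b2'), ('migrate', 'c3'), ('migrate', 'd4')]
```

The key property is the last branch: a database that already has a revision recorded never touches the snapshot, so existing users see no change.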

sqlalchemy-bot commented 6 years ago

Michael Bayer (zzzeek) wrote:

@dstufft if we generate "raw" from the existing migrations, we have that now: just "alembic upgrade base:head --sql". That's basically the whole set of migrations in a line, which is not really a "schema dump" file; it would be lots of ALTER TABLE statements. We can't generate clean "CREATE TABLE" statements like a schema dump without either running that from a generated metadata dump, which itself is lossy, or running the tools that are part of the database itself, as if we exec'ed pg_dump or migra or something, which we don't have in a platform-agnostic way.

sqlalchemy-bot commented 6 years ago

Donald Stufft (dstufft) wrote:

@zzzeek Yea, it looks like the SQL option for rails just literally runs pg_dump and such.

sqlalchemy-bot commented 6 years ago

Michael Bayer (zzzeek) wrote:

I don't have any plans to work on this unless there were a team of folks helping to lay out a complete workflow / story of how this would be presented, implemented, etc. it is likely something that can be built externally to Alembic using its API and I would of course support any additional API hooks such an implementation might need.

sqlalchemy-bot commented 6 years ago

Changes by Michael Bayer (zzzeek):