r4fek / django-cassandra-engine

Django Cassandra Engine - the Cassandra backend for Django
BSD 2-Clause "Simplified" License
365 stars, 85 forks

Add support for migrations #16

Open: AMeng opened this issue 9 years ago

AMeng commented 9 years ago

Django 1.7's migrations should be supported.

Is this something that is being worked on or has been attempted? I am interested in helping out with this if you think it would be a reasonable addition. If Django's built-in migrations are too difficult to tie into, I'm not opposed to writing something custom for this. My team at work uses a custom solution that I think I could adapt to work here.

Any thoughts or suggestions around this?

r4fek commented 9 years ago

Yes, this is already a planned task and a very desirable addition. I think it would be better to tie this into Django's default migrations than to write something custom from scratch. It would be really great if you could help me with this one!

(Reply sent by email on Sat, 27 Dec 2014, 02:40, in reply to Alex Meng's comment above.)

AMeng commented 9 years ago

I spent some time digging into this, and it is not as simple as I'd hoped. Given the current tools, I think cqlengine's default behavior of just syncing the keyspaces is the best approach.

The main issue with Django's migrations is that they are tied very heavily to Django's models. Getting around this requires a lot of subclassing. And because the migration code wasn't written with extensibility in mind, 100+ line functions often have to be copied and pasted just to change one line. For example:

The default MigrationExecutor needs to be altered. But to use a customized one, you'd have to copy and paste the entire Command.handle method of the migrate command just to change the line that instantiates MigrationExecutor.

Obviously, copying and pasting 100+ lines to change one line is really bad and a nightmare to maintain. You'll find similar problems with many of the other migration classes (MigrationAutodetector, MigrationLoader, MigrationRecorder, etc.).

As things stand, almost every file in Django's migration code would need to be altered to get this working.

In my opinion, the correct solution is to subclass Django's models. This might fall under the responsibility of cqlengine itself, or maybe a cqlengine-django fork needs to be created. The same goes for cqlengine's Column class: it really should subclass Django's Field class.

r4fek commented 9 years ago

I was afraid you would say this. I agree that this is the "correct" solution, but in my opinion it requires way too much work. Subclassing Django's models isn't easy, as cqlengine is far from being compatible with Django.

What about writing a custom solution for this task instead? I guess it is a much simpler task, and still not so far from the "correct" solution. What do you think?

AMeng commented 9 years ago

I'm not sure a simple custom migration solution would be helpful here. cqlengine's sync_table function seems to handle this rather well already, and your use of it in the migrate command makes a lot of sense. The custom solution I am currently using doesn't add anything to that: it's just a list of migrations (CQL statements) that are executed in order. I actually think sync_table is the better solution.
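For illustration, a minimal sketch of the "list of CQL statements executed in order" approach mentioned above. The `MIGRATIONS` list, the table it touches, and the `run_migrations` helper are hypothetical, not the commenter's actual code; `execute` is any callable that takes a CQL string.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical ordered migrations: (name, CQL statement) pairs.
MIGRATIONS: List[Tuple[str, str]] = [
    ("0001_create_users",
     "CREATE TABLE IF NOT EXISTS users (id uuid PRIMARY KEY, name text)"),
    ("0002_add_email",
     "ALTER TABLE users ADD email text"),
]


def run_migrations(migrations: Iterable[Tuple[str, str]],
                   execute: Callable[[str], object]) -> List[str]:
    """Execute every statement in list order and return the names run.

    There is no history table: every call replays the whole list, and the
    first failing statement raises, stopping the run.
    """
    applied: List[str] = []
    for name, cql in migrations:
        execute(cql)
        applied.append(name)
    return applied
```

With a cassandra-driver `Session`, `execute` could be `session.execute`; in a test, `list.append` works as a recorder. Since nothing records which statements have already run, re-running the list is only safe if every statement tolerates being repeated.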

For reference, if someone wants to tackle this in the future: an alternative to subclassing Django's models seems to be giving the cqlengine models a deconstruct method, which allows Django to detect changes in the class. Unfortunately, this is not enough. Django's internals still assume that the migration history is stored in an SQL database and execute queries under that assumption. You would need to manually edit that portion of Django's code in several places.
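As a rough illustration of the deconstruct idea, the sketch below uses a hypothetical stand-in `Column` class (not cqlengine's) whose `deconstruct()` returns an (import path, args, kwargs) tuple, the same shape Django's deconstructible objects use. Comparing those tuples between two versions of a model is enough to spot added, removed, and altered columns. The import path `myapp.columns.Column` and the `detect_changes` helper are made up for the example, and the sketch does nothing about the SQL-backed migration history.

```python
from typing import Any, Dict, List, Tuple

Deconstructed = Tuple[str, Tuple[Any, ...], Dict[str, Any]]


class Column:
    """Hypothetical stand-in for a cqlengine column, NOT the real class."""

    def __init__(self, cql_type: str, primary_key: bool = False,
                 required: bool = False) -> None:
        self.cql_type = cql_type
        self.primary_key = primary_key
        self.required = required

    def deconstruct(self) -> Deconstructed:
        # Same shape as Django's deconstructible objects:
        # (import path, positional args, keyword args).
        # Only non-default options are included, as Django does.
        kwargs: Dict[str, Any] = {}
        if self.primary_key:
            kwargs["primary_key"] = True
        if self.required:
            kwargs["required"] = True
        return ("myapp.columns.Column", (self.cql_type,), kwargs)


def detect_changes(old: Dict[str, Column],
                   new: Dict[str, Column]) -> Dict[str, List[str]]:
    """Compare two model states column by column via deconstruct()."""
    added = sorted(n for n in new if n not in old)
    removed = sorted(n for n in old if n not in new)
    altered = sorted(n for n in new
                     if n in old and old[n].deconstruct() != new[n].deconstruct())
    return {"added": added, "removed": removed, "altered": altered}
```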

I don't think I'll be doing any more work on this, sorry. :disappointed:

But thanks for the awesome library, and for being so responsive.

r4fek commented 9 years ago

Too bad sync_table can only add a column when it is missing. But this will have to be enough for now. Thanks for your contribution!

lsmithso commented 9 years ago

Hi:

We recently investigated this and came up with a sort of solution. It's not as automated or flexible as Django's migrations or South, but it worked for us and is better than the alternative.

Our first conclusion was that the existing migration tools are unusable with Cassandra: too much SQL, and too relational.

cqlengine/Cassandra generally can't alter the types of existing columns, nor can it add, alter, or drop primary key columns. The only way to sync such changes is to drop the model's table and re-sync it, which of course loses data.
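A minimal sketch of the distinction drawn above, assuming a schema is represented as a plain dict mapping column name to (CQL type, is-key flag). The `plan_changes` helper and its "sync"/"manual"/"rebuild" buckets are hypothetical, not part of cqlengine or this project; they only sort differences by the rules described in this thread.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema representation: column name -> (CQL type, is_key).
Schema = Dict[str, Tuple[str, bool]]


def plan_changes(current: Schema, desired: Schema) -> Dict[str, List[str]]:
    """Sort column differences by how they could be applied.

    "sync":    new non-key columns, which sync_table can add.
    "manual":  removed non-key columns; sync_table only adds columns, so
               these would need hand-written CQL.
    "rebuild": type changes and any key-column change, which require
               dropping and re-syncing the table (losing its data unless
               it is backed up first).
    """
    plan: Dict[str, List[str]] = {"sync": [], "manual": [], "rebuild": []}
    for name in sorted(set(current) | set(desired)):
        old: Optional[Tuple[str, bool]] = current.get(name)
        new: Optional[Tuple[str, bool]] = desired.get(name)
        if old == new:
            continue
        is_key = (old is not None and old[1]) or (new is not None and new[1])
        if is_key or (old is not None and new is not None):
            # A key column was touched, or an existing column's type changed.
            plan["rebuild"].append(name)
        elif old is None:
            plan["sync"].append(name)
        else:
            plan["manual"].append(name)
    return plan
```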

Our solution was a framework that backs up existing data to a temporary table, migrates it, and restores it after the model has been dropped and re-synced to the keyspace. The key to this is a db_model that is reverse-engineered from the current Cassandra column family, which allows the existing data to be queried and backed up.
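The backup, re-sync, and restore cycle described above might be sketched as follows. Every callable is a hypothetical hook rather than part of the framework described here: `read_rows` would query the reverse-engineered `db_model`, `write_backup` and `read_backup` would target the temporary table, `drop_and_resync` could call cqlengine's `drop_table` and then `sync_table` on the new model, and `transform` is the hand-written per-row migration.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, object]


def migrate_with_backup(read_rows: Callable[[], Iterable[Row]],
                        write_backup: Callable[[Row], None],
                        read_backup: Callable[[], Iterable[Row]],
                        drop_and_resync: Callable[[], None],
                        transform: Callable[[Row], Row],
                        insert_row: Callable[[Row], None]) -> int:
    """Back up, rebuild, and restore one table; return the rows backed up."""
    # 1. Copy every existing row (read through the reverse-engineered
    #    model) into the temporary backup table.
    count = 0
    for row in read_rows():
        write_backup(row)
        count += 1
    # 2. Drop the table and re-create it from the new model definition.
    drop_and_resync()
    # 3. Read the backup, convert each row to the new schema, write it back.
    for row in read_backup():
        insert_row(transform(row))
    return count
```

Writing the backup to a table rather than keeping it only in memory means the data survives if the process dies between the drop and the restore.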

Migration scripts are written by developers and applied manually. The framework doesn't generate migrations, doesn't track the state of the database, and keeps no history of previous migrations. Migrations cannot be undone.

It's a bit of a hack, but I think the real solution lies within django-nonrel.