jazzband / django-redshift-backend

Redshift database backend for Django
Apache License 2.0
82 stars 47 forks source link

Add support for distkey. #42

Closed benjyw closed 6 years ago

benjyw commented 6 years ago

Subject: Add support for setting a DISTKEY.

Feature or Bugfix

Purpose

Setting a DISTKEY is a basic requirement in real-world Redshift clusters. This change adds support for designating a column as the DISTKEY. There are some complications and caveats, explained in the README.

Detail

This is implemented by setting a custom index type in the model's meta class:

from django_redshift_backend.base import DistKey
...
class Fact(Model):
  class Meta:
    indexes = [DistKey(fields=['dim1'])]

  dim1 = ForeignKey(Dimension1Model)

But, as explained in the README, this will likely require some hand-editing of the migration files, so that all necessary information is available at table creation time.

This change also fixes a minor issue with the SORTKEY support, where it was assumed that the db column name was the same as the field name. But this is not the case, e.g., for ForeignKey fields.

This change also replaces references to the defunct remote_field.to with the remote_field.related_model.

Relates

https://github.com/shimizukawa/django-redshift-backend/issues/16 https://github.com/shimizukawa/django-redshift-backend/pull/20

shimizukawa commented 6 years ago

@benjyw Thanks! Please correct flake8 error.

benjyw commented 6 years ago

I'm still chasing down an issue with using multiple databases in Django. It looks like it ignores allow_migrate() in the custom router in various circumstances. Best to hold off on this PR until I fix that, in case the fix ends up requiring changes here (I doubt it, but you never know).

Have you had success with this?

benjyw commented 6 years ago

Addressed comments and fixed style errors.

benjyw commented 6 years ago

Are you using django_redshift_backend in a scenario where you have multiple databases in your Django app, e.g., Redshift and a regular Postgres transactional db? I cannot seem to get this to work, as Django insists on applying all migrations on both dbs, even though I have created a router and overridden allow_migrate() (and verified that it is running and returning the right value). From eyeballing the Django source it looks like the router simply isn't consulted in the codepaths I'm hitting. Have you noticed anything like this?

benjyw commented 6 years ago

Hi - any update on this? I think I've addressed your concerns?

benjyw commented 6 years ago

Just to clarify, do you want me to move django_redshift_backend.distkey.Distkey to django_redshift_backend.Distkey, or just update the documentation, which currently points to the wrong place?

benjyw commented 6 years ago

Fixed README and rebased after #43 .

Let me know if you want me to move django_redshift_backend.distkey.DistKey to django_redshift_backend.DistKey. I wasn't sure if that's what you were requesting. I generally like to keep __init__.py empty, but if you prefer it there, happy to move it.

shimizukawa commented 6 years ago

Oh, I missed that the DistKey class moved from base.py to distkey.py. The document was also fixed, so this implementation is OK. I will merge this.

shimizukawa commented 6 years ago

Merged. Thanks!

benjyw commented 6 years ago

Thank you! What's your release strategy? When might I expect this to be released?

shimizukawa commented 6 years ago

Actually we have no release strategy, but I'll make a release within two days. #44

shimizukawa commented 6 years ago

0.9.0 has been released! #44