MattFisher commented 2 years ago

Subject: Get inspectdb working

Feature or Bugfix

Bugfix

Purpose

Implement database inspection methods necessary for inspectdb management command to work.

Redshift is based on PostgreSQL 8.x, but Django supports PostgreSQL >= 9.4, so some of the more modern features of PostgreSQL such as unnest and array_agg are not available in Redshift. Django uses some of these in the postgres backend introspection methods, so different implementations were needed to retrieve the same information from a Redshift database.

Allow use with django-sql-explorer, which uses get_table_description to provide its schema view.

This has been broken since Django 3.2 introduced support for collations in the postgres backend. Relevant commit

Detail

Implementation of the changes needed for get_table_description was straightforward, and involved removing the join to the pg_collation table from the SQL.

The changes required to get_constraints, used by inspectdb, were more complex. The postgres version uses unnest, but there is no good substitute for this function in Redshift. Rather than attempting to build an overly complex query to achieve the same output as the postgres backend, a second query is made to the pg_attribute table and the join made in python. This is done twice within the get_constraints method, once for constraints and again for indexes.

These changes are difficult to write tests for, as the current test suite does not assume a connection to a Redshift cluster, and using a database cursor in a test requires a working database (pytest will enforce use of a db fixture). The tests as written mock the cursor, but this has the disadvantage of having to hard code expected SQL and return values, which leaves the tests very brittle.

I have not attempted to implement the orders, type, definition, or options fields within get_constraints. I don't know where these are used or if they're necessary.

Relates

13

shimizukawa commented 2 years ago

Thank you for your PR! I'll take a look in a week (or around new year holiday)

codecov[bot] commented 2 years ago

Codecov Report

Merging #90 (a6e1015) into master (4fd73cd) will increase coverage by 2.08%. The diff coverage is 92.30%.

:exclamation: Current head a6e1015 differs from pull request most recent head a85bd90. Consider uploading reports for the commit a85bd90 to get more accurate results

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   47.01%   49.09%   +2.08%     
==========================================
  Files           3        3              
  Lines         251      277      +26     
  Branches       62       79      +17     
==========================================
+ Hits          118      136      +18     
- Misses        123      130       +7     
- Partials       10       11       +1

Impacted Files	Coverage Δ
django_redshift_backend/base.py	`48.27% <92.30%> (+4.87%)`	:arrow_up:
django_redshift_backend/__init__.py	`50.00% <0.00%> (-50.00%)`	:arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4fd73cd...a85bd90. Read the comment docs.

MattFisher commented 2 years ago

It might also be worth adding supports_collation_on_charfield = False and supports_collation_on_textfield = False to the DatabaseFeatures class, but I haven't looked into this enough to discover what it affects or how to test it.

From https://docs.djangoproject.com/en/4.0/releases/3.2/#database-backend-api

Third-party database backends must implement support for column database collations on CharFields and TextFields or set DatabaseFeatures.supports_collation_on_charfield and DatabaseFeatures.supports_collation_on_textfield to False. If non-deterministic collations are not supported, set supports_non_deterministic_collations to False.

shimizukawa commented 2 years ago

These changes are difficult to write tests for, as the current test suite does not assume a connection to a Redshift cluster, and using a database cursor in a test requires a working database (pytest will enforce use of a db fixture). The tests as written mock the cursor, but this has the disadvantage of having to hard code expected SQL and return values, which leaves the tests very brittle.

Yes, it is currently difficult to provide a connection to Redshift for automated testing. As an alternative, I have set up postgres on GitHub Actions so that we can do a simple test on #93. I don't think it works for this case yet, but I hope to make it work with postgres in the future.

Anyway, thanks to this PR, we can now use inspectdb with django-redshift-backend, and I've prepared a project showcase in the examples directory, so I'll provide django-sql-explorer case later.

Thanks for your awesome contribution!

shimizukawa commented 2 years ago

As an alternative, I have set up postgres on GitHub Actions so that we can do a simple test on #93. I don't think it works for this case yet, but I hope to make it work with postgres in the future.

done by #95 ;)

shimizukawa commented 2 years ago

It has been released https://pypi.org/project/django-redshift-backend/3.0.0/

jazzband / django-redshift-backend

Database inspection capability - fixes #13 #90

Feature or Bugfix

Purpose

Detail

Relates

13

Codecov Report