grnet / djnro

DjNRO hits the decks of eduroam database management
http://djnro.grnet.gr/
Other
10 stars 21 forks source link

Migration issue on PostgreSQL < 11 #73

Closed vladimir-mencl-eresearch closed 4 years ago

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm ,

I have successfully upgraded REANNZ DjNRO instance to 1.2, and I'm now trying to update the Docker-packaged version I created for the XeAP project - so that these deployments can also update to EDB2.

When testing the migration locally, I have edumanage/migrations/0005_edb2_schema_part1.py failing with:

psycopg2.OperationalError: cannot ALTER TABLE "edumanage_serviceloc" because it has pending trigger events

This deployment is using PostgresQL 9.5. I did not run into this with the REANNZ instance, which runs with PostgreSQL 11. I have experimented with different version, PostgreSQL 10 has the issue as well - something changed between 10 and 11 that makes the problem go away.

I have found references to this issue in the Django Migrations documentation - https://django.readthedocs.io/en/1.11.x/ref/migration-operations.html#runpython explicitly says:

On databases that do support DDL transactions (SQLite and PostgreSQL), RunPython operations do not have any transactions automatically added besides the transactions created for each migration. Thus, on PostgreSQL, for example, you should avoid combining schema changes and RunPython operations in the same migration or you may hit errors like OperationalError: cannot ALTER TABLE "mytable" because it has pending trigger events.

From these pointers, I also found I can work around this problem by making the migration non-atomic:

 class Migration(migrations.Migration):

+    atomic = False

I don't know how many DjNRO deployments are out there, and whether it would be worth supporting older versions. Also not sure how to weigh the risks associated with making the migration non-atomic.

For the XeAP deployment, I can just ship a patched version that would make this migration non-atomic.

I could also generalize this and make it conditional on "databases that support DDL transactions", where this issue might happen (because they combine schema alterations and RunPython code).

Something like

    atomic = django.db.connection.vendor not in ['postgresql', 'sqlite3']

What do you think? Should I create a PR with this - or not worth it?

Cheers, Vlad

zmousm commented 4 years ago

I found a second issue while investigating this: edumanage/migrations/0009_edb2_schema_part3_cleanup.py is not reversible...

I thought I had tested this extensively, but then again...

Hold on, I am looking into it.

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm , thanks a lot for looking into it! I look forward to hearing from you further once you know more. Cheers, Vlad

zmousm commented 4 years ago

I could not reproduce the issue you reported using postgres 9.5, 9.6 and 10 (previously tested with postgres 11). I used their official docker images.

Could you provide the exact versions of python, Django, postgres, psycopg2 you used so that I can try again?

In my case, besides postgres: Python 3.7.4 Django 1.11.28 psycopg2 2.8.4

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm , thanks for the effort put into it.

I'm on just slightly different versions, but that won't be the issue.

If I run migrations against an empty database, everything goes all fine - because there are no actual invocations of the RunPython code.

The issue only kicks in once there is data to migrate.

My minimum sequence to reproduce is:

  1. Git: switch to before the EDB2 changes:

    git checkout 6bfdf14e3c50fa03973c6d860e079a9ebb97e342

  2. Run migrations (up to just before EDB2)

    ./manage.py migrate

  3. Bootstrap DB (in postgres container):

    psql --command="create role djnro with login encrypted password 'password' ;" psql --command="create database djnro with owner djnro;"

  4. Bootstrap DjNRO - in djnro container

    ./manage.py createsuperuser --noinput --username "$ADMIN_USERNAME" --email "$ADMIN_EMAIL" ./manage.py changepassword "$ADMIN_USERNAME"

  5. Bootstrap DjNRO - realm, Institution and ServiceLoc, run inside ./manage.py shell

    from edumanage.models import Realm, Name_i18n, Institution, ServiceLoc r=Realm(country="NZ") r.save() rn=Name_i18n(content_object=r,name="REANNZ",lang="en") rn.save() i=Institution(ertype=3,realmid=r) i.save() s=ServiceLoc(institutionid=i,longitude=15,latitude=50, port_restrict=False, transp_proxy=False, IPv6=False, NAT=False, wired=False, AP_no=1,enc_level=["WPA2/AES"]) s.save()

  6. And now switch to current EDB2 code and attempt migrations:

    git checkout master ./manage.py migrate

And I believe this should be enough for you to reproduce the issue.

But for completeness, this is my DjNRO environment (based on Docker image python:3.7):

root@9204b2695e28:/djnro# python --version
Python 3.7.6
root@9204b2695e28:/djnro# pip list
Package                Version
---------------------- -----------
certifi                2019.11.28
chardet                3.0.4
contextlib2            0.6.0.post1
defusedxml             0.6.0
Django                 1.11.28
django-dont-vary-on    1.0.0
django-registration    2.0.4
django-sortedm2m       3.0.0
django-tinymce         2.8.0
django-widget-tweaks   1.4.5
idna                   2.7
ipaddress              1.0.22
lxml                   4.5.0
Mako                   1.0.3
MarkupSafe             1.1.1
oauthlib               3.1.0
pip                    20.0.2
psycopg2-binary        2.7.7
PyJWT                  1.7.1
python3-openid         3.1.0
pytz                   2019.3
PyYAML                 3.13
raven                  5.32.0
requests               2.20.0
requests-oauthlib      1.3.0
semantic-version       2.8.4
setuptools             45.2.0
six                    1.14.0
social-auth-app-django 3.1.0
social-auth-core       3.2.0
urllib3                1.24.3
uWSGI                  2.0.18
wheel                  0.34.2

Cheers, Vlad

vladimir-mencl-eresearch commented 4 years ago

PS: Django 1.11 asks for psycopg2-binary 2.7.7 (./manage.py inspectdb breaks with 2.8.4) - but I had the issue reported here with both psycopg2 versions.

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm , once more PS :-)

I've just checked and there are two more migrations that use RunPython (AppAwareRunPython) - edumanage/migrations/0007_edb2_data_part1.py and edumanage/migrations/0008_edb2_data_part2.py But, these migrations are NOT making any DDL changes, just pure data manipulation, so they don't run into this issue. So it's just 0005_edb2_schema_part1.py because it's both making DDL changes AND running RunPython code in one single migration.

I've tested and the conditional check for dropping atomicity works for me: atomic = django.db.connection.vendor not in ['postgresql', 'sqlite3']

Cheers, Vlad

zmousm commented 4 years ago

Hey @vladimir-mencl-eresearch,

I had overlooked that the problem manifested upon ALTER TABLE "edumanage_serviceloc"and the migration I was testing was not touching that table; I can see the problem now and I am investigating it.

zmousm commented 4 years ago

Hey @vladimir-mencl-eresearch, I pushed a fix, please review the linked PR

zmousm commented 4 years ago

Let me finally note that I have not found an explanation why the problem manifests on older PostgreSQL versions while it does not on newer ones. It is not affected by the psycopg2 version, for one thing, or anything else. The problem is triggered consistently if any operation attempts to perform ALTER TABLE on a table after records have been updated via RunPython.

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm , thanks again for the quick fix.

And, I got stuck on this puzzling piece too. I went in detail through the PostgreSQL 11 release notes, but could not find anything that would explain this.

Cheers, Vlad