procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License
865 stars 53 forks source link

Best way to integrate with django migration system #280

Closed agateblue closed 4 years ago

agateblue commented 4 years ago

This isn't a bug or feature request but rather a discussion to understand the best way to integrate procrastinate within a Django project, in particular, at the SQL level, as Django as its own migration system.

My first attempt was to use raw SQL and execute each sql/migrations/delta* delta file, like this:

# Generated by Django 3.0.8 on 2020-07-18 08:39
import os
from django.db import connection
from django.db import migrations
from django.db import transaction

def apply_sql(apps, schema_editor):
    import procrastinate
    sql_path = os.path.join(
        os.path.dirname(procrastinate.__file__),
        'sql',
        'migrations',
    )

    migrations = [
        "baseline-0.5.0.sql",
        "delta_0.5.0_001_drop_started_at_column.sql",
        "delta_0.5.0_002_drop_started_at_column.sql",
        "delta_0.5.0_003_drop_procrastinate_version_table.sql",
        "delta_0.6.0_001_fix_procrastinate_fetch_job.sql",
        "delta_0.7.1_001_fix_trigger_status_events_insert.sql",
        "delta_0.8.1_001_add_queueing_lock_column.sql",
        "delta_0.10.0_001_close_fetch_job_race_condition.sql",
        "delta_0.10.0_002_add_defer_job_function.sql",
        "delta_0.11.0_003_add_procrastinate_periodic_defers.sql",
        "delta_0.12.0_001_add_foreign_key_index.sql",
    ]

    with connection.cursor() as cursor:
        with transaction.atomic():
            for name in migrations:
                full_path = os.path.join(sql_path, name)
                with open(full_path, 'rb') as f:
                    print('Applying {}'.format(full_path))
                    sql = f.read().decode()
                    cursor.execute(sql)

def rewind(*args, **kwargs):
    pass

class Migration(migrations.Migration):

    dependencies = [
        ('common', '0008_auto_20200701_1317'),
    ]

    operations = [
        migrations.RunPython(apply_sql, rewind)
    ]

However, this crashes, with the following output:

python manage.py migrate common 0009

Operations to perform:
  Target specific migration: 0009_procrastinate, from common
Running migrations:
  Applying common.0009_procrastinate...Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/baseline-0.5.0.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.5.0_001_drop_started_at_column.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.5.0_002_drop_started_at_column.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.5.0_003_drop_procrastinate_version_table.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.6.0_001_fix_procrastinate_fetch_job.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.7.1_001_fix_trigger_status_events_insert.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.8.1_001_add_queueing_lock_column.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.10.0_001_close_fetch_job_race_condition.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.10.0_002_add_defer_job_function.sql
Applying /venv/lib/python3.7/site-packages/procrastinate/sql/migrations/delta_0.11.0_003_add_procrastinate_periodic_defers.sql
Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql)
psycopg2.errors.SyntaxError: syntax error at or near ";"
LINE 9: DROP FUNCTION IF EXISTS procrastinate_defer_job;
                                                       ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "manage.py", line 27, in <module>
    execute_from_command_line(sys.argv)
  File "/venv/lib/python3.7/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/venv/lib/python3.7/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/venv/lib/python3.7/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/venv/lib/python3.7/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/venv/lib/python3.7/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/venv/lib/python3.7/site-packages/django/core/management/commands/migrate.py", line 233, in handle
    fake_initial=fake_initial,
  File "/venv/lib/python3.7/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/venv/lib/python3.7/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/venv/lib/python3.7/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/venv/lib/python3.7/site-packages/django/db/migrations/migration.py", line 124, in apply
    operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
  File "/venv/lib/python3.7/site-packages/django/db/migrations/operations/special.py", line 190, in database_forwards
    self.code(from_state.apps, schema_editor)
  File "/app/funkwhale_api/common/migrations/0009_procrastinate.py", line 37, in apply_sql
    cursor.execute(sql)
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/venv/lib/python3.7/site-packages/cacheops/transaction.py", line 93, in execute
    result = self._no_monkey.execute(self, sql, params)
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/venv/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql)
django.db.utils.ProgrammingError: syntax error at or near ";"
LINE 9: DROP FUNCTION IF EXISTS procrastinate_defer_job;

Execution of delta_0.11.0_003_add_procrastinate_periodic_defers.sql fails with a syntax error, which I don't understand unfortunately. If you have some ideas, I can definitely try these, and possibly document Django integration when I have a working setup :)

ewjoachim commented 4 years ago

Hm, you're right, if we expect people using Django to use procrastinate, we'll need to ship a way to transform PG migrations SQL files into Django migrations. Your approach seems nice, but I'd tend to try and generate one Django migration per SQL migration file (whereas if I understand your code, you have all SQL migrations in a single Django migration). But that's a detail, and doesn't help your problem.

Can I ask why you used a RunPython and not a RunSQL ?

I can try to have a look, but I don't have a clue at this point why this fails. Or you can have a look on your side. Or we can do this in pair programming if you want, like in the good ol' days :)

I'm perfectly fine adding a Django submodule for simplifying the usage of procrastinate with django, if it's deemed preferable to a django-procrastinate dedicated package.

agateblue commented 4 years ago

Hm, you're right, if we expect people using Django to use procrastinate, we'll need to ship a way to transform PG migrations SQL files into Django migrations. Your approach seems nice, but I'd tend to try and generate one Django migration per SQL migration file (whereas if I understand your code, you have all SQL migrations in a single Django migration). But that's a detail, and doesn't help your problem.

I'm not sure about the "1 procrastinate migration <-> 1 django migration" approach. As far as I can tell, it doesn't bring any additional value and make integration with the app way harder (a dozen files to add, instead of only one).

Can I ask why you used a RunPython and not a RunSQL ?

I can try that, I wanted to avoid file I/O at import time, which RunSQL would require.

IMHO, the easiest option would be as you suggested, have a contrib.django package which would be a django app, and contain the proper migrations. Integrating procrastinate in a django project would then only require INSTALLED_APPS = [..., "procrastinate.contrib.django", ...] and python manage.py migrate.

However, it would increase the maintenance burden on procrastinate side, possibly by having duplicated SQL code (though it could probably be automated).

agateblue commented 4 years ago

If I use RunSQL, I end up with the same error. This is the code I used:


# Generated by Django 3.0.8 on 2020-07-18 08:39
import os
from django.db import migrations

def get_sql(name):
    import procrastinate
    sql_path = os.path.join(
        os.path.dirname(procrastinate.__file__),
        'sql',
        'migrations',
    )

    full_path = os.path.join(sql_path, name)
    with open(full_path, 'rb') as f:
        return f.read().decode()

class Migration(migrations.Migration):

    dependencies = [
        ('common', '0008_auto_20200701_1317'),
    ]

    operations = [
        migrations.RunSQL(get_sql("baseline-0.5.0.sql")),
        migrations.RunSQL(get_sql("delta_0.5.0_001_drop_started_at_column.sql")),
        migrations.RunSQL(get_sql("delta_0.5.0_002_drop_started_at_column.sql")),
        migrations.RunSQL(get_sql("delta_0.5.0_003_drop_procrastinate_version_table.sql")),
        migrations.RunSQL(get_sql("delta_0.6.0_001_fix_procrastinate_fetch_job.sql")),
        migrations.RunSQL(get_sql("delta_0.7.1_001_fix_trigger_status_events_insert.sql")),
        migrations.RunSQL(get_sql("delta_0.8.1_001_add_queueing_lock_column.sql")),
        migrations.RunSQL(get_sql("delta_0.10.0_001_close_fetch_job_race_condition.sql")),
        migrations.RunSQL(get_sql("delta_0.10.0_002_add_defer_job_function.sql")),
        migrations.RunSQL(get_sql("delta_0.11.0_003_add_procrastinate_periodic_defers.sql")),
        migrations.RunSQL(get_sql("delta_0.12.0_001_add_foreign_key_index.sql")),
    ]

Maybe it's related to my postgresql installation, though it's a pretty basic postgres:11 docker installation. I'll try piping the SQL files directly to postres.

agateblue commented 4 years ago

So, piping the migrations directly to postgresql raise the same bug:

docker-compose -f dev.yml exec -T postgres psql -U postgres -P < ../../procrastinate/procrastinate/sql/migrations/delta_0.11.0_003_add_procrastinate_periodic_defers.sql

CREATE TABLE
ERROR:  syntax error at or near ";"
LINE 1: DROP FUNCTION IF EXISTS procrastinate_defer_job;
                                                       ^
CREATE FUNCTION
ERROR:  syntax error at or near ";"
LINE 1: DROP FUNCTION IF EXISTS procrastinate_defer_periodic_job;
                                                                ^
CREATE FUNCTION

The error is still there, but because I piped a whole file, it doesn't crash immediatly.

As far as I know, it's because the DROP FUNCTION statement requires you to include the arguments of the function, like this: DROP FUNCTION sqrt(integer);

It's probably failing in other setups but silently ignored.

ewjoachim commented 4 years ago

Hm, nice find, thanks !

ewjoachim commented 4 years ago

Hm, the doc says:

If the function name is unique in its schema, it can be referred to without an argument list:

DROP FUNCTION update_employee_salaries;

So, I still don't understand how this is a syntax error.

ewjoachim commented 4 years ago

On my machine:

$ psql
psql (12.3, server 10.10 (Debian 10.10-1.pgdg90+1))
Type "help" for help.

procrastinate=# DROP FUNCTION IF EXISTS procrastinate_defer_job;
DROP FUNCTION
procrastinate=# DROP FUNCTION IF EXISTS procrastinate_defer_job;
NOTICE:  function procrastinate_defer_job() does not exist, skipping
DROP FUNCTION
procrastinate=# DROP FUNCTION IF EXISTS procrastinate_defer_periodic_job;
DROP FUNCTION
agateblue commented 4 years ago

I thought it may be related to an inconsistent state in my setup, so I'v tried with a clean postgres 9.6 container. Launch a container:

docker run -e POSTGRES_HOST_AUTH_METHOD=trust --name test-procrastinate postgres:9.6 postgres -c log_min_duration_statement=0

Apply the migrations:

docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/baseline-0.5.0.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.5.0_001_drop_started_at_column.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.5.0_002_drop_started_at_column.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.5.0_003_drop_procrastinate_version_table.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.6.0_001_fix_procrastinate_fetch_job.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.7.1_001_fix_trigger_status_events_insert.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.8.1_001_add_queueing_lock_column.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.10.0_001_close_fetch_job_race_condition.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.10.0_002_add_defer_job_function.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.11.0_003_add_procrastinate_periodic_defers.sql
docker exec -i test-procrastinate psql -Upostgres  < ../../procrastinate/procrastinate/sql/migrations/delta_0.12.0_001_add_foreign_key_index.sql

Which gives the following output:

CREATE FUNCTION
ALTER TABLE
DROP TABLE
CREATE FUNCTION
DROP TRIGGER
CREATE TRIGGER
ALTER TABLE
CREATE INDEX
DROP TABLE
CREATE INDEX
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TABLE
ERROR:  syntax error at or near ";"
LINE 1: DROP FUNCTION IF EXISTS procrastinate_defer_job;
                                                       ^
CREATE FUNCTION
ERROR:  syntax error at or near ";"
LINE 1: DROP FUNCTION IF EXISTS procrastinate_defer_periodic_job;
                                                                ^
CREATE FUNCTION
CREATE INDEX

Now, trying with a postgres 10 container:

NOTICE:  extension "plpgsql" already exists, skipping
CREATE EXTENSION
CREATE TABLE
CREATE TYPE
CREATE TYPE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE INDEX
CREATE TRIGGER
CREATE TRIGGER
CREATE TRIGGER
CREATE TRIGGER
CREATE INDEX
CREATE FUNCTION
ALTER TABLE
DROP TABLE
CREATE FUNCTION
DROP TRIGGER
CREATE TRIGGER
ALTER TABLE
CREATE INDEX
DROP TABLE
CREATE INDEX
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TABLE
DROP FUNCTION
CREATE FUNCTION
DROP FUNCTION
NOTICE:  function procrastinate_defer_periodic_job() does not exist, skipping
CREATE FUNCTION
CREATE INDEX

So we can reasonnably say this is related to my Postgres version, and the behaviour change starting in Postgres 10.

Now, back to the initial discussion, regarding the opportunity to package procrastinate as a django app, would you accept a contribution based on https://github.com/peopledoc/procrastinate/issues/280#issuecomment-660498268?

ewjoachim commented 4 years ago

would you accept a contribution based on #280 (comment)?

Yes! Definitely! As I said:

I'm perfectly fine adding a Django submodule for simplifying the usage of procrastinate with django, if it's deemed preferable to a django-procrastinate dedicated package.

Also:

Your approach seems nice, but I'd tend to try and generate one Django migration per SQL migration file (whereas if I understand your code, you have all SQL migrations in a single Django migration).

And I stand by this if it's not too hard to do. This will make it much easier when we issue releases in the future: people will update procrastinate, then manage.py migrate and it should "just work".

ewjoachim commented 4 years ago

Oh, also, we might try to explicit supported PG versions, and test them in the CI. In theory, this would be a good thing. In practice, I'm afraid testing will make the CI much much longer, and much more complicated. I'll create a ticket. (EDIT: https://github.com/peopledoc/procrastinate/issues/282)

ewjoachim commented 4 years ago

If it's not possible to do that cleanly within Django (and reading the code, I'm doubting it is), it could also be acceptable to generate migration files in setup.py (so they'd end up in the release) and have them be gitignored.

agateblue commented 4 years ago

I'd probably hardcode the migrations, because generating migrations on the fly during release comes with two potential pitfalls:

  1. It's really hard to test that the migrations are actually working if files aren't commited to the repo
  2. If the migration-generating code changes slightly, we could face hard-to-debug inconsistencies

Would it be acceptable to add a manual step when a new migration is created, basically copy-pasting 10 lines of code in the django app as a new migration?

ewjoachim commented 4 years ago

I understand those points. I'll just note:

It's really hard to test that the migrations are actually working if files aren't committed to the repo

Hm, not sure about that. If migrations are generated every time we launch setup.py, then they will be OK in the CI, and then can be regenerated in the dev env by re-launching pip install -e, same as when a metadata value changes.

If the migration-generating code changes slightly, we could face hard-to-debug inconsistencies

With the current implementation, it's the same: if the SQL migration code changes, then the django migration will change. And I can't really see what we may change in the Django migration part if the only thing it does is a RunSQL. But your approach is leaning on the safe side, and I can't blame that.

That being said, I'm definitely not against your proposed changes. I'd rather we solve problems and iterate. We'll see how it goes.

agateblue commented 4 years ago

Hm, not sure about that. If migrations are generated every time we launch setup.py, then they will be OK in the CI, and then can be regenerated in the dev env by re-launching pip install -e, same as when a metadata value changes.

Oh, I didn't thought about that. I'm not sure how to trigger a command when pip install -e . is called though, but I can definitely write something to generate the migration files once we figure that out!

agateblue commented 4 years ago

@ewjoachim I've made an attempt at generating migrations on the fly (see my last commit) when running pip install -e .. I had to tweak the order, since the default order of SQL files isn't correct, and it work. However, it doesn't work with pip install . (without the -e flag), I need to find out why

ewjoachim commented 4 years ago

Continuing the discussion in the PR !