procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License
852 stars 52 forks source link

Add job priority feature #1070

Closed medihack closed 3 months ago

medihack commented 4 months ago

Closes #386

Successful PR Checklist:

PR label(s):

Some design decisions:

Questions:

github-actions[bot] commented 4 months ago

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  procrastinate
  cli.py
  jobs.py
  tasks.py
  testing.py
  procrastinate/contrib/django
  models.py
Project Total  

This report was generated by python-coverage-comment-action

medihack commented 4 months ago

Of course, I'm always interested on feedback on the codebase (if you saw surprising or annoying things, etc)

What I saw was quite easy to understand (as far as I can tell after such a short time). Looks like a clean and systematic structure. Its really a gem of a library :-) It should be better known (I added it to Django Packages).

Is the modification of static_migrations.py correct (especially the dependencies and the last row)? I only have a vague idea what it does.

ewjoachim commented 4 months ago

Is the modification of static_migrations.py correct (especially the dependencies and the last row)? I only have a vague idea what it does.

I think it's not but it's not your fault, it's mine. Let me explain:

At first we had no Django. We had migrations, that were pure SQL Then we got a django integration that dynamically generated migrations based on the SQL Then I recently added a better Django integration with models and even if it's not Django's job to create the tables linked to the models, Django still need migrations describing the models, which is the "static migration" (the only migration that is not dynamic) Ideally, the static migration should have came before the dynamic ones, but I don't want to change old migrations because people are going to have problems if I do that (I should do some kind of squashing I guess).

But then your contribution is the first one that does a normal SQL migration after the introduction of the Django model, and I had absolutely not foreseen that it would be awkward: because we'll need to have the old dynamic migrations and then the static one and then your dynamic migration and all the future dynamic ones. Also, I'll need to try and think if the change you made should be backported in the static migration or if it should be another migration. Mixing static and dynamic migrations, sadly, is a recipe for awkward times. I'm considering switching to plain migrations with a bit of tooling to generate it rather than generating them on the fly even though that's one of my favorite piece of Python code 😀

I'd tend to say: the easiest way out for you is probably to let Django generate the migration it needs with makemigrations (and not change the static migration). You may need a --merge. I'll cleanup my own mess :)

Of course if you want to take care of that, you can, but you don't have to.

medihack commented 4 months ago

At first we had no Django. We had migrations, that were pure SQL Then we got a django integration that dynamically generated migrations based on the SQL Then I recently added a better Django integration with models and even if it's not Django's job to create the tables linked to the models, Django still need migrations describing the models, which is the "static migration" (the only migration that is not dynamic) Ideally, the static migration should have came before the dynamic ones, but I don't want to change old migrations because people are going to have problems if I do that (I should do some kind of squashing I guess).

But then your contribution is the first one that does a normal SQL migration after the introduction of the Django model, and I had absolutely not foreseen that it would be awkward: because we'll need to have the old dynamic migrations and then the static one and then your dynamic migration and all the future dynamic ones. Also, I'll need to try and think if the change you made should be backported in the static migration or if it should be another migration. Mixing static and dynamic migrations, sadly, is a recipe for awkward times. I'm considering switching to plain migrations with a bit of tooling to generate it rather than generating them on the fly even though that's one of my favorite piece of Python code 😀

So, if I understand it correctly, it is there to make Django aware that those models were created. Why not just use managed=False on the Django models (see also legacy databases)? That way you could just keep the dynamic migrations and eliminate the (redundant) static migration. But I may overlook something here. 🤷‍♂️

medihack commented 4 months ago

So, if I understand it correctly, it is there to make Django aware that those models were created. Why not just use managed=False on the Django models (see also legacy databases)? That way you could just keep the dynamic migrations and eliminate the (redundant) static migration. But I may overlook something here. 🤷‍♂️

Never mind. I just saw that managed=False is already set on the models. But that makes me wonder even more why Django still need migrations describing the models, which is the "static migration". Is this not what managed=False is for? To tell Django that it should not care about modifications of those models and that we handle them ourselves (which the dynamic migrations already do)? Unfortunately, I am still not sure what to do after reverting the stuff in static_migrations.py 😞.

EDIT: Also tried a ./manage.py makemigrations --merge (after reverting the static_migrations.py file), but I end up with this error:

2024-06-06 13:13:30,358 INFO    procrastinate.blueprints Adding tasks from blueprint {'taskName': None, 'action': 'loading_blueprint', 'namespace': 'builtin', 'tasks': ['builtin:procrastinate.builtin_tasks.remove_old_jobs']}
2024-06-06 13:13:30,359 INFO    procrastinate.blueprints Adding tasks from blueprint {'taskName': None, 'action': 'loading_blueprint', 'namespace': '', 'tasks': ['procrastinate_demos.demo_django.demo.tasks.index_book']}
2024-06-06 13:13:30,359 INFO    procrastinate.blueprints Adding tasks from blueprint {'taskName': None, 'action': 'loading_blueprint', 'namespace': 'bp', 'tasks': ['bp:procrastinate_demos.demo_django.demo.tasks.set_indexed']}
Merging procrastinate
  Branch 0025_add_job_priority
    - Raw SQL operation
  Branch 0025_add_models
    - Create model ProcrastinateEvent
    - Create model ProcrastinateJob
    - Create model ProcrastinatePeriodicDefer

Merging will only work if the operations printed above do not conflict
with each other (working on different fields or models)
Should these migration branches be merged? [y/N] y
Traceback (most recent call last):
  File "/workspace/procrastinate/procrastinate_demos/demo_django/./manage.py", line 27, in <module>
    main()
  File "/workspace/procrastinate/procrastinate_demos/demo_django/./manage.py", line 23, in main
    execute_from_command_line(sys.argv)
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/commands/makemigrations.py", line 196, in handle
    return self.handle_merge(loader, conflicts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.pyenv_mirror/poetry/virtualenvs/procrastinate-oiodWofN-py3.12/lib/python3.12/site-packages/django/core/management/commands/makemigrations.py", line 499, in handle_merge
    with open(writer.path, "w", encoding="utf-8") as fh:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '<procrastinate migrations virtual path>/0026_merge_0025_add_job_priority_0025_add_models.py'
ewjoachim commented 4 months ago

I'm really sorry for the mess, it's all my fault :)

I'll push some fixes to your branch as soon as I can :)

ewjoachim commented 4 months ago

But that makes me wonder even more why Django still need migrations describing the models, which is the "static migration". Is this not what managed=False is for? To tell Django that it should not care about modifications of those models and that we handle them ourselves (which the dynamic migrations already do)?

Sadly... managed=False tells django that migrations are purely informational, but it still needs to follow the migrations because, as far as I can tell, it needs to understand where each model is coming from and how it evolved, even if it's not django's role to update the schema on the DB side. Kind of like the migrations that track things that don't exist in the DB, like help_text.

medihack commented 4 months ago

Sadly... managed=False tells django that migrations are purely informational, but it still needs to follow the migrations because, as far as I can tell, it needs to understand where each model is coming from and how it evolved, even if it's not django's role to update the schema on the DB side. Kind of like the migrations that track things that don't exist in the DB, like help_text.

Now I get it (at least I think so). Django even generates some pseudo migrations for unmanaged models for some internal state management. Oh, that makes things tricky. How about managing the migrations directly in procrastinate (that would also resolve #1040). It would have its own table where the applied migrations are stored, new migrations are applied in a pre_migrate signal ( or outside Django in a new migration command), and Django migrations are just generated by makemigrations. I'm just thinking loud 😉 and I'm also not sure how backward compatibility can be achieved.

I'm really sorry for the mess, it's all my fault :)

Let's say its Django's fault 😃. They make it quite hard to develop a library that should also be independent from it (don't get me wrong ... I love Django ... it is what I mainly use).

I'll push some fixes to your branch as soon as I can :)

That's very nice of you! Take your time. No hurry on my side.

ewjoachim commented 4 months ago

Ok, I've fixed the issue on main.

After rebasing, you should be able to create a Django migration in procrastinate/contrib/django/migrations looking like this:

from django.db import migrations
from .. import migrations_utils

class Migration(migrations.Migration):
    operations = [migrations_utils.RunProcrastinateSQL(name="02.00.03_01_add_job_priority.sql")]
    name = "0026_add_job_priority"
    dependencies = [("procrastinate", "0025_add_models")]
medihack commented 3 months ago

Ok, I've fixed the issue on main.

After rebasing, you should be able to create a Django migration in procrastinate/contrib/django/migrations looking like this:

from django.db import migrations
from .. import migrations_utils

class Migration(migrations.Migration):
    operations = [migrations_utils.RunProcrastinateSQL(name="02.00.03_01_add_job_priority.sql")]
    name = "0026_add_job_priority"
    dependencies = [("procrastinate", "0025_add_models")]

I guess we still need to put priority into a Django model migration. So, I would create two migration files: one for the SQL and one for the model itself (migrations.AddField). Right? How should we name them? 0026_add_job_priority.py and 0027_add_job_priority_to_model?

ewjoachim commented 3 months ago

Ah the Django AddField may be in the same migration as the other one, it's not a problem. With the new system we can edit migrations manually, so let's create only 1 django migration per PR if we don't actually need to do otherwise.

ewjoachim commented 3 months ago

(I can take over for the backwards compat migration if you prefer)

medihack commented 3 months ago

Congratulations for an excellent first contribution ! It was awesome working you you on this, and I'd love to see you contributing again :)

It was my pleasure, and I also learned something along the way. As promised, I will work on #1072 in the upcoming days.

alexwilson1 commented 3 months ago

Hey team,

Great work with Procrastinate, we love it. However, our latest deployment failed with error psycopg.errors.UndefinedColumn: column "priority" does not exist. This is following a package change from 2.4.0 to 2.7.0 that should be backwards compatible.

Is there an update we should wait for to fix this backwards compatibility or should we perform a manual migration? It seems related to this change.

Thanks!

medihack commented 3 months ago

@alexwilson1 Have you applied the migration files? In your case, it sounds like you have to apply: 02.00.03_01_add_job_priority.sql 02.05.00_01_add_periodic_job_priority.sql 02.06.00_01_add_cancel_states.sql

@ewjoachim It seems that 02.00.03_01_add_job_priority.sql isn't named correctly (it should be 02.04.*, or? Maybe associated with #251). I think this is because there were releases after I started the PR.

ewjoachim commented 3 months ago

@ewjoachim It seems that 02.00.03_01_add_job_priority.sql isn't named correctly (it should be 02.04.*, or? Maybe associated with #251). I think this is because there were releases after I started the PR.

Yep, I made a note of it when I realized that on the release. I didn't want to rename the migration post-release because I was afraid it would cause issues. At least, it doesn't change the order of migrations.

There's a detail that should appear in the doc and doesn't: you can call procrastinate schema --migrations-path to get the path in which migrations are stored in a venv where procrastinate is installed.

ewjoachim commented 3 months ago

If that's ok with you, let's discuss there instead

alexwilson1 commented 3 months ago

@medihack thanks for the instructions. We have applied the migrations mentioned and now it seems to be working fine!