procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License

Unclear on how to proceed with future migrations #1040

Open stratosgear opened 4 months ago

stratosgear commented 4 months ago

I tried to understand how the project handles schema migrations, but after reading all the documentation pages about migrations and browsing through related open and closed issues, I have not found a concrete explanation of how it works... :(

In my use case I have introduced procrastinate in an existing code base (non Django based).

I have run procrastinate -a my.src.app schema --apply, which correctly applied the required procrastinate structures. Procrastinate seems to be working fine.

My concern now is how I can stay current with any migrations that may come along in the future.

I was hoping I could keep running procrastinate -a my.src.app schema --apply every time I update my Python dependencies and at project startup, and that it would automatically pick up any future migrations, but I am not sure this will actually work.

Am I right in thinking that I have to adopt any new procrastinate migration scripts as my own and find a way to apply them myself with my existing migration methodology (basically Alembic)?

If so, this is really fragile and will require a lot of coordinated work to implement, test and maintain. It increases the friction of adopting procrastinate too much! :(

Am I missing an obvious solution?

ewjoachim commented 4 months ago

I think you're right, except if you use Django.

When we worked on the migration system, we wanted to keep it minimal to avoid tying it to a specific tool: there were multiple existing systems, and choosing one would likely have made things very difficult for people using another. Since migration systems usually come with their own way of tracking migrations, it would be complicated to add our own custom way of tracking what has or hasn't been run.

The one main thing we commit to is that each release lists its migrations. Each migration script is written to be runnable as-is (if you need to modify it, it's probably a bug), and most migration systems accept migrations where you provide the SQL to run directly, so they should be compatible. We've also made a dedicated Django integration that lists procrastinate migrations as Django migrations.

But yes, you're right: nothing has been done yet to ease that part of the lib.

We could do the same for Alembic as we do for Django, I guess; that would probably cover most of what people use. I've never played with Alembic, so I'd love it if someone had a look.

stratosgear commented 4 months ago

Although I have not done a full analysis: why can't Procrastinate handle its own migrations?

For example, Alembic keeps a table where it records the last migration script that was applied.

This process, or something similar, could be maintained internally by Procrastinate rather than making the user responsible for an external dependency. Procrastinate already provides a schema manager, in the form of the CLI command that applies the schema, so why not extend it a bit: whenever it runs, it checks for missing migrations and applies them, or otherwise exits gracefully, reporting that everything is up to date.

Maybe I'm oversimplifying things, but Procrastinate already deals with much heavier concepts; auto-handling migrations should be peanuts! :)
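A minimal sketch of the "check for missing migrations and apply them" idea, assuming a hypothetical tracking table (applied_migrations) and a hypothetical apply_pending helper; none of this is procrastinate's actual API. It uses sqlite3 only so the example runs without a server; a real version would target PostgreSQL. Each run records what it applied, so a second run finds nothing to do and can report that everything is up to date.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical bookkeeping table, similar in spirit to Alembic's version table.
TRACKING_DDL = """
CREATE TABLE IF NOT EXISTS applied_migrations (
    name TEXT PRIMARY KEY,
    applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def apply_pending(conn: sqlite3.Connection, migrations: Iterable[Tuple[str, str]]) -> List[str]:
    """Apply each (name, sql) pair that is not yet recorded, in the given order.

    Returns the names applied by this call; an empty list means "up to date".
    """
    conn.execute(TRACKING_DDL)
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already recorded: skip it
        # NOTE: executescript() commits any pending transaction and then runs
        # the script, so the script and the INSERT below are not atomic here.
        conn.executescript(sql)
        conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied


if __name__ == "__main__":
    # Hypothetical migrations; real ones would be procrastinate's SQL files.
    migrations = [
        ("0001_initial", "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT);"),
        ("0002_add_priority", "ALTER TABLE jobs ADD COLUMN priority INTEGER DEFAULT 0;"),
    ]
    conn = sqlite3.connect(":memory:")
    print(apply_pending(conn, migrations))  # first run applies both
    newly = apply_pending(conn, migrations)
    print("up to date" if not newly else newly)
```

One design point this glosses over: a migration script and its bookkeeping row should commit together. PostgreSQL supports transactional DDL, so a real implementation could usually run both in one transaction; sqlite3's executescript() commits before running the script, so this sketch does not guarantee it.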

ewjoachim commented 4 months ago

We could, but... I don't like doing things in one lib that [I feel] might get quite complicated, especially something complex enough that I would expect dedicated libs to exist to do it right.

I'm not saying we can't do it here, but if we did, we'd need:

It's perfectly doable, but it's not trivial. I'm not sure most people want multiple migration systems cooperating (potentially on the same database), and I'm pretty sure sysadmins won't be happy having to run 2 different migration commands on deployment. When possible, I really think that if you already have a migration system for your app, you'd rather have Procrastinate use it. At least, I would.

stratosgear commented 4 months ago

Well, I'm sorry to say it, but by deciding not to deal with any of this, you are pushing the burden onto someone else, who is not familiar with your codebase, to take on additional responsibilities in order to maintain it. We do not feel it is appropriate to deal separately with every third-party utility/extension/plugin that considers maintaining its own execution environment too much work. And to be totally clear, you are by no means obligated to do so. It's just that Procrastinate does not fit our needs, and that's fine, no hard feelings! :)

I think the issue can be closed, since it has confirmed my initial concern!

Thanks!

ewjoachim commented 4 months ago

Sorry :) Maybe I'll reconsider at some point. I understand your point, but this is a one-person volunteer lib until more people step in, and it's not my only open-source commitment, so I need to be realistic about what I can and want to work on.

I think it's worth keeping it open in case other people want to chime in. Your point is valid, and even if you choose another lib, feedback is always worth listening to.

(If someone is interested in contributing, please discuss it first.)

medihack commented 1 month ago

I wonder what the possible ways to improve the situation are. Maybe an additional table where every applied migration is recorded. Then, as a first step, a developer using Procrastinate could at least check (with some command) which migrations were applied. The schema.sql file would then be unnecessary, as the migrations would have to be applied initially (in the correct order) by some script. As a next step, a script could automatically apply later migrations when updating Procrastinate. But somehow, this should only affect non-Django users.
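A sketch of the "check which migrations were applied" command suggested here, assuming the applied names come from a tracking table like the one proposed in this comment and that migration files carry a version prefix. The naming scheme, sort_key and migration_status are hypothetical, not procrastinate's actual file layout. The point it illustrates is ordering: versions have to be compared numerically, since plain string sorting would place 1.10.0 before 1.9.0.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical file naming scheme: "<major>.<minor>.<patch>_<seq>_<description>.sql".
NAME_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)_(\d+)_.+\.sql$")


def sort_key(filename: str) -> Tuple[int, ...]:
    """Numeric sort key, so that 1.10.0 sorts after 1.9.0 (plain string sorting would not)."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected migration filename: {filename!r}")
    return tuple(int(part) for part in match.groups())


def migration_status(available: Iterable[str], applied: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the available migration files into (applied, pending), both in apply order."""
    ordered = sorted(available, key=sort_key)
    done = [name for name in ordered if name in applied]
    pending = [name for name in ordered if name not in applied]
    return done, pending


if __name__ == "__main__":
    files = ["1.10.0_01_add_index.sql", "1.9.0_01_initial.sql", "1.10.0_02_add_column.sql"]
    done, pending = migration_status(files, applied={"1.9.0_01_initial.sql"})
    print("applied:", done)
    print("pending:", pending)
```

The pending list, in that order, is what an automatic "apply later migrations on upgrade" step would then feed to its runner.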

EDIT: @ewjoachim I just read you had the same ideas.

Another option I can think of (I mentioned it somewhere else) is to always use a custom migration management and only apply the Django-specific model migrations in Django. We could then hook into the Django migration system using a signal (pre_migrate or post_migrate) and run our own migration system from there. Not sure how backward-compatible this would be.

ewjoachim commented 1 month ago

I wonder how much of the community uses neither Django nor Alembic. Would it be acceptable to provide Alembic migrations alongside the Django ones, and would that be enough for the vast majority of users?

Otherwise, maybe we could integrate a standalone migration system into procrastinate as an optional dependency: Alembic (which is tied to SQLAlchemy but could be used independently), yoyo, or any other standalone migration system.

(To be super extra duper clear: I used to be the maintainer of Septentrion (yet another migration tool), and what I learned is that it's a complex enough thing to deserve its own lib, not something we want to do in our own codebase.)

I'm perfectly OK with revisiting the decision to let users deal with it, but I really don't want to maintain our own solution.

ewjoachim commented 1 month ago

In our own tests, we use migra. As you can see, I have to do all sorts of shenanigans when importing it, because it seems unmaintained, and it is also based on schemainspect, which seems equally unmaintained. This is an opportunity to remove that dependency.

medihack commented 1 month ago

Yes, this makes sense. And I can hardly estimate how many non-Django users are using Procrastinate. As I am a Django user myself, this issue is not a high priority for me, but it may still be good to develop a plan that somebody else can easily pick up to improve the situation (otherwise it looks more like a better-not-touch issue 😉).

slifty commented 1 month ago

Non-Django user here (though I am using Alembic)!

I'd very much value a way to avoid having to build my own migration management tool in order to use procrastinate safely in a CI/CD-based production environment. That said, Alembic support would absolutely suit my needs, and it seems to me that it would provide a solution for non-Django users.

If the maintainers are comfortable with this direction, I'd be glad to take a stab at implementing it, and would value any opinions on the approach!
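One possible shape for the Alembic support discussed here, assuming it mirrors the Django integration mentioned earlier in the thread (procrastinate migrations listed as framework migrations): generate one Alembic revision per procrastinate SQL file, already in apply order, and chain the revisions linearly through down_revision. build_chain, render_revision, the revision-id scheme and the file names are all hypothetical; revision, down_revision, upgrade() and op.execute() are the standard pieces of an Alembic revision module. The generator itself does not import Alembic; only the rendered module does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    revision: str                 # this revision's id
    down_revision: Optional[str]  # id of the previous revision (None for the first)
    sql_file: str                 # procrastinate SQL script this revision runs


def build_chain(sql_files: List[str], prefix: str = "procrastinate_") -> List[Revision]:
    """Chain already-ordered SQL files linearly, one revision per file."""
    chain: List[Revision] = []
    previous: Optional[str] = None
    for sql_file in sql_files:
        stem = sql_file[: -len(".sql")] if sql_file.endswith(".sql") else sql_file
        # Hypothetical id scheme: prefix plus the file stem, non-word chars replaced.
        rev_id = prefix + re.sub(r"\W", "_", stem)
        chain.append(Revision(rev_id, previous, sql_file))
        previous = rev_id
    return chain


TEMPLATE = '''"""Run procrastinate migration {sql_file}."""
from alembic import op

revision = {revision!r}
down_revision = {down_revision!r}


def upgrade():
    op.execute({sql!r})


def downgrade():
    raise NotImplementedError("this sketch does not generate downgrades")
'''


def render_revision(rev: Revision, sql: str) -> str:
    """Render the source of an Alembic revision module that runs `sql` verbatim."""
    return TEMPLATE.format(
        sql_file=rev.sql_file,
        revision=rev.revision,
        down_revision=rev.down_revision,
        sql=sql,
    )


if __name__ == "__main__":
    chain = build_chain(["1.9.0_01_initial.sql", "1.10.0_01_add_index.sql"])
    print(render_revision(chain[1], "SELECT 1;"))
```

A linear chain keeps procrastinate's revisions ordered, but it leaves open how they interleave with an application's own Alembic history (for example, through Alembic's branch support), which is the kind of question worth settling in a draft PR.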

ewjoachim commented 1 month ago

Nice :)

I'm going to push my luck: would you be interested in developing it? Of course, we'll do our best to support you!

slifty commented 1 month ago

Yes! I expect the best approach would be a draft PR laying out an initial implementation that you can give feedback on.

Stay tuned...

ewjoachim commented 1 month ago

You'll probably want to look at how the Django migrations are done.

3 steps: