graphile / migrate

Opinionated SQL-powered productive roll-forward migration tool for PostgreSQL.
MIT License
751 stars 58 forks source link

Race-condition in beforeCurrent hook #185

Open TimoStolz opened 1 year ago

TimoStolz commented 1 year ago

Summary

I use pg_dump to save data from tables, that are currently under development, and sometimes, pg_dump yields no data, sometimes it does, although there was data, before the current migrations were re-deployed.

Steps to reproduce

My hook in beforeCurrent:
(I use -T committed_table to only export data from tables, that will be destroyed and recreated.)

{
  "_": "command",
  "command": "pg_dump \"$GM_DBURL\" --data-only --schema app_public -T app_public.users -T app_public.user_emails -T app_public.user_authentications -T app_public.organizations -T app_public.organization_memberships -T app_public.organization_invitations --column-inserts --no-comments --rows-per-insert=1000 --file migrations/restoreCurrentData.sql"
}

The corresponding hook in afterCurrent

{
  "_": "sql",
  "file": "restoreCurrentData.sql",
  "shadow": false
}

Expected results

restoreCurrentData.sql should always contain the data that was present, before the current migrations were about to re-run.

Actual results

restoreCurrentData.sql is sometimes empty, most often, when I changed multiple files at once. (Even if using save all in VSCode.) It happens less frequent, if I change my files one after another.

Additional context

Our conversation on Discord:

timostolz - at 2023-06-21 08:38:00+00:00 If I have "beforeCurrent" hooks in graphile-migrate, will they run synchronously, and will their end state be awaited, before the current migrations are deployed?

timostolz I use pg_dump to save data from tables, that are currently under development, and sometimes, pg_dump yields no data, sometimes it does, although there was data, before the current migrations were re-deployed. However, if the current migrations and pg_dump run at the same time, a race-condition could explain this problem. Under this assumption, the tables could be dropped before pg_dump begins to export the data, finally leading to an empty result.

Benjie Please file an issue and I’ll look into it. They should but I wonder if watch can trigger twice concurrently

timostolz Ahh, up to now, I thought it happens completely at random. But as you come up with triggering twice: Most often, it happens, if I change two or more files at once. (Although I use the "Save all" command in VSCode to save them nearly at the same time.)

Benjie Yeah it makes sense I’d forget to add a queue there especially given the SQL will queue automatically due to transactions, so without a beforeCurrent I wouldn’t notice the issue Also hard to reproduce because everything is so fast, unless you’re using something like save all