rust-lang / triagebot

Automation/tooling for Rust spaces
https://triage.rust-lang.org
Apache License 2.0
169 stars 75 forks source link

Use a transaction when applying migrations #1822

Closed Kobzol closed 2 days ago

Kobzol commented 2 weeks ago

Recently, migrations in triagebot have been causing some issues. When I checked the code I noticed that transactions weren't used, which caused two issues: 1) The migration command could have succeeded partially. In one PR, the command contained several SQL queries, where the first one has succeeded, and the rest of them didn't (in fact they could not succeed, because multiple statements are not supported in a single call to execute). This would be rolled back if it was wrapped in a transaction. 2) It was possible that a migration has happened, but the ID wasn't updated in the DB, for some reason.

This PR wraps each migration in a separate transaction.

CC @apiraino

apiraino commented 2 days ago

pls can anyone review this? It's blocking me on #1790 and by extension #1753

thanks

ehuss commented 2 days ago

I'm not quite following the exact issue this is trying to resolve.

If a migration fails, then it should exit without updating the counter. What was the behavior with https://github.com/rust-lang/triagebot/pull/1781 with multiple statements? Did it successfully execute the first statement, and then fail?

I'm curious if you considered wrapping the entire for loop (or the entire function) in one transaction instead of doing each migration in a separate transaction. I think it should work either way (a single transaction might be slightly faster?), would just like to know why this particular approach was chosen.

I'd also like to note that the migration code doesn't look like it handles concurrent processes trying to do migrations. Probably not an issue for triagebot, but I wanted to just note it.

Kobzol commented 2 days ago

Did it successfully execute the first statement, and then fail?

We don't really know. Currently, based on the state of the DB, it looks like it executed the second statement, and then it failed..

The goal is to make the execution of the migration and the increment of the counter atomic. It is true that if the migration fails, then the counter should not be updated (even without a transaction). However, adding a transaction fixes the following two problems:

I'm curious if you considered wrapping the entire for loop (or the entire function) in one transaction instead of doing each migration in a separate transaction. I think it should work either way (a single transaction might be slightly faster?), would just like to know why this particular approach was chosen.

I did consider that. I don't think that performance is really important here, usually you just execute 1, maybe 2 migrations, the already applied migrations are not being executed anyway, so there is no permanently increasing performance cost. So I wouldn't personally do it because of performance reasons. My reasoning was that if you have a good migration and then a bad migration, you might want to at least go as far as you can (hence separate migrations). On the other hand, if you apply multiple migrations together, you probably added them together in the same PR, and therefore they are probably related to one another, and therefore if one fails, you might later want to modify all of the newly added migrations in another PR with a fix. So perhaps wrapping everything in one larger transaction makes more sense.

I'd also like to note that the migration code doesn't look like it handles concurrent processes trying to do migrations. Probably not an issue for triagebot, but I wanted to just note it.

You're right. I'm not sure if we deal with this in any of our tools, this is non-trivial to resolve, AFAIK.

ehuss commented 2 days ago

We don't really know. Currently, based on the state of the DB, it looks like it executed the second statement, and then it failed..

What information do you have that makes it seem that way?

From what I understand:

  1. Migration counter is 11.
  2. 1773 Feb 23 added two new migrations.

    • Added table review_prefs
    • Added index review_prefs_user_id
    • Migration counter is now 13.
  3. 1781 modified an existing migration (to add CREATE EXTENSION), and thus the migration did not run. Also, the migration is now invalid since it contains multiple statements.

    • Indeed, if you try to run #1781 on a fresh database, the migration fails, and the migration counter is left at 12.
  4. 1786

    • Added column max_assigned_prs to review_prefs
    • Migration counter is now 14.
  5. 1793 (take 2) tried to fix the migration by splitting the incorrect formatting in #1781. However, this did not work as expected. It just shifted the migrations. The CREATE EXTENSION migration never runs.

    • The ALTER TABLE migration runs again, but does nothing.
    • Migration counter is now 15.
  6. 1796 (revert last two PRs) Reverts #1793.

    • Migration counter is still 15 (unless Mark manually modified it?). It is now out of sync with the source code, which only has 14 migrations. Also, the CREATE EXTENSION migration has never run.

Nowhere along this does it seem like the lack of transactions is the problem.

Kobzol commented 2 days ago

Thank you for the detailed analysis! :heart: The counter was indeed 15, Mark now modified it to 14, and the last migration is in effect (the index exists), but the penultimate one isn't (I asked Mark to enable the intarray extension manually though).

In these very specific circumstances, it probably wouldn't have helped to have transactions, but if the multistatement migration was added as a new migration, instead of being modified (and this can happen again in the future), transactions would prevent the error.

But having transactions for migrations is a good idea nevertheless, regardless if it would prevent issues in this specific case.

ehuss commented 2 days ago

if the multistatement migration was added as a new migration, instead of being modified (and this can happen again in the future), transactions would prevent the error.

Unless I'm misunderstanding, I don't think it would. If you try to run a migration with multiple statements, it would error with ERROR: cannot insert multiple commands into a prepared statement, without running any of the statements. Triagebot would then immediately exit without updating the counter.

Kobzol commented 2 days ago

I see. I thought it would execute the first statement only. Ok, so it wouldn't help in this case. It's still a good idea to do it though.

Kobzol commented 2 days ago

Thanks! I assume you meant "doesn't support in transactions"?

ehuss commented 2 days ago

Correct, I updated the comment. There are around 11 that aren't supported ("create db", "vacuum", etc.).