davepacheco commented 2 years ago

The problem

There are a bunch of related problems in Nexus, where we:

do a sequence of database operations that we want to be atomic, but we do them not in a single transaction (or any other construct) that would ensure atomicity. An example is that when we create a Project, we also create a default VPC, VPC Router, and VPC Subnet. If we crashed partway through, we'd wind up with a Project with no VPC or a VPC with no router, etc. Some of these are tagged with TODO-correctness or TODO-robustness, but I'm not sure they all are.
use Diesel's transaction() method, which does what RFD 192 calls an "interactive transaction". RFD 192 advises against this and we should avoid it if possible. Sometimes we do this because it was too annoying to construct the query we really wanted using Diesel.
do arbitrarily-large queries as part of something else that's supposed to be atomic. I'm really thinking of the part in Nexus::project_delete_vpc where we delete all the firewall rules in one query.

I think this problem would get a lot worse if/when we add limits, like "this Project may have no more than N Instances in it". Most likely we're going to need to store that count in the database and update it atomically with instance creation/deletion. If we wind up doing that for most resources, then a lot of things that are simple INSERTs/UPDATEs today will wind up doing more than one thing.

Towards a solution

I think the best approach here is to have a way to put multiple statements into one database transaction. As alluded to above, we don't want to use an interactive transaction if we can avoid it. Ideally, we'd be able to send a batch of SQL queries to the database in one go and get the responses. PostgreSQL and tokio-postgres support this, but Diesel does not. @smklein suggested that once we've generated the SQL with Diesel, we can handle more of the execution on our own using tokio-postgres's pipelined query support. This seems promising! (There's more complexity than just SQL though -- the bind parameters are a tricky part of this too.)

Even before diving into the plumbing, I think it will be helpful to figure out the interface for this, using a couple of the real-world consumers. For example, at project creation, we're going to want:

insert into Project table
- on conflict of uuid, 500
- on conflict of (organization_id, name), 400-level error
insert into Vpc table
- on conflict of uuid or (project_id, name), 500 error
insert into VpcSubnet table
- on conflict of uuid or (vpc_id, subnet_name), 500 error

etc.

That means this interface needs to be able to tell us which query failed and exactly how it failed.

To make sure we build something that works for more than one consumer, it's also worth looking at a few other use cases like:

imagine what it would look like if we also needed to update a separate table of limits (e.g., update limits set nprojects = nprojects + 1 WHERE organization_id = ..., where that can fail for violating a CHECK constraint).
the VPC deletion one above (i.e., make sure that either this interface can be used in a way that would let us delete all the firewall rules in batches -- which seems impossible -- or we have some other way to solve that problem)

Another thing to work out: how do we build and compose these pieces together? Currently, these steps are each done by separate functions in DataStore that execute the complete query. That obviously needs to change. So what would these functions return? It needs to be something that can be combined with other queries into a bigger, batched query -- and probably also defines error handling for the subquery it's returning?

Alternatives considered

RFD 192 talks about other tools for maintaining strong consistency of the database (including our application-level invariants) . These include:

Structuring queries to avoid needing complex transactions. We're already doing this in some places...and as linked above we've avoided it in some places because it's kind of annoying to write a CTE with Diesel. This doesn't work well when you need to INSERT a bunch of things.
Generation numbers -- not clear how these can be applied to this particular problem.
CTEs -- I think we could use this for the Project create case, but it would be annoying first to write the complex CTE that lets you distinguish all the different error cases you need and then to write the Diesel code to implement it. Still, this is an option.
Sagas. These work, but they're overkill for something that could just be in a single batched database transaction. Sagas have an overhead of at least two database writes per saga plus at least two writes per action. As an example, if we were to put project creation into a Saga, we'd be increasing the number of database writes for this operation by 3x-4x. Sagas also don't necessarily have the behavior we want: if Nexus crashes halfway through Project creation, it's not clear that we want that Project creation to be resumed and then carried out. A consumer might find that very confusing! Their initial request will end unexpectedly with an ambiguous response, and they might fetch the Project, it might not be there (because the saga hasn't been recovered yet), and then they retry it themselves and suddenly it's there already. This seems to violate strong consistency.

smklein commented 2 years ago

Some follow-ups:

diesel-async supports pipelined requests: https://docs.rs/diesel-async/0.1.0/diesel_async/struct.AsyncPgConnection.html
With https://github.com/oxidecomputer/omicron/pull/1733 , there is better support for adding CTEs

Both of these are tools which can be used to reduce the transaction count in Omicron

davepacheco commented 2 years ago

diesel-async looks promising here! It's great that it lets you get the results of the individual queries, too (unlike the old batched_query). The big question for me is whether you can have one query be BEGIN, then do stuff, then COMMIT, and have that do what we want? It seems like that should work at least in the happy path because I think in terms of the PostgreSQL wire protocol the transaction behavior is basically layered atop a sequence of queries (i.e., you really do just send BEGIN, and once that's processed you're in a transaction, and once you send COMMIT or ROLLBACK then you're not any more).

The docs imply that the queries should be independent. I think they mean "even if query B is issued after query A, the construction of query B does not depend on data returned from query A". Behaviorally, they're not independent if there's a BEGIN/COMMIT in there. They're not even independent if A is INSERT INTO Foo ... and B is SELECT * FROM Foo. I suspect this mechanism is intended to support this kind of thing though.

For BEGIN/COMMIT to work in the happy path, the implementation would need to guarantee that these are definitely sent in some well-defined sequence -- presumably the sequence you construct them? Or is the sequence that you poll the futures? But either way it must not be allowed to reorder them.

It also needs to guarantee they're being sent on the same PostgreSQL connection, but I assume that's true because this is part of AsyncPgConnection.

Lastly, it would need to be guaranteed that any transaction you start with BEGIN will be cleanly aborted (and the connection left in a usable, no-transaction-running state) if either one of the queries fails or if the Rust code unwinds. I believe the database takes care of the first case: if a query fails that causes the transaction to abort, all subsequent queries fail immediately with a "transaction already aborted" error until you explicitly try to COMMIT or ROLLBACK. I believe the connection pool is responsible for the second case -- it should probably always issue a command to abort any in-progress transaction.

In summary: this looks promising. I think it's worth checking with the developers to see if it's intended that this use case would work and pointing out those things they'd have to guarantee (and maybe suggesting some doc updates). I'm happy to take a look at this.

davepacheco commented 2 years ago

Started a discussion on diesel_async about this here: https://github.com/weiznich/diesel_async/discussions/25

From @steveklabnik, maybe relevant about using diesel_async: https://github.com/weiznich/diesel_async/discussions/5

oxidecomputer / omicron

crash-correctness issues related to not using transactions well #973

The problem

Towards a solution

Alternatives considered