splitgraph / seafowl

Analytical database for data-driven Web applications 🪶
https://seafowl.io
Apache License 2.0
432 stars 12 forks source link

Use a catalog DB-level transaction when running a Seafowl statement #131

Open mildbyte opened 2 years ago

mildbyte commented 2 years ago

Alluded to in https://github.com/splitgraph/seafowl/issues/48.

Start a transaction before planning a batch of Seafowl statements, roll it back on error and commit on success (before returning a result): https://docs.rs/sqlx/latest/sqlx/struct.Transaction.html . Useful for:

gruuya commented 2 years ago

For the sake of documenting offline discussion:

the approaches I see are:

  1. Given that we essentially share the same context object between all invocations of all endpoints, we could/should open a transaction in e.g. plan_query function for any DDL/DML query, while using the basic PG/SQLite pool as the executor as we do now for the rest. The PITA here is that we should then propagate the executor (tx or pool) down into all invocations of catalog function calls and any intermediary functions (and ultimately down to repository function calls). This seems like an overly verbose solution to me.
  2. On the other hand, seeing that we basically only need transactions for POST endpoints, we could create a separate context for each invocation of those endpoints, where the repository executor is set to a newly created transaction for each invocation. The main issue here (any resulting latency effect of this should not matter much given that it doesn't affect the cached reads perf) is that we pass the context along to endpoints as Arc, meaning we can't really create a new separate object from it. So we'd need to either pass DefaultSeafowlContext (as built by build_context), or alternatively the config itself, so we can build one for each invocation, which is also kind of meh.

We've agreed on going with #2, but there one of the main obstacles was making our repo code generic enough for both SQLite and PG, as well as support the transactions, which are supposed to be passed via re-borrowing.

Furthermore, our Repository trait is Sync, but SQLiteConnection is not, meaning that we won't be able to use SQLite transactions in our Catalog/Repo as is.

So for the time being (given that this is not a critical issue) I'm putting this on the backlog, until some of the circumstances change.