rocicorp / repc

The canonical Replicache client, implemented in Rust.
Other
31 stars 7 forks source link

Verify that we aren't accidentally serializing all transactions 😬 #240

Closed aboodman closed 3 years ago

aboodman commented 3 years ago

https://github.com/rocicorp/repc/pull/203/files/0c7353a5f4f98d30049afd3af5d4c335f14e03f2..c27b20550acf043e9b1dd14ebc3754613eba0f8f#diff-538fa79214eb88d59c2af080b7f78d197b0c40f287f2b4b0f0114e4cbb91b9f7

phritz commented 3 years ago

we are not serializing all transactions

we had a test that verified sequential execution for write transactions, which i improved and it still passes. we did not have a test for parallel execution for read transactions so I added one and it passes with no code changes. both of these in #270 . so looks like we are good on this front.

there are however a few things i want to understand better before i close this issue out, including at least three levels of locking in connection.rs. will post what i find here when i pick it back up next week.

phritz commented 3 years ago

Before closing this issue out I wanted to rationalize the locking in connection. I read the code pretty closely and here's what I think I have learned. In the below by "operation" or "op" I mean an rpc called within a transaction, eg put, get, has, etc.

Desired properties:

In connection there are three levels of locking in play. The transaction map looks like this:

enum Transaction<'a> {
    Read(db::OwnedRead<'a>),
    Write(db::Write<'a>),
}
TxnMap<'a> = RwLock<HashMap<u32, RwLock<Transaction<'a>>>>;

The lets call the outer lock around the hashmap the "map lock". Then we have "transaction locks" around individual Transactions in the map. And a "db lock" sits inside each Transaction.

The things I wanted to understand are whether all these locks are necessary (i think: yes), what desired properties listed above that each provides (see below), and whether there are any unintended or less than ideal consequences from how they work (maybe).

Simplifying the code so we can see interplay of locking, here's how it works:

let map: RwLock<HashMap<u32, RwLock<Transaction>>>;

fn open_transaction(r: request) {
  let tx = if is_read_tx(r) {
    let db_read = store.read();
    Transaction::new(db_read)
  } else if is_write_tx(r) {
    let db_write = store.write();
    Transaction::new(db_write)
  }
  let map_write = map.write();
  map_write.insert(tx_id, RwLock::new(tx));
}

The db lock used in open_transaction gives us desired property A (write tx is exclusive of all others). We have discussed this before, but given that these mutexes are not fair, a write could wait arbitrarily long to start as long as read txs keep coming in. That is, if a read tx is open when we try to open a write tx the write waits. But more reads could continue get opened; the write just keeps waiting until there is nothing open. Call this potential issue 1. Further, if two write txs come in while a read is open, either one might start first when reads are finished. Call this potential issue 2.

Read and write ops work like this:

fn read_op(r: request) {
  let map_read = map.read();
  let tx = map_read.get(r.tx_id);
  let tx_read = tx.read();
  do_work(tx_read);
  // note: map_read and tx_read held through do_work and dropped here
}

fn write_op(r: request) {
  let map_read = map.read();
  let tx = map_read.get(r.tx_id);
  let tx_write = tx.write();
  do_work(tx_write);
  // note: map_read and tx_write held through do_work and dropped here
}

In the above the transaction locks (tx_read, tx_write) give us desired properties B and C (within a transaction write ops are exclusive, read ops can execute concurrently).

The map lock hasn't so far come into play with our desired properties, so far it is just there to make concurrent access to the transaction map safe. There is an unfortunate interplay though: notice in read_op and write_op that the map read lock is held through doing the work. Suppose there are open read txs. Then a new read transaction cannot be opened until there are exactly zero read ops running. This is because open_transaction needs a write lock on the map, but read_op holds a read lock until it returns, so lots of overlapping read ops prevent a new read transaction from opening even though strictly speaking it should be able to. Call this potential issue 3.

Closing transactions works like this:

fn do_commit(r: request) {
  let map_write = map.write();
  let tx = map_write.remove(r.txn_id);
  // Note: transaction lock not held
  tx.commit();
}

fn do_close(r: request) {
  let map_write = map.write();
  let tx = map_write.remove(r.txn_id);
  // Note: transaction lock not held
}

Note that we do not hold the transaction lock when drop the transaction at the end of each function. In theory there could outstanding read or write ops and we are destroying the transaction they are executing within before they are finished. In practice there are not: do_close and do_commit acquire the map write lock before proceeding and read_op and write_op hold a map read lock through their entire execution, which prevents do_close/do_commit from starting until they are done. Nothing can be borrowed from the map when we get the map write lock. However if we changed the lifetime of the map read lock in read_op/write_op to be more shortlived say to address issue 3 above, then this could maybe lead to problems. Seems like to be safe we should acquire a transaction write lock here to ensure all other ops are complete. Call this potential issue 4.

So...

Ok, what now. I think with respect to the things learned above:

Anyone think I am reading this wrong, or think we should do something other than above?

phritz commented 3 years ago

Additional thoughts:

arv commented 3 years ago

The logic has been moved to JS and all theses issues are resolved as far as a I can tell