Sync clobbers indexes

aboodman commented 3 years ago

Doh. Replicache sync too good.

One of the invariants of Replicache is that the client always snaps to the server, no matter what. The server doesn't know about indexes, so Replicache drops them after a sync.

Did not test this situation thoroughly because (a) I was rushing, and (b) I assumed sync was unrelated to indexes.

What's happening is that during sync we rewind to the last snapshot, which of course doesn't include the new indexes. We push the changes to the server, the server ignores the index creation requests, and then we apply a patch that doesn't include anything about indexes. So after sync we end up with no indexes at all.

On the plus side everything ends up in a perfectly consistent state and the old indexes are even GC'd! 😂.
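To make the failure mode concrete, here is a toy model of the flow described above. It is only a sketch: `Commit`, `naive_sync`, and the `byText` index name are hypothetical names made up for illustration, not repc's actual types or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Commit:
    """Hypothetical commit: key/value data plus the names of its indexes."""
    data: Dict[str, str]
    indexes: Set[str] = field(default_factory=set)


def naive_sync(snapshot: Commit, server_patch: Dict[str, str]) -> Commit:
    """Toy version of the buggy sync: rewind to the snapshot, apply the patch.

    The patch carries only key/value changes (the server knows nothing about
    indexes), so the result has exactly the snapshot's indexes.
    """
    data = {**snapshot.data, **server_patch}
    return Commit(data=data, indexes=set(snapshot.indexes))


# The last snapshot predates the index; the local head has created one.
snapshot = Commit(data={"todo/1": "buy milk"})
local_head = Commit(data=dict(snapshot.data), indexes={"byText"})

synced = naive_sync(snapshot, {"todo/2": "walk dog"})
print(synced.indexes)  # set()
```

The data ends up consistent with the server, but the index that existed on the local head is gone.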

aboodman commented 3 years ago

@phritz do you think something like https://github.com/rocicorp/repc/pull/221 makes sense?

phritz commented 3 years ago

> @phritz do you think something like #221 makes sense?

It's close, but I think the more strictly correct approach is to take the index definitions for the sync snapshot from the parent of the first replay commit, if any. If there are no replay commits, take them from the head of the main chain. Reasoning:

Main chain: S
- the correct thing is to take the index defs from S
Main chain: S - L1 (will not be replayed)
- the correct thing is to take index defs from L1 (if we take from S then any created in L1 get dropped)
Main chain: S - L1 (will be replayed)
- correct thing is to take from S because L1 index changes might not be idempotent (eg, L1 drops an index; if we take sync snapshot indexes from L1 then when it is replayed it will error trying to drop a non-existent index). See note re idempotency below.
Main chain: S - L1 (not replayed) - L2 (replayed) - L3 (replayed)
- can't take from S because index changes in L1 would be lost
- can take from L1; changes in L2 and L3 will be replayed on top
- can't take from L2 because index operations might not be idempotent
- can't take from L3 because when L2 is replayed it assumes the db has state in L1 (eg, L3 might drop an index that L2 uses)
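The rule and the four cases above can be sketched as a small function. This is a minimal illustration, not the code from #221: `ChainCommit` and `index_defs_source` are hypothetical names, and the sketch assumes the main chain is given oldest to newest, starts with the snapshot S, and that S itself is never replayed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChainCommit:
    """Hypothetical main-chain commit; `replayed` marks commits sync will replay."""
    name: str
    replayed: bool = False


def index_defs_source(main_chain: List[ChainCommit]) -> ChainCommit:
    """Return the commit whose index definitions the sync snapshot should copy.

    Rule: the parent of the first replay commit, if any; otherwise the head
    of the main chain.
    """
    if not main_chain:
        raise ValueError("main chain must contain at least the snapshot")
    for i, commit in enumerate(main_chain):
        if commit.replayed:
            if i == 0:
                raise ValueError("the snapshot itself cannot be a replay commit")
            return main_chain[i - 1]  # parent of the first replay commit
    return main_chain[-1]  # nothing to replay: take the head


S = ChainCommit("S")
L1 = ChainCommit("L1")
L1_replayed = ChainCommit("L1", replayed=True)
L2 = ChainCommit("L2", replayed=True)
L3 = ChainCommit("L3", replayed=True)

print(index_defs_source([S]).name)               # S
print(index_defs_source([S, L1]).name)           # L1
print(index_defs_source([S, L1_replayed]).name)  # S
print(index_defs_source([S, L1, L2, L3]).name)   # L1
```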

I can hear the reader asking themselves "can we make index operations idempotent?" Pretty sure the answer is yes (creation already is) but it doesn't buy us much. It would give us the option to take the index defs from either the parent of the first commit to be replayed or the first commit to be replayed itself (we can't take them from later replay commits on the main chain, as the last bullet above shows). We might be able to save a bit of work by taking them from the first commit to be replayed, but I argue that we should not because it messes with our mental model. Replay commits should be replayed, not partially applied (index changes) and then replayed. It's far more natural for us to take index defs from the parent of the first replay -- it keeps the mental model of replaying commits on top of the most recent confirmed state (from the client view or otherwise).
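The idempotency concern can be illustrated with a toy model in which the set of index names stands in for the index definitions. The function names here are hypothetical and the strict drop is an assumption made for the sake of the example; the only thing taken from the discussion is that creation is idempotent and that a drop might not be.

```python
from typing import FrozenSet


def create_index(indexes: FrozenSet[str], name: str) -> FrozenSet[str]:
    """Idempotent: creating an index that already exists changes nothing."""
    return indexes | {name}


def drop_index_strict(indexes: FrozenSet[str], name: str) -> FrozenSet[str]:
    """Not idempotent (assumed behavior): dropping a missing index is an error."""
    if name not in indexes:
        raise KeyError(f"no such index: {name}")
    return indexes - {name}


# S has the index; L1 drops it and will be replayed during sync.
s_state = frozenset({"byName"})
l1_state = drop_index_strict(s_state, "byName")

# Replaying L1 on top of S's index defs works.
print(drop_index_strict(s_state, "byName"))  # frozenset()

# Replaying L1 on top of L1's own index defs fails: the index is already gone.
try:
    drop_index_strict(l1_state, "byName")
except KeyError as err:
    print("replay failed:", err)
```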

Note: the above refers only to where we get the index definitions. Once we have them, we still need to rebuild the entire index; we don't have a nice way to re-use old indexes :(
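As a rough picture of what "rebuild the entire index" means, the sketch below recomputes an index from nothing but its definition and the current key/value data. The shape of `IndexDef` (a name, a key prefix, and a field to index on) and the `rebuild_index` helper are assumptions for illustration only; repc's real index definitions and storage may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical index definition: index `field` of values under `key_prefix`."""
    name: str
    key_prefix: str
    field: str


def rebuild_index(defn: IndexDef, data: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Scan every entry and rebuild the index as sorted (indexed value, key) pairs."""
    entries = [
        (value[defn.field], key)
        for key, value in data.items()
        if key.startswith(defn.key_prefix) and defn.field in value
    ]
    return sorted(entries)


data = {
    "user/1": {"name": "bob"},
    "user/2": {"name": "alice"},
    "todo/1": {"title": "buy milk"},
}
by_name = IndexDef(name="byName", key_prefix="user/", field="name")
print(rebuild_index(by_name, data))  # [('alice', 'user/2'), ('bob', 'user/1')]
```

The full scan is the cost being lamented: nothing from the old index is reused, so every indexed entry is visited again after each sync.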

rocicorp / repc

Sync clobbers indexes #220