Open acco opened 1 month ago
Overall looks good.
> Have the WAL Pipeline Server migrate destination schema on boot if needed. We'd add commit_lsn (and perhaps commit_index if desired) as null: true to the table (we'll keep null: true indefinitely)
Why do `sequin_events` tables need either of the `commit_lsn` or `commit_index` columns? Can we not just use a bigserial `seq` column? The WAL Pipeline Server, as a single-threaded writer, will ensure that `seq` is properly ordered for reads.
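A minimal sketch of that idea, using an in-memory stand-in for the tables. `WalEvent`, `shuffle`, and the list-backed `sequin_events` are hypothetical names for illustration, not Sequin's actual schema or code. A single writer drains `wal_events` in commit order and lets an auto-incrementing counter assign `seq`, so readers only need to sort by `seq`.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class WalEvent:
    # Hypothetical shape of a row in wal_events.
    commit_lsn: int
    commit_index: int
    record_pk: str
    action: str


def shuffle(wal_events, sequin_events, seq_counter):
    """Copy events into sequin_events, assigning seq in commit order.

    Assumes exactly one writer calls this, so seq values are handed out
    in the same order the events were committed.
    """
    ordered = sorted(wal_events, key=lambda e: (e.commit_lsn, e.commit_index))
    for event in ordered:
        sequin_events.append(
            {
                "seq": next(seq_counter),
                "record_pk": event.record_pk,
                "action": event.action,
            }
        )
    return sequin_events


seq_counter = count(start=1)  # stands in for a bigserial column
wal_events = [
    WalEvent(200, 0, "b", "insert"),
    WalEvent(100, 1, "a", "update"),
    WalEvent(100, 0, "a", "insert"),
]
rows = shuffle(wal_events, [], seq_counter)
```

In this sketch the destination rows carry only `seq`; the ordering information from `(commit_lsn, commit_index)` is consumed at write time.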
Basically I think we just have to:
- Add `commit_index` to `wal_events` and write this as the index of messages in each transaction from `Replication`
- Drop `commit_lsn, commit_index` when shuffling events from `wal_events` to `sequin_events` tables
- Order `sequin_events` tables by `seq`
- Order `consumer_records` tables by `id` (maybe we should consolidate on `seq` for consistency; also drop `commit_lsn` from this table if it's unused)

We need `commit_lsn` and `commit_index` from replication slot -> wal events -> sequin events for de-duplication in case we replay events from the replication slot
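A rough sketch of how that de-duplication could work, assuming events arrive in `(commit_lsn, commit_index)` order and that the writer remembers the last position it persisted. The function name and the dict-shaped events are hypothetical.

```python
def dedupe_replayed(events, last_persisted):
    """Drop events at or before the last persisted (commit_lsn, commit_index).

    `last_persisted` is a (commit_lsn, commit_index) tuple, or None if
    nothing has been written yet. Tuples compare element by element, so
    the comparison orders by commit_lsn first and commit_index second.
    """
    fresh = []
    for event in events:
        position = (event["commit_lsn"], event["commit_index"])
        if last_persisted is not None and position <= last_persisted:
            continue  # already written before the replay
        fresh.append(event)
        last_persisted = position
    return fresh, last_persisted


# The slot replays transaction 100 from its start, but the first event
# of that transaction was already persisted.
replayed = [
    {"commit_lsn": 100, "commit_index": 0, "record_pk": "a"},
    {"commit_lsn": 100, "commit_index": 1, "record_pk": "a"},
    {"commit_lsn": 101, "commit_index": 0, "record_pk": "b"},
]
fresh, high_water = dedupe_replayed(replayed, last_persisted=(100, 0))
```

With `commit_lsn` alone, the writer could not tell that the second event of transaction 100 was still missing.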
Relatively high priority, as we're "collapsing" events that happen to the same row inside of a transaction, which is undesirable.
Right now, we have an issue upserting to `sequin_events` tables: `seq` is set to the `commit_lsn`. But `commit_lsn` + `record_pk` is not unique: a record can be updated multiple times inside of a `commit_lsn` (a single transaction).

We store `commit_lsn` in a few places:
- `wal_events`
- `consumer_records`
- `sequin_events` tables

For `consumer_records`, it's OK to flatten multiple "touches" of a row into one record (mostly: if the last thing that happens is a delete, we actually want the delete to win).

For `wal_events` and `sequin_events`, instead of just a `commit_lsn`, what we really want is `{commit_lsn, commit_index}`, i.e., a combination of both the `commit_lsn` and the index of the specific event inside the commit. To get that, we'd simply have `Replication` count events received after a transaction is started, tagging each WAL event with its index. The order is `(commit_lsn, index) asc`.
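The counting step can be sketched as follows. The message shapes (`begin`, `commit`, and change messages as dicts) are assumptions made for illustration and do not mirror the actual replication protocol messages or Sequin's `Replication` module.

```python
def tag_with_commit_index(messages):
    """Yield change events tagged with (commit_lsn, commit_index).

    The index restarts at 0 on every `begin` and increments once per
    change event inside that transaction.
    """
    commit_lsn = None
    index = 0
    for message in messages:
        kind = message["type"]
        if kind == "begin":
            commit_lsn = message["commit_lsn"]
            index = 0
        elif kind == "commit":
            commit_lsn = None
        else:
            yield {**message, "commit_lsn": commit_lsn, "commit_index": index}
            index += 1


messages = [
    {"type": "begin", "commit_lsn": 100},
    {"type": "insert", "record_pk": "a"},
    {"type": "update", "record_pk": "a"},
    {"type": "delete", "record_pk": "a"},
    {"type": "commit"},
    {"type": "begin", "commit_lsn": 101},
    {"type": "insert", "record_pk": "b"},
    {"type": "commit"},
]
tagged = list(tag_with_commit_index(messages))

# Keying on (commit_lsn, record_pk) collapses the three touches of "a";
# keying on (commit_lsn, commit_index) keeps every event.
by_lsn_pk = {(e["commit_lsn"], e["record_pk"]): e for e in tagged}
by_lsn_index = {(e["commit_lsn"], e["commit_index"]): e for e in tagged}
```

Here `by_lsn_pk` ends up with two entries while `by_lsn_index` keeps all four, which is the "collapsing" problem in miniature.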
It might be too complicated to sort by `(commit_lsn, index)` everywhere when pulling from `sequin_events`. So we could keep `seq`, turning it into an auto-incrementing bigserial.

Migration
In terms of migration path for `sequin_events`, we'd:
- Have the WAL Pipeline Server migrate destination schema on boot if needed. We'd add `commit_lsn` (and perhaps `commit_index` if desired) as `null: true` to the table (we'll keep `null: true` indefinitely)
- Convert `seq` to bigserial, and have the sequence start at whatever `max(seq)` currently is.

Update on schemas
- `consumer_records` - no unique constraint, `commit_lsn` unused. We do not have to touch this table.
- `wal_events` - no unique constraint. However, we want events inside the same `commit_lsn` to be ordered. Right now, the order is lost at this step. So, we want to add `commit_lsn` as well as `commit_idx` (or the idx of this specific event inside the `commit_lsn`).
- `sequin_events` tables: see above. Continue to use a single column `seq` for ordering, but convert it to bigserial. We don't really need to add `commit_idx` to their table, at least not right now. We just need to make sure that `wal_events` are inserted in the right order (i.e. `seq`s are correct).
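One possible shape for the `seq` conversion, sketched as a helper that builds the Postgres statements. The table and sequence names are placeholders, and the sketch assumes `seq` is already a `bigint` column. `setval` is called with `is_called = false` so the next `nextval` returns exactly `max(seq) + 1`.

```python
def seq_migration_statements(table, sequence):
    """Return Postgres statements that make `seq` auto-incrementing.

    Identifiers are interpolated directly, so this is only suitable for
    trusted, hard-coded names.
    """
    return [
        # Tie the sequence's lifetime to the column it feeds.
        f"CREATE SEQUENCE {sequence} OWNED BY {table}.seq",
        # Start just past the current maximum (or at 1 for an empty table).
        f"SELECT setval('{sequence}', "
        f"COALESCE((SELECT max(seq) FROM {table}), 0) + 1, false)",
        # New rows without an explicit seq draw from the sequence.
        f"ALTER TABLE {table} ALTER COLUMN seq "
        f"SET DEFAULT nextval('{sequence}')",
    ]


statements = seq_migration_statements("sequin_events", "sequin_events_seq_seq")
for statement in statements:
    print(statement + ";")
```

Whether these would run on boot from the WAL Pipeline Server or as a one-off migration is left open here.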