raimohanska / ourboard

An online whiteboard

Smarter persistence model #114

Closed raimohanska closed 3 years ago

raimohanska commented 3 years ago

Currently the full board history is rewritten to the db in JSON for each change (throttled on the server though). It would be very desirable to be able to just append the newest events to the DB.

Currently a single JSON blob per board is used to keep the row count down, as Heroku Postgres is row-limited :)

raimohanska commented 3 years ago

I've been thinking about the smarter persistence model. Currently the full board history is kept in memory for active boards and written to the DB on every change. That's not sustainable: lots of memory usage and an outrageous amount of IO towards the DB (all events are sent to the DB over and over again).

  1. We don't need the full event history in server mem.
  2. For new connecting clients it should suffice to send the current board snapshot, which is convenient to have in-mem. (should test quickly how many active "typical" boards we can have in mem)
  3. For re-connecting clients we need to send events since their last sync. This can be served from mem if we keep part of the history in mem, or from the DB. I propose the DB, beefed up by some in-mem cache maybe.
  4. I think we could buffer incoming events on the server for some time (say, 30 seconds) and then write them to the DB as a new "event history bundle" row. There would be no UPDATES, only INSERTS. The bundle would have { last_serial: integer, events: Event[] as JSONB }. This way the most voluminous traffic (inbound events) can be flushed to the DB in a rather economical way.
  5. The event_history table containing event bundles can be compacted in a Garbage Collector fashion, by loading a board's history, retaining only the last N events, initializing with a Snapshot event and writing the result as a single row. Further, the history could be ditched to S3, away from the active database. After all, that's essentially immutable data now.
  6. Now re-connecting clients can be served with SELECT * FROM event_history WHERE last_serial > $1
  7. When a board is initialized in the server mem (becomes active), it would fetch all event_history rows for the board and reconstruct the board state into memory.
  8. The above can be optimized by storing this snapshot of the board state along with its serial, so that next time you only need to fetch the snapshot and those events from event_history where last_serial > snapshot.serial.
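
A rough sketch of steps 4 and 6 to 8 might look like the following. It is only an illustration: an in-memory array stands in for the event_history table, and the names `BoardEvent`, `BoardState`, `Bundle`, `EventHistory`, `applyEvent` and `loadBoard` are made up for this example and are not taken from the OurBoard code. Compaction (step 5) is left out.

```typescript
// Hypothetical shapes; the real event and board types are not shown in this thread.
type BoardEvent = { serial: number; itemId: string; text: string };
type BoardState = { serial: number; items: Record<string, string> };
type Bundle = { boardId: string; lastSerial: number; events: BoardEvent[] };

// In-memory stand-in for the event_history table: rows are only ever appended.
class EventHistory {
  private rows: Bundle[] = [];
  private buffers: Record<string, BoardEvent[]> = {};

  // Step 4: buffer incoming events instead of writing each one to the DB.
  add(boardId: string, event: BoardEvent): void {
    (this.buffers[boardId] = this.buffers[boardId] || []).push(event);
  }

  // Step 4: meant to run periodically (say, every 30 seconds).
  // Each board with buffered events gets one new row: an INSERT, never an UPDATE.
  flush(): void {
    for (const boardId of Object.keys(this.buffers)) {
      const events = this.buffers[boardId];
      this.rows.push({ boardId, lastSerial: events[events.length - 1].serial, events });
    }
    this.buffers = {};
  }

  // Step 6: select bundles with last_serial > serial for one board, then skip
  // the events inside those bundles that the client has already seen.
  eventsSince(boardId: string, serial: number): BoardEvent[] {
    const result: BoardEvent[] = [];
    for (const row of this.rows) {
      if (row.boardId !== boardId || row.lastSerial <= serial) continue;
      for (const event of row.events) {
        if (event.serial > serial) result.push(event);
      }
    }
    return result;
  }
}

// Toy reducer: each event sets the text of one item.
function applyEvent(state: BoardState, event: BoardEvent): BoardState {
  return { serial: event.serial, items: { ...state.items, [event.itemId]: event.text } };
}

// Steps 7 and 8: start from a stored snapshot and replay only the newer events.
function loadBoard(store: EventHistory, boardId: string, snapshot: BoardState): BoardState {
  return store.eventsSince(boardId, snapshot.serial).reduce(applyEvent, snapshot);
}

const store = new EventHistory();
store.add("b1", { serial: 1, itemId: "note", text: "hello" });
store.add("b1", { serial: 2, itemId: "note", text: "hello world" });
store.flush(); // first bundle, lastSerial = 2
store.add("b1", { serial: 3, itemId: "todo", text: "compaction" });
store.flush(); // second bundle, lastSerial = 3

const missed = store.eventsSince("b1", 2); // a client that last synced at serial 2
const board = loadBoard(store, "b1", { serial: 0, items: {} });
```

Note that a bundle selected by last_serial can still contain events the client already has, so the sketch filters individual events by serial as well.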

raimohanska commented 3 years ago

Well there it is: d3a128bd9c0a86124bd8adff945901c46d891c05

What's still missing: