y-crdt / y-octo

CRDT implementation which is compatible with https://github.com/yjs/yjs
https://octobase.pro/

add binary format that supports partial reading and self-verification as a storage format #8

Open darkskygit opened 1 year ago

darkskygit commented 1 year ago

ybinary v1 is a binary format optimized for one-time network transmission.

It can only be read as a whole, and there is no way to tell whether the binary is damaged until decoding fails partway through.

For a detailed analysis, please refer to this review:

https://github.com/toeverything/OctoBase/issues/383#issuecomment-1513577058

We need to design a binary format that supports partial reading and self-verification, so that CRDT state can be stored permanently and robustly.
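To make the two requirements concrete, here is a minimal sketch of a container that can be read one record at a time and verified record by record. Everything in it is an assumption made for illustration, not y-octo's or Yjs's actual format: the layout (a record count followed by an index of offset, length and hash per record), the names `pack` and `readRecord`, and the use of FNV-1a as a stand-in checksum.

```javascript
// Hypothetical layout (all integers are big-endian u32):
// [count][offset, length, hash] x count, followed by the record bytes.

// FNV-1a, used here only as a small stand-in checksum.
const fnv1a = (bytes) => {
  let h = 0x811c9dc5
  for (const b of bytes) {
    h ^= b
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

// Pack an array of Uint8Array records into one container.
const pack = (records) => {
  const headerSize = 4 + records.length * 12
  const total = headerSize + records.reduce((n, r) => n + r.length, 0)
  const out = new Uint8Array(total)
  const view = new DataView(out.buffer)
  view.setUint32(0, records.length)
  let offset = headerSize
  records.forEach((r, i) => {
    const entry = 4 + i * 12
    view.setUint32(entry, offset)
    view.setUint32(entry + 4, r.length)
    view.setUint32(entry + 8, fnv1a(r))
    out.set(r, offset)
    offset += r.length
  })
  return out
}

// Read and verify a single record without touching the others.
const readRecord = (container, index) => {
  const view = new DataView(container.buffer, container.byteOffset, container.byteLength)
  const count = view.getUint32(0)
  if (index < 0 || index >= count) throw new RangeError('no such record')
  const entry = 4 + index * 12
  const offset = view.getUint32(entry)
  const length = view.getUint32(entry + 4)
  const expected = view.getUint32(entry + 8)
  if (offset + length > container.byteLength) throw new Error('record out of bounds')
  const record = container.subarray(offset, offset + length)
  if (fnv1a(record) !== expected) throw new Error(`record ${index} is corrupted`)
  return record
}

// Usage: only the requested record is hashed and returned.
const container = pack([new Uint8Array([1, 2, 3]), new Uint8Array([4, 5])])
const second = readRecord(container, 1)
console.log(second)
```

With a layout like this, damage to one record is reported when that record is read and does not prevent the others from being read. The index itself is not protected in this sketch; a real format would need to cover it as well.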

Brooooooklyn commented 1 year ago

Following @dmonad's advice, we can store the checksum info in the y-binary itself.

dmonad commented 1 year ago

You can create a new (custom) binary "v1-with-checksum" by concatenating the checksum and the binary update. E.g.

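import * as encoding from 'lib0/encoding'

// ChecksumType (a one-byte tag) and checksum() are placeholders for the scheme you choose.
doc.on('update', update => {
  const v1UpdateWithChecksum = encoding.encode(encoder => {
    encoding.writeUint8(encoder, ChecksumType)
    encoding.writeVarUint8Array(encoder, checksum(update))
    encoding.writeVarUint8Array(encoder, update)
  })
  // persist or send v1UpdateWithChecksum instead of the raw update
})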

I imagine that most users don't want to verify every single update and re-request the data from another source if the update is manipulated. So maybe you store an error-correcting CRC checksum instead of something like SHA or Rabin.
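As a rough illustration of the read side, the sketch below frames an update with a CRC-32 and checks it again before the update is used. It is self-contained and does not use lib0, so the frame layout differs from the snippet above: a one-byte tag, a fixed four-byte big-endian CRC, then the raw update. The names `CHECKSUM_CRC32`, `wrapUpdate` and `unwrapUpdate` are hypothetical. Note that this sketch only detects corruption; it makes no attempt to repair it.

```javascript
// CRC-32 (reflected polynomial 0xEDB88320), table-driven.
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1
  return c >>> 0
})

const crc32 = (bytes) => {
  let c = 0xffffffff
  for (const b of bytes) c = CRC_TABLE[(c ^ b) & 0xff] ^ (c >>> 8)
  return (c ^ 0xffffffff) >>> 0
}

// Hypothetical tag value identifying the checksum algorithm.
const CHECKSUM_CRC32 = 1

// Frame layout assumed here: [tag: u8][crc32: u32 big-endian][update bytes]
const wrapUpdate = (update) => {
  const out = new Uint8Array(5 + update.length)
  out[0] = CHECKSUM_CRC32
  new DataView(out.buffer).setUint32(1, crc32(update))
  out.set(update, 5)
  return out
}

// Returns the update bytes, or throws if the frame is unknown or damaged.
const unwrapUpdate = (framed) => {
  if (framed.length < 5 || framed[0] !== CHECKSUM_CRC32) {
    throw new Error('unknown or truncated frame')
  }
  const view = new DataView(framed.buffer, framed.byteOffset, framed.byteLength)
  const stored = view.getUint32(1)
  const update = framed.subarray(5)
  if (crc32(update) !== stored) throw new Error('checksum mismatch: update is corrupted')
  return update
}

// Usage: verify before handing the bytes to the CRDT decoder.
const framed = wrapUpdate(new Uint8Array([1, 2, 3, 4]))
console.log(unwrapUpdate(framed))
```

Checking the frame before decoding means a damaged update is rejected up front instead of failing somewhere in the middle of the read.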