Open mikeal opened 11 years ago
You said `post`, are you implicitly considering an HTTP-based protocol?
just checking (my preference is for duplex protocols)
a merkle tree requires multiple back-and-forth exchanges before arriving at a set to replicate.
Adding it on to this protocol would be a very significant change.
I guess my question is: what use would SLEEP be without something akin to `seq`? Except in the case where the data is append-only and the ID is an implicit seq, as would be the case for logs.
i was trying not to imply http, i think the format of the protocol can be neutral. if you look in this repo i implement the existing semantics over both http and a socket (net or tls).
SLEEP w/o seq/since is just a feed format for serializing a database. think about it like an rss or atom feed, the entities are standardized but not all feed servers support proper cache control headers to make it more efficient. but, at the end of the day, even people expressing things other than news articles can build on rss servers/clients as a format (although that isn't happening so much now that XML is broadly considered a pain in the ass).
your feedback about a duplex connection is why i want you in this kind of discussion. if above the wire format we have a standard way to send messages back and forth, we can also implement and express them in an http/tcp neutral way and a database neutral way (although the body of the messages would clearly differ based on the db).
Was walking wolfgang listening to our chat about replication and this idea hit me.
The `seq` is utterly useless without `since`, which I already said needs to be optional if we want to support databases that aren't explicitly storing a sequence index.

In implementing `level-sleep` i realized that the base requirements for SLEEP are a little too much. The simplest database possible still has 2 approaches that I can see: one is to store the sequence index with a reference to the `data`, which is primarily indexed by the `id`; the other is to store the `data` as the value of the sequence index and write the `id` index with a reference to the current sequence.

Both approaches will work but one optimizes for a faster sequence index, the other for a more usable database. Both require what feels like a lot of unnecessary work as a basis for supporting SLEEP.
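To make the trade-off concrete, here is a minimal sketch of the two layouts. It uses plain Python dicts as a stand-in for a sorted key/value store; the class and method names (`IdPrimaryStore`, `SeqPrimaryStore`, `changes`) are hypothetical and are not taken from `level-sleep`.

```python
class IdPrimaryStore:
    """Approach 1: data is indexed by id; the sequence index only holds references."""

    def __init__(self):
        self.seq = 0
        self.by_id = {}      # id -> (seq, data)
        self.seq_index = {}  # seq -> id (a reference, not the data)

    def put(self, id, data):
        old = self.by_id.get(id)
        if old is not None:
            del self.seq_index[old[0]]  # drop the stale sequence entry
        self.seq += 1
        self.by_id[id] = (self.seq, data)
        self.seq_index[self.seq] = id

    def get(self, id):
        return self.by_id[id][1]  # one lookup

    def changes(self, since=0):
        # every change needs a second lookup to fetch the data
        return [{"seq": s, "id": self.seq_index[s], "data": self.by_id[self.seq_index[s]][1]}
                for s in sorted(self.seq_index) if s > since]


class SeqPrimaryStore:
    """Approach 2: data is the value of the sequence index; the id index points at the current seq."""

    def __init__(self):
        self.seq = 0
        self.by_seq = {}    # seq -> (id, data)
        self.id_index = {}  # id -> current seq

    def put(self, id, data):
        old_seq = self.id_index.get(id)
        if old_seq is not None:
            del self.by_seq[old_seq]
        self.seq += 1
        self.by_seq[self.seq] = (id, data)
        self.id_index[id] = self.seq

    def get(self, id):
        return self.by_seq[self.id_index[id]][1]  # two lookups

    def changes(self, since=0):
        # a single scan of the sequence index is enough
        return [{"seq": s, "id": self.by_seq[s][0], "data": self.by_seq[s][1]}
                for s in sorted(self.by_seq) if s > since]


for store in (IdPrimaryStore(), SeqPrimaryStore()):
    store.put("a", 1)
    store.put("b", 2)
    store.put("a", 3)
    print(type(store).__name__, store.get("a"), store.changes(since=2))
```

Both stores answer the same queries; they only differ in which read path pays for the extra lookup, and both have to maintain two indexes on every write.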
Combine this with the format's inability to support merkle tree representations and you start to wonder why the `seq` is required in the first place.

If the purpose of the core spec is to describe any database as a collection of entities (not necessarily documents or even keys; the `id` could be hashes for commits or parts of a tree) then the `seq` assumes too much about the implementation. Furthermore, the presence of `seq` should indicate a little something about how the database is implemented, and a database that exposes it must therefore support a `since` option, since that's the best way to cache and retrieve the changes since a prior sync of the database.

Similarly, a database might not have a sequence index but the documents might have hashes or revisions. Another extension of the spec could expose a revision/md5 but not a `seq` and also provide options for posting the client's list of current revisions/md5s. This is similar to how the spec could be extended to support storage engines using merkle trees: they would post their current state and a list of entities describing the changes and the tree would trickle down.

Thoughts?
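For illustration, the revision-based extension could look roughly like the sketch below: the client posts a map of the revisions it holds and the server answers with the entities that differ. The function names, the response shape, and the `deleted` list are assumptions made for this example; none of them come from the SLEEP spec.

```python
import hashlib
import json


def revision(doc):
    """Hypothetical revision: md5 of the document's canonical JSON form."""
    return hashlib.md5(json.dumps(doc, sort_keys=True).encode("utf-8")).hexdigest()


def diff_revisions(server_docs, client_revs):
    """Return the entities whose revision differs from what the client posted.

    server_docs: id -> document currently stored on the server
    client_revs: id -> revision the client currently holds
    """
    changed = []
    for id, doc in sorted(server_docs.items()):
        rev = revision(doc)
        if client_revs.get(id) != rev:
            changed.append({"id": id, "rev": rev, "data": doc})
    # assumption: ids the client knows but the server no longer has are reported as deleted
    deleted = sorted(id for id in client_revs if id not in server_docs)
    return {"changed": changed, "deleted": deleted}


server = {"a": {"n": 1}, "b": {"n": 2}}
client = {"a": revision({"n": 1}), "b": revision({"n": 0}), "c": "stale"}
print(diff_revisions(server, client))
```

No `seq` is involved here; the cost is that the client has to send its full revision list on every sync.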
@maxogden @dominictarr @rvagg @hij1nx @juliangruber