make seq/since an extension rather than part of the core spec

mikeal commented 11 years ago

Was walking wolfgang listening to our chat about replication and this idea hit me.

The seq is utterly useless without since, which I already said needs to be optional if we want to support databases that don't explicitly store a sequence index.

In implementing level-sleep I realized that the base requirements for SLEEP are a little too much. Even the simplest possible database has two approaches that I can see: one is to store a sequence index holding a reference to the data, which is primarily indexed by id; the other is to store the data as the value of the sequence index and write an id index holding a reference to the current sequence.

Both approaches will work, but one optimizes for a faster sequence index, the other for a more usable database. Both require what feels like a lot of unnecessary work as a basis for supporting SLEEP.
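To make the two layouts concrete, here is a rough sketch using plain dicts in place of a real store. The class names, the `put`/`get`/`changes` methods, and the choice to drop a key's stale sequence entry on update are all assumptions for illustration, not level-sleep's actual code.

```python
# Hypothetical in-memory stand-ins for the two layouts described above.

class SeqPointsToId:
    """Layout 1: data is indexed by id; the sequence index only holds references."""

    def __init__(self):
        self.seq = 0
        self.by_id = {}    # id -> (seq, data)
        self.by_seq = {}   # seq -> id

    def put(self, key, data):
        old = self.by_id.get(key)
        if old is not None:
            del self.by_seq[old[0]]      # assumption: keep only the latest seq per id
        self.seq += 1
        self.by_id[key] = (self.seq, data)
        self.by_seq[self.seq] = key
        return self.seq

    def get(self, key):
        return self.by_id[key][1]        # single lookup

    def changes(self):
        for seq in sorted(self.by_seq):
            key = self.by_seq[seq]
            # second lookup needed to fetch the data for each change
            yield {"seq": seq, "id": key, "data": self.by_id[key][1]}


class IdPointsToSeq:
    """Layout 2: data is the value of the sequence index; the id index holds the current seq."""

    def __init__(self):
        self.seq = 0
        self.by_seq = {}   # seq -> (id, data)
        self.by_id = {}    # id -> current seq

    def put(self, key, data):
        old = self.by_id.get(key)
        if old is not None:
            del self.by_seq[old]         # assumption: keep only the latest seq per id
        self.seq += 1
        self.by_seq[self.seq] = (key, data)
        self.by_id[key] = self.seq
        return self.seq

    def get(self, key):
        return self.by_seq[self.by_id[key]][1]   # two lookups

    def changes(self):
        for seq in sorted(self.by_seq):
            key, data = self.by_seq[seq]          # data comes straight from the index
            yield {"seq": seq, "id": key, "data": data}
```

Either way, every write touches two indexes, which is the "unnecessary work" a database without a native sequence would have to take on just to expose seq.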

Combine this with the format's inability to support merkle tree representations and you start to wonder why the seq is required in the first place.

If the purpose of the core spec is to describe any database as a collection of entities (not necessarily documents or even keys; the ids could be commit hashes or parts of a tree), then the seq assumes too much about the implementation. Furthermore, the presence of seq should say something about how the database is implemented, and such a database must therefore support a since option, since that's the best way to cache and retrieve the changes made after a prior sync of the database.
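A minimal sketch of the split being proposed, assuming entities are plain dicts. The function names `core_feed` and `seq_feed` and the `(seq, id, data)` log shape are made up for illustration and are not part of the SLEEP spec.

```python
def core_feed(entities):
    """Core spec as proposed: every entity, identified only by its id."""
    for key, data in entities.items():
        yield {"id": key, "data": data}


def seq_feed(log, since=0):
    """Extension: `log` is a list of (seq, id, data) tuples ordered by seq.

    A server exposing seq also accepts `since` and only returns
    entries written after that sequence number.
    """
    for seq, key, data in log:
        if seq > since:
            yield {"seq": seq, "id": key, "data": data}


# A client of the extension remembers the highest seq it has seen
# and passes it back as `since` on the next sync.
log = [(1, "a", "x"), (2, "b", "y"), (3, "a", "z")]
first_sync = list(seq_feed(log))
last_seen = first_sync[-1]["seq"]
next_sync = list(seq_feed(log, since=last_seen))   # nothing new yet
```

A client talking to a core-only server gets no seq at all and simply reads the whole feed each time.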

Similarly, a database might not have a sequence index, but its documents might have hashes or revisions. Another extension of the spec could expose a revision/md5 but not a seq, and also provide options for posting the client's list of current revisions/md5s. This is similar to how the spec could be extended to support storage engines using merkle trees: they would post their current state and a list of entities describing the changes, and the tree would trickle down.
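Roughly, the revision-based extension could work like the sketch below. The `revision` helper (md5 of the JSON-serialized data) and the `diff_by_revision` name are assumptions for illustration; a real database would use whatever revision or hash it already stores.

```python
import hashlib
import json


def revision(data):
    """Hypothetical revision: md5 of the canonical JSON form of the data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


def diff_by_revision(server_docs, client_revs):
    """Return the entities whose revision differs from what the client posted.

    `server_docs` maps id -> data; `client_revs` maps id -> revision string.
    Deletions are not handled in this sketch.
    """
    out = []
    for key, data in server_docs.items():
        rev = revision(data)
        if client_revs.get(key) != rev:
            out.append({"id": key, "rev": rev, "data": data})
    return out


server = {"a": {"n": 1}, "b": {"n": 2}}
client = {"a": revision({"n": 1}), "b": revision({"n": 0})}
stale = diff_by_revision(server, client)   # only "b" needs to be sent
```

No seq is involved: the client's posted list of revisions plays the role that since plays in the sequence-based extension.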

Thoughts?

@maxogden @dominictarr @rvagg @hij1nx @juliangruber

dominictarr commented 11 years ago

You said post, are you implicitly considering an HTTP-based protocol? Just checking (my preference is for duplex protocols).

A merkle tree requires multiple back-and-forth exchanges before arriving at a set to replicate.
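As a rough illustration of why it takes several exchanges, here is a toy two-level hash tree (leaf hashes grouped into 16 buckets under one root). The tree shape, the function names, and the use of string values are all assumptions for the sketch; it only covers pulling from a remote and ignores deletions.

```python
import hashlib


def h(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_tree(docs):
    """Return (root_hash, bucket_hashes, leaves_by_bucket) for id -> value strings."""
    buckets = {}
    for key, value in docs.items():
        # bucket by the first hex digit of the key's hash; the leaf covers key and value
        buckets.setdefault(h(key)[0], {})[key] = h(key + "\x00" + value)
    bucket_hashes = {
        b: h("".join(leaf for _, leaf in sorted(leaves.items())))
        for b, leaves in buckets.items()
    }
    root = h("".join(b + bh for b, bh in sorted(bucket_hashes.items())))
    return root, bucket_hashes, buckets


def sync_rounds(local, remote):
    """Return (round_trips, ids_to_fetch) for pulling `remote` into `local`."""
    l_root, l_buckets, l_leaves = build_tree(local)
    r_root, r_buckets, r_leaves = build_tree(remote)

    rounds = 1                       # round 1: compare root hashes
    if l_root == r_root:
        return rounds, []

    rounds += 1                      # round 2: compare bucket hashes
    differing = [b for b in r_buckets if l_buckets.get(b) != r_buckets[b]]

    rounds += 1                      # round 3: compare leaf hashes in those buckets
    wanted = []
    for b in differing:
        mine = l_leaves.get(b, {})
        for key, leaf in r_leaves[b].items():
            if mine.get(key) != leaf:
                wanted.append(key)
    return rounds, sorted(wanted)
```

Even this shallow tree needs three round trips before the set to replicate is known, and a deeper tree adds one per level, which is hard to fit into a single request/response feed.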

Adding it on to this protocol would be a very significant change.

I guess my question is: what use would SLEEP be without something akin to seq? Except in the case where the data is append-only and the ID is an implicit seq, as would be the case for logs.

mikeal commented 11 years ago

i was trying not to imply http, i think the format of the protocol can be neutral. if you look in this repo i implement the existing semantics over both http and a socket (net or tls).

SLEEP w/o seq/since is just a feed format for serializing a database. think about it like an rss or atom feed, the entities are standardized but not all feed servers support proper cache control headers to make it more efficient. but, at the end of the day, even people expressing things other than news articles can build on rss servers/clients as a format (although that isn't happening so much now that XML is broadly considered a pain in the ass).

your feedback about a duplex connection is why i want you in this kind of discussion. if above the wire format we have a standard way to send messages back and forth, we can also implement and express them in an http/tcp neutral way and a database neutral way (although the body of the messages would clearly differ based on the db).

mikeal / SLEEP

make seq/since an extension rather than part of the core spec #3