Document data versioning and updates on IPFS #27

ipfs / in-web-browsers

Tracking the endeavor towards getting web browsers to natively support IPFS and content-addressing

Reference Notes

Scattered notes from a conversation between @flyingzumwalt and @jbenet

Versioning is surprisingly tricky, mainly because you need different versioning models to suit different uses.

Why we delayed this work: We were waiting until we got IPLD transformations right (related to the Solidifying IPLD Sprint)

Factors to consider

fast access to pieces

retrievability of registries

consistency

security & authenticity

A normal key-value system only requires that I be able to get the info to you somehow, but the challenge is doing that over a distributed network (see pubsub, etc). If you want high consistency on these names (key-value pairs), you need a secure communication channel for announcements. The most extreme version of that is Ethereum. The least reliable (by design) is a gossip protocol.

SLEEP uses one versioning model that suits some use cases. It's git-style, but not a straight implementation of the git model

current implementation is not actually secure -- security guarantees for doing things like registries are not sound

git versioning model is another viable model

CRDTs work is more interesting for a lot of use cases, especially anything involving dynamic concurrent updates to a dataset/"database"

One way to provide high consistency guarantee: ipns name updater on ethereum (or general name aggregator chain)

another way to think of this: using ethereum as a libp2p record store -- this warrants clarification
provides high consistency guarantee & retrievable registries
would address concerns about relying on IPNS exclusively over DHT, which allows for situations where you won't be able to find the records/registry depending on condition of the network.
Question: If you're writing the registries to ethereum, why use IPNS at all?
- Answer: IPNS gives you a consistent way to do naming in ipfs-land, ethereum/dns/etc gives you different consistency guarantees
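One way to picture "Ethereum as a record store" is a resolver that asks several stores for the same name and keeps the record with the highest sequence number, which is how IPNS records are ordered. The sketch below is only an illustration under stated assumptions: `NameRecord`, `DictStore` and `resolve` are hypothetical names, the in-memory store stands in for a DHT or chain backend, and signature and validity checks are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical, simplified name record (no signature, no expiry)."""
    name: str
    value: str  # what the name points at, e.g. a root hash
    seq: int    # higher sequence number means newer record


class DictStore:
    """Hypothetical in-memory record store; stands in for a DHT or a chain."""

    def __init__(self) -> None:
        self._records: Dict[str, NameRecord] = {}

    def put(self, record: NameRecord) -> None:
        # Only accept a record that is newer than the one already held.
        current = self._records.get(record.name)
        if current is None or record.seq > current.seq:
            self._records[record.name] = record

    def get(self, name: str) -> Optional[NameRecord]:
        return self._records.get(name)


def resolve(name: str, stores: Iterable[DictStore]) -> Optional[NameRecord]:
    """Query every store and return the record with the highest sequence."""
    found = [r for r in (s.get(name) for s in stores) if r is not None]
    return max(found, key=lambda r: r.seq, default=None)


# Usage: the DHT holds a stale record, the chain-backed store a newer one.
dht, chain = DictStore(), DictStore()
dht.put(NameRecord("my-registry", "root-old", seq=1))
chain.put(NameRecord("my-registry", "root-new", seq=2))
latest = resolve("my-registry", [dht, chain])
```

The naming interface stays the same whichever store answers; what changes between stores is how confident you can be that the answer is the latest one.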

Moving here from #10:

@gozala

Some of the ideas we (me and Patryk) have been exploring seem to assume that there is a way to see every version of the IPFS content. In other words, it would be nice to have a changelog for IPNS updates, so it's not just "here is what the current version is" but also "here are all the previous versions that existed". I remember a mention of commit objects in the white paper, so I assume there is a way to update an IPNS pointer with a commit object, but I can't really figure out how, or whether I'm actually getting it right. I think something along the lines of http://docs.datproject.org/sleep is what I'm looking for.

@jbenet:

Yes, we have given this a lot of thought, and are returning to it this quarter and next. It's not easy to get this right because what we choose can block many applications: any "one fully-contained versioning strategy" works for at most 20% of the use cases we've looked at. One clear example: data-center applications that expect to mutate names on the order of <1ms will want something that works a bit differently than apps that require much stronger security (eg censorship resistance that requires timestamping to the bitcoin blockchain, DNS, or some equivalent level of security) but can tolerate changing names only on the order of <100s (like most DNS names).

This actually decomposes to two different problems:

How apps want to do versioning (security, consistency, and dev UX implications):

Commit graphs (like git)

Commutable patches (like darcs)

CRDTs (riak, orbit, google docs, google internals, the future)

consensus (blockchains, etc).

How apps want to do naming (security, consistency, dev UX, and ownership implications):

slow public key (ipns, sfs) with consensus (strong consistency, >10s updates, available only in some consensus model)

fast public key (ipns, sfs) without consensus (weaker consistency, >1us updates, available disconnected networks -- dhts, pubsub, etc)

DNS naming (strong consistency, >60s updates)

blockchain naming (ENS, blockstack, etc).

In our research, lots of apps DO NOT want to manage their versions manually, want convergent replication, and should be using CRDTs and things like orbit-db. Some subset DO want direct control over versioning and want commit graphs (like git, dat, etc), so for those we will expose direct versioning logs that can be indexed in a couple of good ways. (eg binomial heaps, etc)
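To make "convergent replication" concrete, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins register. It is not how orbit-db or any IPFS component is implemented; `LWWRegister` and its fields are hypothetical names, and the integer timestamps are assumed to come from some clock the replicas roughly agree on. The point is only that `merge` gives the same result in any order, so replicas converge without coordinating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical last-writer-wins register holding a single value."""
    value: str
    timestamp: int  # assumed logical or wall-clock time of the write
    node_id: str    # breaks ties between writes with equal timestamps

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Pick the greater (timestamp, node_id, value) triple. Because this
        # is a max over a total order, merge is commutative, associative
        # and idempotent, which is what makes replicas converge.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id, r.value))


# Usage: two replicas write concurrently, then exchange state in either order.
a = LWWRegister("root-a", timestamp=1, node_id="n1")
b = LWWRegister("root-b", timestamp=2, node_id="n2")
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

The trade-off is that the losing write is silently discarded, which is why apps that want direct control over their history tend to prefer commit graphs.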

BTW, i think the big hump that we need to communicate better is the transition from "apps store data in files" to "apps store data directly, can build files out of data", and the IPFS name isn't helping a ton here.

I'm interested in following the versioning discussion. Below are the kind of versioning things that one could want to do with OpenBazaar. Each OpenBazaar vendor publishes a Unixfs root folder which holds the current public store assets, and different files have different semantics (e.g. some files represent listings, one file represents the profile, one file represents the listing catalogue). Any edit to the store publishes a new IPNS entry with a new root hash.

1) Versioning of stores at the IPNS level. Stores generally have a single owner/committer and I expect stores to mostly move forward linearly (with the occasional roll-back, and branching out). As I see it, we simply need every IPNS entry to point to the root hash of the previous IPNS entry (in addition to incrementing the IPNS sequence). OpenBazaar "archival nodes" (e.g. one run by Duo) could show past versions of stores, in a linear fashion similar to archive.org (most likely), or in a tree-like fashion similar to GitHub (may not be necessary).

2) Versioning of individual documents, notably listings. Each OpenBazaar listing is editable (e.g. to update the price, add a tag, edit the description), but having a cryptographic history for each listing is valuable. For example, we want to be able to aggregate ratings for a given listing over its lifetime, not on a per-edit basis. Having listings contain the hash of the previous listing version at the application level is easily done, but having a standard for versioning at the Unixfs level may be preferable to a home-rolled solution.
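The linear history described in 1) can be sketched as a chain of records in which each new entry increments a sequence number and links back to the previous entry. This is a toy model under loud assumptions: `VersionLog`, `publish` and `history` are hypothetical names, SHA-256 over canonical JSON stands in for real IPFS hashes, and the `prev` link points at the previous record rather than at the previous root so that a roll-back to an earlier root still gets its own entry.

```python
import hashlib
import json
from typing import Dict, List, Optional


def digest(obj: dict) -> str:
    """Stand-in content hash: SHA-256 over canonical JSON (not a real CID)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class VersionLog:
    """Hypothetical append-only log of published store roots."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}  # record id -> record
        self.head: Optional[str] = None     # id of the latest record

    def publish(self, root: str) -> str:
        """Append a record for `root` that links back to the current head."""
        prev = self.records.get(self.head) if self.head else None
        record = {
            "seq": prev["seq"] + 1 if prev else 0,
            "root": root,
            "prev": self.head,
        }
        record_id = digest(record)
        self.records[record_id] = record
        self.head = record_id
        return record_id

    def history(self) -> List[str]:
        """Walk the prev links from the head; returns roots, newest first."""
        roots = []
        record_id = self.head
        while record_id is not None:
            record = self.records[record_id]
            roots.append(record["root"])
            record_id = record["prev"]
        return roots


# Usage: two edits followed by a roll-back to the first version.
log = VersionLog()
log.publish("root-a")
log.publish("root-b")
log.publish("root-a")
past_versions = log.history()
```

An archival node would only need the latest record and access to the older ones to present the archive.org-style linear view; the same previous-hash link applied per listing gives the per-document history described in 2).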
