ipfs / notes

IPFS Collaborative Notebook for Research

Diffs and Refs to Previous Versions #310

Open ericvicenti opened 9 years ago

ericvicenti commented 9 years ago

IPFS is awesome. Great work so far @jbenet! But I think it lacks a critical feature. It may be poorly documented, or maybe you are already working on it, so please correct me if I'm wrong.

An object (file or directory) in IPFS should be able to point to a previous version hash. This way you can look at a file or directory and see how it evolved.

This also applies to directories, but let's look at an example with a file. If I have a really large file and want to make a minor change, I should be able to create a new object that refers to the original file and provides a small change-set on top of it. This keeps new objects very light, so many can be created. That way we could create an object every time the file is saved, without much overhead. I'll call this a diff object.
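To make the proposal concrete, here is a minimal sketch of a diff object. It is not how IPFS stores data; the in-memory `store`, the JSON layout, and the helper names (`put_full`, `put_diff`, `read`) are all assumptions made up for illustration.

```python
import hashlib
import json

store = {}  # hypothetical in-memory content-addressed store: hash -> bytes


def put(obj: dict) -> str:
    """Serialize an object deterministically and store it under its SHA-256 hash."""
    raw = json.dumps(obj, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(raw).hexdigest()
    store[key] = raw
    return key


def put_full(data: str) -> str:
    """Store the complete content."""
    return put({"type": "full", "data": data})


def put_diff(base: str, offset: int, delete: int, insert: str) -> str:
    """Store a small change-set: at `offset`, drop `delete` characters and add `insert`."""
    return put({"type": "diff", "base": base,
                "offset": offset, "delete": delete, "insert": insert})


def read(key: str) -> str:
    """Rebuild the content by following `base` links back to a full object."""
    obj = json.loads(store[key])
    if obj["type"] == "full":
        return obj["data"]
    base = read(obj["base"])  # one extra fetch per diff in the chain
    start, end = obj["offset"], obj["offset"] + obj["delete"]
    return base[:start] + obj["insert"] + base[end:]


v1 = put_full("hello world, " * 1000)   # the "really large" file
v2 = put_diff(v1, 0, 5, "HELLO")        # a minor change on top of it

print(len(store[v1]), "bytes for the full object")
print(len(store[v2]), "bytes for the diff object")
```

Reading `v2` has to fetch `v1` first, which is the sequential-fetch cost that grows with the length of the diff chain.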

Even if the data a diff object resolves to is identical to that of another object, its hash will be different, because its history is different.

The side effect of this feature is that one file might take a very long time to read, because it could be composed of many objects that need to be fetched sequentially. To mitigate that, I believe we need refs as well. A ref is a full object that also contains a reference to the previous version. This way the previous version does not need to be fetched in order to read the contents.
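A ref, as described here, could look roughly like the following sketch. Again this is only an illustration under assumptions: the `store` dict, the `data`/`prev` field names, and the helpers are hypothetical, not an IPFS format. It also shows the earlier point that identical data with a different history gets a different hash.

```python
import hashlib
import json
from typing import List, Optional

store = {}  # hypothetical in-memory store: hash -> serialized object


def put_version(data: str, prev: Optional[str] = None) -> str:
    """Store a full copy of `data` plus a link to the previous version's hash."""
    raw = json.dumps({"data": data, "prev": prev}, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(raw).hexdigest()
    store[key] = raw
    return key


def read(key: str) -> str:
    """A single lookup: the content is stored in full, so no history is fetched."""
    return json.loads(store[key])["data"]


def history(key: Optional[str]) -> List[str]:
    """Walk the `prev` links, newest version first."""
    chain = []
    while key is not None:
        chain.append(key)
        key = json.loads(store[key])["prev"]
    return chain


a1 = put_version("draft")
a2 = put_version("final", prev=a1)
b1 = put_version("final")  # same contents as a2, but with no history

print(read(a2) == read(b1))  # same data
print(a2 == b1)              # different hashes, because `prev` is part of the object
print(history(a2))
```

The trade-off against diff objects is storage: every ref carries the full contents, so reads are fast but each version costs as much as the file itself unless the underlying blocks are deduplicated.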

I believe this is a critical feature for IPFS to ever compete with Git or Hg. What do you think?

whyrusleeping commented 9 years ago

@ericvicenti the functionality you're talking about comes almost for free. All files stored in ipfs are chunked, and a change to a file only needs to modify the affected chunk. Currently we use size-based chunking because it's simpler, but this has the side effect that inserting bytes into a file shifts every later chunk boundary, so all of the following chunks have to be stored as new objects. We have an implementation of our chunker that uses Rabin fingerprinting to determine chunk boundaries, which will solve this issue. It's in the works, just not ready yet. As far as directories go, we don't quite have this functionality there yet, but thought has been going towards 'chunking directories' recently, and I am confident that we will be able to get similar features to the file chunking for them soon.
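The difference between the two chunking strategies can be seen in a small sketch. This is not the IPFS chunker: the rolling hash below is a simple polynomial hash standing in for Rabin fingerprinting, and the window size, average chunk size, and function names are assumptions chosen for illustration.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, avg: int = 64) -> list:
    """Cut wherever a rolling hash of the last `window` bytes hits a fixed pattern."""
    base, mod = 257, (1 << 61) - 1
    drop = pow(base, window, mod)  # weight of the byte that leaves the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod
        if i >= window:
            h = (h - data[i - window] * drop) % mod
        if h % avg == avg - 1:  # the boundary depends only on nearby bytes
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def hashes(chunks: list) -> set:
    """Content addresses of a list of chunks."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(20000))
new = b"!" + old  # insert a single byte at the front of the file

fixed_reused = len(hashes(fixed_chunks(old)) & hashes(fixed_chunks(new)))
cdc_reused = len(hashes(content_defined_chunks(old)) & hashes(content_defined_chunks(new)))
print("fixed-size chunks reused:", fixed_reused)
print("content-defined chunks reused:", cdc_reused)
```

With fixed-size chunks the inserted byte shifts every boundary, so essentially nothing is reused. With content-defined boundaries, only the chunks near the insertion change, because every boundary past the first `window` bytes is decided by the same bytes as before.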

jbenet commented 9 years ago

We'll support commits too. Already planned. May want to read the paper, and watch talks. (And yes, our docs need to be better)


On Sat, May 2, 2015 at 1:36 PM, Jeromy Johnson wrote the comment quoted above: https://github.com/ipfs/ipfs/issues/68#issuecomment-98396837