ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

Eventual Consistency #330

Open CMCDragonkai opened 10 years ago

CMCDragonkai commented 10 years ago

How is IPFS planning to deal with consistency issues? I've been looking for a distributed filesystem that supports eventual consistency, since none of the other DFSs seem to scale across zones and data centers.

jbenet commented 10 years ago

@CMCDragonkai IPFS uses a merkle dag data model, so mutating objects is really creating new immutable objects. The only mutable things are /ipns/... objects, which are basically signed pointers (like git branches, and super cheap) stored on the IPFS DHT. So consistency there is based on DHT propagation of the pointer update.

See 3.5, 3.6 and 3.7 in http://static.benet.ai/t/ipfs.pdf
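The merkle-dag-plus-mutable-pointer model described above can be sketched in a few lines. This is a toy illustration, not IPFS's actual encoding: real IPFS uses multihash CIDs and signed IPNS records, while here a node ID is just a SHA-256 digest over the payload and child IDs, and the "IPNS pointer" is a plain dict entry.

```python
import hashlib

def node_id(payload: bytes, links=()) -> str:
    # Toy merkle-dag node: the ID is a hash over the payload plus the
    # IDs of linked child nodes (real IPFS uses multihash CIDs).
    h = hashlib.sha256()
    h.update(payload)
    for link in links:
        h.update(link.encode())
    return h.hexdigest()

chunk_a = node_id(b"chunk A")
chunk_b = node_id(b"chunk B")
root_v1 = node_id(b"file", links=(chunk_a, chunk_b))

# "Mutating" a chunk really creates a new chunk and a new root;
# the old versions stay addressable under their old IDs.
chunk_b2 = node_id(b"chunk B, edited")
root_v2 = node_id(b"file", links=(chunk_a, chunk_b2))
assert root_v1 != root_v2

# An IPNS-style name is the one mutable piece: a cheap signed pointer
# (modeled here as a dict entry) repointed at the latest root.
ipns = {"/ipns/my-key": root_v1}
ipns["/ipns/my-key"] = root_v2
```

Note how an edit to one chunk changes the root's ID too, exactly like a git commit hash changing when any tracked file changes; only the name-to-root pointer ever mutates.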

CMCDragonkai commented 10 years ago

Could you describe the proposed process for when process 1 modifies file A while process 2 also modifies file A at the same time? Or when there is a network partition between process 1 and process 2? How do the modifications get merged back together? Are the modifications blocking? I assume not for a P2P system, which means each node must always be available to accept writes.

BTW, do you know about OCaml's Irmin?

Also, do you think there will be a kernel implementation? FUSE is apparently slower than a kernel module. I'm currently contemplating using ZFS + CephFS (because Gluster blocks).

fwip commented 10 years ago

@cmcdragonkai No merging happens. The original file is available at its unique identifier, mod A has another ID, and mod B a third ID.
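The "no merging" answer follows directly from content addressing, and a minimal sketch makes it concrete (toy SHA-256 IDs standing in for real CIDs, and a dict standing in for the block store): concurrent writers never conflict, because each version lands under its own ID.

```python
import hashlib

def cid(data: bytes) -> str:
    # Toy content ID (real IPFS uses multihash CIDs).
    return hashlib.sha256(data).hexdigest()

original = b"file A, v1"
mod_by_p1 = b"file A, v1 + edit from process 1"
mod_by_p2 = b"file A, v1 + edit from process 2"

# All three versions coexist under distinct IDs; nothing is
# overwritten, so there is no merge step and no write conflict.
store = {cid(d): d for d in (original, mod_by_p1, mod_by_p2)}
assert len(store) == 3
```

Any notion of "the current version" has to live in a layer above the store, e.g. an IPNS pointer or an application-level CRDT.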

CMCDragonkai commented 10 years ago

Will there be any garbage collection for really old files? Otherwise rapidly changing files, such as a log file, would keep a copy of every mutation, which would fill up the disk far too fast.

jbenet commented 10 years ago

@CMCDragonkai yep. depends on what you want to keep. Full version histories are needed for some applications, but not others. For example, many backup systems reduce granularity with time. You can do this sort of thing with indexing data structures (i.e. instead of the complete git-like commit dag). -- oh and should note, ipfs clients will reclaim space from objects that are not pinned locally. (pin = "persist this please", other objects treated like a cache)
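The pin-as-persistence model can be sketched as a one-function toy (the dict-based store and `reclaim` helper are illustrative, not IPFS internals): anything pinned survives, anything else is treated as cache. On a real node the equivalents are `ipfs pin add <cid>` and `ipfs repo gc`.

```python
def reclaim(store: dict, pinned: set) -> dict:
    # Sketch of pin-based space reclamation: pinned objects persist,
    # everything else is treated as a cache and may be dropped.
    return {c: data for c, data in store.items() if c in pinned}

store = {
    "cid-log-v1": b"old log snapshot",
    "cid-log-v2": b"current log snapshot",
    "cid-tmp": b"scratch data",
}
pinned = {"cid-log-v2"}  # pin = "persist this please"
store = reclaim(store, pinned)
assert set(store) == {"cid-log-v2"}
```

So a rapidly changing log only costs as much disk as the versions you choose to keep pinned; stale versions become reclaimable cache.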

CMCDragonkai commented 10 years ago

Ok that's cool. So here's how I understand IPFS.

If there were 100 nodes in the network and one node writes a file, that file is then replicated to the other nodes. However, the node that wrote the file can immediately read the file, even if it hasn't been fully replicated to the other nodes.

I think this is called read-after-write consistency: http://shlomoswidler.com/2009/12/read-after-write-consistency-in-amazon.html That way, writes are not blocked by other nodes.

Have you heard of the Andrew File System? Apparently it had weak consistency to reduce latency.

Also, would IPFS and LeoFS coexist? LeoFS is an object-based distributed filesystem that also focuses on eventual consistency.

tjfwalker commented 10 years ago

@cmcdragonkai I (briefly) googled LionFS to no avail. Link, please.

CMCDragonkai commented 10 years ago

Woops, I meant LeoFS: http://leo-project.net/leofs/

daviddias commented 5 years ago

> If one node writes a file, that file is then replicated to the other nodes.

Not quite. IPFS doesn't replicate without the user's intent. The correct way to describe it is: "Once a user adds a file to a node, any node in the network can access it. In order for other nodes to access it, they replicate it."
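This fetch-on-demand behavior can be sketched as two toy nodes (the `Node` class, `add`, and `get` are illustrative names, not the go-ipfs API): adding a block is purely local, and a block only reaches another node when that node actually requests it.

```python
import hashlib

def cid(data: bytes) -> str:
    # Toy content ID (real IPFS uses multihash CIDs).
    return hashlib.sha256(data).hexdigest()

class Node:
    def __init__(self):
        self.store = {}              # this node's local blocks only

    def add(self, data: bytes) -> str:
        c = cid(data)
        self.store[c] = data         # adding is local; nothing is pushed
        return c

    def get(self, c: str, peers):
        if c in self.store:
            return self.store[c]
        for peer in peers:           # fetching from a peer is what
            if c in peer.store:      # replicates the block here
                self.store[c] = peer.store[c]
                return self.store[c]
        raise KeyError(c)

a, b = Node(), Node()
c = a.add(b"hello")
assert c not in b.store              # no replication without intent
assert b.get(c, [a]) == b"hello"     # accessing it replicates it locally
assert c in b.store
```

ipfs-cluster then sits above this model, coordinating which nodes deliberately pin (and therefore hold) which CIDs.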

For cases where multiple nodes share the burden of replicating files, check the ipfs-cluster project: https://github.com/ipfs/ipfs-cluster. For novel and interesting ways to achieve replication, check the CRDT work by the Dynamic Data & Capabilities Working Group.