jackie-scholl opened this issue 7 years ago
A couple of updates:
Hi @raptortech-js! I was reading this and was also thinking it is an awesome idea for ipfs-cluster (which I am working on), and that it fits the model well! Right now we're just trying to have it do the basic thing (multipin) reliably, but it is clear that in the future it will need to incorporate more elaborate strategies that make more efficient use of storage.
How does that coordination service work? We're asking it to keep a lot of data. Would it end up having to be centralized?
The current approach is to keep a distributed op-log which builds the central state on different nodes, which at least means that coordination tasks could be taken over by any other node if the leader goes down. However, scaling needs dictate that this coordination be kept simple and low-overhead, and that anything that can be distributed should be. A system like this requires thinking carefully about how much work can be offloaded from the leader. There are also still a few challenges in such a proposal, like reliably detecting that a file is eligible for cold storage across the whole cluster (keep a hit counter on every node and add them up? route all requests through a centralized master?), or figuring out whether recomposing the original files would be painfully slow (for a start, IPFS currently cannot pin two things at the same time, though I'm hoping this changes with the new storage system), etc.
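One well-known way to handle the "hit counter for every node" question without a centralized master is a grow-only counter (G-Counter): each node increments only its own slot, and the cluster-wide count for a CID is the sum across all slots. A minimal sketch, assuming nothing about how ipfs-cluster actually propagates state:

```python
from collections import defaultdict

class HitCounter:
    """Grow-only counter (G-Counter) for per-CID access counts.

    Each node only ever increments its own slot, replicas merge by taking
    the per-node maximum, and the cluster-wide total is the sum of all
    slots, so no request ever has to pass through a central master.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # peer id -> hits recorded locally

    def record_hit(self, peer_id: str) -> None:
        self.counts[peer_id] += 1

    def merge(self, other: "HitCounter") -> None:
        # Element-wise max makes merging commutative, associative and
        # idempotent, so state can be gossiped repeatedly and in any order.
        for peer, n in other.counts.items():
            self.counts[peer] = max(self.counts[peer], n)

    def total(self) -> int:
        return sum(self.counts.values())

# Two nodes count hits for the same CID independently, then exchange state;
# the file is a cold-storage candidate if the merged total stays low.
a, b = HitCounter(), HitCounter()
a.record_hit("peer-A"); a.record_hit("peer-A")
b.record_hit("peer-B")
a.merge(b)
print(a.total())  # 3
```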
In any case, thank you for this idea. I'm taking it into account while designing a flexible architecture that can support things like this (ideally as plugins) in the future without much hassle. I think it would also be good if you opened an issue in ipfs-cluster referencing this one, where we can discuss more concretely what an ipfs-cluster implementation would look like and which particular problems would need to be solved first.
Consider this scenario:
You are some entity with a whole lot of data. You're using IPFS to store and retrieve that data for your servers (either public cloud or on-prem). You're probably using an (as-yet-unimplemented) private IPFS network to keep your files totally safe. To ensure resiliency, you always make sure that each file you care about is stored on IPFS nodes in at least three geographically isolated locations, out of the 10 locations where you have storage servers. That means that for every GB of data you need to store, you need 3 GB of disk space. But when you look over at your peers, they manage to store more data with less disk and higher resiliency using Reed-Solomon erasure coding! In fact, your friends at Facebook are using 10-of-14 encoding, which lets them survive the failure of any 4 of the 14 nodes storing their data while using only 1.4 GB of disk per GB of data. But you're using IPFS, so you can't achieve that efficiency. Or can you?
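To make the arithmetic concrete, here is a small helper (purely illustrative, not tied to any IPFS API) comparing plain replication with k-of-n erasure coding; the 3x and 1.4x figures above fall out directly:

```python
def replication(copies):
    """Plain replication: keep `copies` full copies of every file."""
    overhead = copies        # GB of disk used per GB of data stored
    tolerates = copies - 1   # node failures that can be survived
    return overhead, tolerates

def erasure_coding(k, n):
    """k-of-n coding: any k of the n encoded chunks rebuild the data."""
    overhead = n / k         # GB of disk used per GB of data stored
    tolerates = n - k        # node failures that can be survived
    return overhead, tolerates

print(replication(3))          # (3, 2): 3 GB per GB, survives any 2 failures
print(erasure_coding(10, 14))  # (1.4, 4): 1.4 GB per GB, survives any 4 failures
```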
A Possible Architecture for Reed-Solomon on IPFS
(Note: I use the term "availability zone" here to indicate groups of servers that are likely to fail together. I assume failures between availability zones are independent.)
I present here an idea for how one might achieve the efficiency of Reed-Solomon storage and also preserve many of the benefits of IPFS.
There are four components to this architecture:
- a coordination service, which keeps track of which encoded chunks belong to which original file and where each chunk is pinned;
- encoding nodes, which split each file into chunks such that any k of them are enough to rebuild it;
- storage nodes, spread across availability zones, each of which pins a subset of the chunks;
- decoder nodes, which fetch enough chunks back over IPFS and reconstruct the original files on demand.
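As a rough illustration of the bookkeeping the coordination service would need, a per-file manifest might map the original file's hash to its encoded chunks and the availability zones they are pinned in. The names and fields below are assumptions for the sketch, not an existing data structure:

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ChunkPlacement:
    cid: str   # IPFS hash of one encoded chunk
    zone: str  # availability zone where that chunk is pinned

@dataclass
class FileManifest:
    original_cid: str  # hash of the source file as originally added
    k: int             # chunks needed to reconstruct the file
    n: int             # total chunks the encoder produced
    chunks: List[ChunkPlacement] = field(default_factory=list)

    def reconstructable(self, live_zones: Set[str]) -> bool:
        # The file survives as long as at least k chunks sit in zones
        # that are still up.
        return sum(1 for c in self.chunks if c.zone in live_zones) >= self.k

# Hypothetical example: 10-of-14 coding spread across 10 availability zones.
manifest = FileManifest(
    "QmSourceFileHashGoesHere", k=10, n=14,
    chunks=[ChunkPlacement(f"QmChunkHash{i}", zone=f"zone-{i % 10}")
            for i in range(14)])
print(manifest.reconstructable({f"zone-{z}" for z in range(7)}))  # True: 11 of 14 chunks live
```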
Here's how the basic flow works: when a file is added, an encoding node splits it into chunks such that any 7 of them (in this example) are enough to reconstruct it, and the coordination service has those chunks pinned on storage nodes in different availability zones. When the file is later requested, a decoder node fetches any 7 of the chunks back over IPFS, recombines them, and serves the original file.
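To sketch the write half of this flow: the toy example below splits a file into k data chunks plus a single XOR parity chunk (i.e. k-of-(k+1) coding, as in RAID-5) rather than full Reed-Solomon, just to stay self-contained, and assumes a local `ipfs` binary so each chunk can be pinned with `ipfs add -Q` (the flag that prints only the resulting hash). A real encoding node would produce several parity chunks so that more simultaneous failures can be tolerated:

```python
import subprocess
import tempfile
from functools import reduce

def ipfs_add(data: bytes) -> str:
    """Pin one chunk on the local IPFS node and return its hash.

    Assumes the `ipfs` CLI is installed; -Q prints only the final hash.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        out = subprocess.run(["ipfs", "add", "-Q", tmp.name],
                             capture_output=True, check=True)
    return out.stdout.decode().strip()

def encode_and_pin(path: str, k: int) -> list:
    """Split a file into k data chunks plus one XOR parity chunk and pin all k+1.

    Any k of the k+1 chunks can rebuild the file (the decoder also needs the
    original length to strip the zero padding). Real Reed-Solomon would emit
    several parity chunks and therefore survive more simultaneous failures.
    """
    data = open(path, "rb").read()
    size = -(-len(data) // k)  # ceiling division: bytes per chunk
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)
    return [ipfs_add(c) for c in chunks + [parity]]

# Hypothetical usage: 7 data chunks plus 1 parity chunk for some large file.
# chunk_cids = encode_and_pin("big-file.bin", k=7)
```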
One of the amazing things is that I believe this could be done without touching the IPFS client code at all. Obviously the coordination service, storage nodes, and encoding nodes are all using IPFS in pretty mundane ways, but because IPFS can be mounted as a filesystem, and because stored files are just a directory on the filesystem, I think one could implement the decoder as a program that only touches the filesystem and perhaps doesn't even "know" about IPFS.

The way you would do this is to implement a FUSE filesystem that pretends it has each of the files the decoder node should be broadcasting. You then tell the IPFS client that its files are stored in some directory on this filesystem, and it will detect all the files the decoder is pretending to have. When a request for some object comes in through IPFS, the IPFS client will turn around and ask the FUSE filesystem for that object. The decoder will, in turn, look at the directory it has stored and ask the IPFS mount for each of the chunks. The IPFS mount will ask the IPFS client, which will fetch the chunks from the storage servers and pass them back to the mount, which passes them up to the decoder. The decoder waits until it has enough of them (7 in this example), recombines them into the original file, and hands that file to the IPFS client, which passes it on to the original requester. So, yeah, a bit complicated and a bit slow, but I think it would work!
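Here is a stripped-down sketch of that decoder read path, minus the FUSE layer: it assumes the IPFS daemon is mounted read-only at /ipfs (via `ipfs mount`) and that the manifest lists the chunk hashes; the actual recombination step is left to whichever k-of-n codec produced the chunks in the first place:

```python
import os

IPFS_MOUNT = "/ipfs"  # assumes the daemon is mounted here via `ipfs mount`

def collect_chunks(chunk_cids, k):
    """Decoder read path: pull encoded chunks through the IPFS mount.

    Opening /ipfs/<cid> makes the local IPFS client go and fetch that
    block from whichever storage node holds it. We stop as soon as any
    k chunks have arrived, since k of n is enough to rebuild the file.
    """
    collected = {}
    for index, cid in enumerate(chunk_cids):
        try:
            with open(os.path.join(IPFS_MOUNT, cid), "rb") as f:
                collected[index] = f.read()
        except OSError:
            continue  # this chunk (or its storage node) is unreachable
        if len(collected) >= k:
            break
    if len(collected) < k:
        raise RuntimeError("fewer than k chunks reachable; cannot reconstruct")
    # Hand `collected` (chunk index -> bytes) to whatever k-of-n decoder
    # produced the chunks (e.g. a Reed-Solomon library) to rebuild the file.
    return collected
```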
Some advantages to this architecture:
- Disk overhead drops from 3 GB per GB of data (3x replication) to roughly 1.4 GB per GB with 10-of-14 coding, while tolerating more simultaneous failures.
- It requires no changes to the IPFS client itself; the coordination service, encoder, storage nodes, and decoder all use IPFS in ordinary ways.
Some disadvantages:
- Reads are more complicated and slower, since every request has to fetch and recombine several chunks before the original file can be served.
- It adds extra moving parts (the coordination service and the FUSE-based decoder) that have to be built and operated alongside IPFS.
(P.S. I'm interested in any comments/questions/suggestions/whatever that anyone might have!) (P.P.S. I wasn't sure if this was the right place to post this? I'd be happy to move it if that'd be helpful.)