hannahhoward opened this issue 3 years ago
It would be great to get thoughts from @aschmahmann and @Stebalien on this, since it'll leave visible traces through a lot of places in IPFS.
Will try and think about this a bit more, but here are some rough thoughts.
How would I add a whole DAG and specify that only the root should be provided?
The way I'm thinking this should work is that decisions on what to provide, how frequently, etc. should be able to live in a totally separate system of arbitrary complexity. To handle this efficiently, the providing system somehow needs to be able to get triggered when new data is added to the system and given some context as to why it was added. Whether this trigger is the responsibility of the writer implementation or of the thing that calls the writer seems debatable (especially since the caller could always do WriterWrapper(writer).Write(data)); I'm not sure which approach is best at the moment.
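For concreteness, here is a minimal sketch of the decorator variant. DAGWriter, ProvideStrategy, and WrapWriter are hypothetical placeholders rather than existing go-ipld-prime or go-ipfs APIs; the only point is that the provide decision lives outside the writer and the trigger can be attached either by the writer implementation or by its caller.

```go
package dagwrite

import (
	"context"

	"github.com/ipfs/go-cid"
	"github.com/ipld/go-ipld-prime/datamodel"
)

// DAGWriter is a stand-in for whatever write interface the new repos settle on.
type DAGWriter interface {
	Write(ctx context.Context, n datamodel.Node) (cid.Cid, error)
}

// ProvideStrategy is the "totally separate system of arbitrary complexity";
// it decides what (if anything) to announce, given context about the write.
type ProvideStrategy interface {
	Provide(ctx context.Context, root cid.Cid, why string) error
}

// providingWriter decorates a DAGWriter so the provide trigger fires after every write.
type providingWriter struct {
	inner    DAGWriter
	strategy ProvideStrategy
	why      string // the context/"why" attached to writes made through this wrapper
}

func (w *providingWriter) Write(ctx context.Context, n datamodel.Node) (cid.Cid, error) {
	c, err := w.inner.Write(ctx, n)
	if err != nil {
		return cid.Undef, err
	}
	return c, w.strategy.Provide(ctx, c, w.why)
}

// WrapWriter is the caller-side form of the same idea: the caller, not the writer
// implementation, attaches providing via WrapWriter(writer, s, why).Write(ctx, n).
func WrapWriter(inner DAGWriter, s ProvideStrategy, why string) DAGWriter {
	return &providingWriter{inner: inner, strategy: s, why: why}
}
```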
Assuming IPLD prime eventually finishes implementing FocusTransform and WalkTransforming, how should I be able to access these interfaces? An important question: should I be able to perform a transform (which potentially digs deep into the DAG) on a DAG I haven't fetched entirely locally? I think maybe I should be able to.
Hard for me to say since I haven't really played with these interfaces much. However, I'm pretty sure that our interfaces should support operating on local and remote graphs. We should be able to operate on local-only data if we want to, but I don't think we should be boxed into that at all.
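To make that concrete against what exists in go-ipld-prime today, a read opener that prefers local storage but can fall back to the network is one way to avoid being boxed in. The BlockSource interface and NewLinkSystem function below are hypothetical sketches; only LinkSystem and its StorageReadOpener are real go-ipld-prime pieces, and this wiring is a guess rather than a settled design.

```go
package dagread

import (
	"bytes"
	"context"
	"errors"
	"io"

	"github.com/ipfs/go-cid"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/linking"
	cidlink "github.com/ipld/go-ipld-prime/linking/cid"
)

// BlockSource is a hypothetical stand-in for "somewhere blocks can come from":
// local (a blockstore) or remote (a Bitswap-style exchange).
type BlockSource interface {
	GetBlock(ctx context.Context, c cid.Cid) ([]byte, error)
}

var ErrNotFound = errors.New("block not found")

// NewLinkSystem builds a LinkSystem whose read opener tries the local source first
// and falls back to the remote one, so a traversal (and eventually FocusTransform /
// WalkTransforming) can walk a DAG that is only partially present locally.
// Passing remote == nil gives local-only behavior.
func NewLinkSystem(local, remote BlockSource) linking.LinkSystem {
	lsys := cidlink.DefaultLinkSystem()
	lsys.StorageReadOpener = func(lctx linking.LinkContext, lnk datamodel.Link) (io.Reader, error) {
		c := lnk.(cidlink.Link).Cid
		if data, err := local.GetBlock(lctx.Ctx, c); err == nil {
			return bytes.NewReader(data), nil
		}
		if remote == nil {
			return nil, ErrNotFound
		}
		data, err := remote.GetBlock(lctx.Ctx, c)
		if err != nil {
			return nil, err
		}
		return bytes.NewReader(data), nil
	}
	return lsys
}
```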
Are reads and writes separate enough that we can make them separate interfaces?
I suspect so. For example, the Go standard library's io package has a whole slew of small interfaces (Reader, Writer, Closer, Seeker, etc.) and then combines them into higher-level interfaces (ReadWriteCloser, etc.). If someone only needs a reader for their application, but all of our code takes read+write interfaces, then they'll just have to implement a write interface that does nothing and/or errors.
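A sketch of that io-style layering applied here; NodeReader, NodeWriter, and NodeReadWriter are made-up names to illustrate the composition, not proposals for the actual go-fetcher / go-dagservice interfaces.

```go
package dagio

import (
	"context"

	"github.com/ipfs/go-cid"
	"github.com/ipld/go-ipld-prime/datamodel"
)

// NodeReader is the DAG analogue of io.Reader: one small, read-only capability.
type NodeReader interface {
	GetNode(ctx context.Context, c cid.Cid) (datamodel.Node, error)
}

// NodeWriter is the analogue of io.Writer.
type NodeWriter interface {
	PutNode(ctx context.Context, n datamodel.Node) (cid.Cid, error)
}

// NodeReadWriter is the analogue of io.ReadWriter: composed only where a caller
// genuinely needs both halves, so read-only callers never have to stub out writes.
type NodeReadWriter interface {
	NodeReader
	NodeWriter
}
```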
The way I'm thinking this should work is that decisions on what to provide, how frequently, etc. should be able to live in a totally separate system of arbitrary complexity.
+1, my vote and bets are here too. I don't know what this looks like, but I think there may end up being more than one system of decisions for providing, and different approaches might not even have a particularly uniform interface, so it would be good if they can be decorated over other core concepts rather than needing to be tightly bound together.
To handle this efficiently the providing system needs to be [...] given some context as to why it was added
Yeah, I think the real knot at the heart of the matter is here. If we could ferret out exactly what form that context and "why" take, a lot of the rest of the design decisions would probably unfold more clearly almost immediately. (But, again, I bet there might be more than one possible form for this information, depending on the approach.)
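To ground the question, here is one hedged guess at what that context could look like as a value handed from the writer (or its caller) to a providing system. Every name here (WriteReason, Strategy, AnnounceRootsOnly) is hypothetical.

```go
package provide

import (
	"context"
	"time"

	"github.com/ipfs/go-cid"
)

// WriteReason is one guess at the shape of the context/"why" attached to a write.
type WriteReason struct {
	Root   cid.Cid   // root of the DAG this write belongs to, if known
	Source string    // e.g. "cli-add", "pin", "bitswap-fetch", "gateway-cache"
	Pinned bool      // whether the user asked us to keep (and keep announcing) this data
	When   time.Time // when the write happened
}

// Strategy is one possible interface a providing system could expose; as noted
// above, different approaches might not even share this much.
type Strategy interface {
	OnWrite(ctx context.Context, reason WriteReason) error
}

// AnnounceRootsOnly is a toy strategy: only pinned roots get announced.
type AnnounceRootsOnly struct {
	Announce func(ctx context.Context, c cid.Cid) error // e.g. a DHT provide call
}

func (s AnnounceRootsOnly) OnWrite(ctx context.Context, r WriteReason) error {
	if !r.Pinned || !r.Root.Defined() {
		return nil
	}
	return s.Announce(ctx, r.Root)
}
```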
This issue is designed to provoke discussion around local vs remote storage as it relates to the effort to write new interfaces for working with DAG structures in go-ipld-prime. It's mostly documentation and context for the team working on this, but I think laying out these concepts may help others in the future when thinking about designing abstractions for IPFS.
Overview
In a sense, the entire IPFS network can be thought of as a single, distributed file storage system. In fact, this is the ideal goal we are always striving for. We should be able to imagine a single user of IPFS having access to the entirety of what is stored on the network as if it were local.
In practice, keeping a local copy of some data makes sense both for fast access and because the user may want to take responsibility for providing data to the network. In the IPFS software, the local copy is the blockstore.
We might think of the blockstore as similar to a web browser's cache. Just as a user doesn't specify whether a web page is cached locally or must be fetched remotely when they type in a URL, at the command line or HTTP API level the user should never have to worry about where data is coming from.
At the same time, unlike a web browser cache, when we receive data or add it from the command line, we may also become a host for that data on the network. Writing to the local blockstore is therefore only one part of a write operation. The second part is to provide the written CIDs to the DHT.
So to illustrate the symmetry of local / remote on read vs write: a read checks the local blockstore and falls back to fetching from the network, while a write puts data in the local blockstore and then provides it to the network.
Importantly, all new interfaces that work with DAGs, for reading or writing, must ensure they perform both parts of a read or write (unless specifically told not to).
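A hedged sketch of what "both parts" of a write could look like at the DAG level; LocalStore, Provider, and WriteDAG are invented for illustration and are not existing interfaces.

```go
package writeflow

import (
	"context"

	"github.com/ipfs/go-cid"
)

// LocalStore and Provider are hypothetical stand-ins for the two halves of a write:
// the blockstore write and the DHT announcement.
type LocalStore interface {
	PutBlock(ctx context.Context, c cid.Cid, data []byte) error
}

type Provider interface {
	Provide(ctx context.Context, c cid.Cid) error
}

// WriteDAG performs both parts of a write: persist every block locally, then
// announce to the network. Here only the root is provided; announcing every CID
// (as the block-level plumbing does today) or none at all ("specifically told not
// to") are just different provide steps bolted onto the same local write.
func WriteDAG(ctx context.Context, store LocalStore, prov Provider, root cid.Cid, blks map[cid.Cid][]byte) error {
	for c, data := range blks {
		if err := store.PutBlock(ctx, c, data); err != nil {
			return err
		}
	}
	if prov == nil {
		return nil // the write-without-provide case
	}
	return prov.Provide(ctx, root)
}
```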
Level Of Abstraction, Designing New Services
Currently the key service that performs the local/remote abstraction is the BlockService. The block service abstracts at the level of block reads/writes. When I add a DAG, I call AddBlocks / AddBlock on the block service for all blocks in the DAG, which writes each block individually and then calls out to Bitswap to provide each block individually. When I fetch a DAG, I call GetBlock / GetBlocks on the block service for all blocks in the DAG, which attempts to load each block locally and then calls out to Bitswap to fetch the missing blocks from the network (which ultimately saves those blocks to the local store and provides them as well).
So we have:
- Blockstore -> Local Block Read/Write
- Bitswap -> Remote Block Read/Write
- BlockService -> Local/Remote Block Read/Write Abstraction
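For reference, the block-level flow described above boiled down to its shape; the LocalBlocks and Exchange interfaces below are simplified stand-ins, not the real go-ipfs-blockstore or Bitswap signatures.

```go
package blockflow

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// LocalBlocks is a simplified blockstore-like interface.
type LocalBlocks interface {
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
	Has(ctx context.Context, c cid.Cid) (bool, error)
	Put(ctx context.Context, b blocks.Block) error
}

// Exchange is a simplified Bitswap-like interface; in practice Bitswap also
// stores what it fetches and (re)provides it.
type Exchange interface {
	GetBlock(ctx context.Context, c cid.Cid) (blocks.Block, error)
}

// GetBlock mirrors what the BlockService does today, one block at a time: try the
// local blockstore, fall back to the exchange, and persist what came from the
// network. There is no DAG-level knowledge anywhere in this path.
func GetBlock(ctx context.Context, local LocalBlocks, ex Exchange, c cid.Cid) (blocks.Block, error) {
	if ok, err := local.Has(ctx, c); err == nil && ok {
		return local.Get(ctx, c)
	}
	b, err := ex.GetBlock(ctx, c)
	if err != nil {
		return nil, err
	}
	return b, local.Put(ctx, b)
}
```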
The system is coherent and works as long as local and remote fetching or providing works at the block level. However, consider the things this locks us into if we care about DAGs:
We've actually tried to move providing to a different place and offer DAG roots only (see github.com/ipfs/go-ipfs-provider), but this is still experimental, and the tight integration of blockstore / bitswap / blockservice has made it hard to deliver on projects like this.
As we design our interfaces, I don't think we should try to change any of this immediately, but we should design them with an eye towards moving abstractions up in the future.
Hard questions in interface design
Currently we have go-fetcher and go-dagservice, two in-progress repos for working with IPLD prime data. Reads, however, often correspond with writes, and I'm not 100% sure about keeping these separate.
Here are some questions I am thinking about as we look at designing these two services: