protocol / beyond-bitswap

RFC|BB|L2-09/10 Handling arbitrarily large blocks and Treating files as large blocks #29

Open aschmahmann opened 3 years ago

aschmahmann commented 3 years ago

These are two proposals, one is mostly about Bitswap and the other about IPFS. They build off of the concepts from #25.

The areas I think most need confirmation/feedback:

A lot of this was taken from https://hackmd.io/@adin/sha256-dag which has markdown that allows graphviz drawings 😄

adlrocha commented 3 years ago

I love these proposals @aschmahmann. One quick note, you've probably seen it already, but we had this RFC in a REALLY early brainstorm stage (we ended up not having the time to discuss it and develop it further) that goes along the lines of these two RFCs. It was more focused on allowing the exchange of larger blocks over the wire without having to change any of the building blocks, i.e. blockstore, CIDs, max block size, etc.

Our proposal was to introduce the concept of a piece as an "irreducible aggregation of blocks" uniquely identified by a multihash. Clients would then be able to request, in the exchange manifest, the size of the pieces they want to receive. This would favor receiving content from peers that store a minimum number of blocks of the requested content.

This requires some additional computation on the seeder's side to build these "pieces", but they can easily be cached (similar to what is done in the FIB tables of in-network caches):

piece_multihash ----> [CID1, CID2, CID3, ...]
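
For illustration, here is a minimal sketch of such a piece cache on the seeder's side. It is hypothetical only: the pieceIndex/buildPiece names are not from the RFC, and it assumes the go-cid and go-multihash libraries.

package pieces

import (
	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// pieceIndex caches the piece_multihash -> [CID1, CID2, ...] mapping
// so that a piece only has to be built once per set of blocks.
type pieceIndex struct {
	pieces map[string][]cid.Cid // key: piece multihash, binary-encoded
}

// buildPiece concatenates the raw bytes of the aggregated blocks in the
// given order, hashes the result into the piece's multihash, and records
// the CID list under it.
func (idx *pieceIndex) buildPiece(blocks map[cid.Cid][]byte, order []cid.Cid) (mh.Multihash, error) {
	var buf []byte
	for _, c := range order {
		buf = append(buf, blocks[c]...)
	}
	pieceHash, err := mh.Sum(buf, mh.SHA2_256, -1)
	if err != nil {
		return nil, err
	}
	idx.pieces[string(pieceHash)] = order
	return pieceHash, nil
}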

Your proposals are much more thoroughly thought out, and I think they supersede this RFC, but I wanted to reference it here in case it inspires someone, and for the sake of completeness.


Edit:

Why do we have a (1MiB) maximum block size? Without a maximum block size there's nothing stopping an attacker from sending us lots of bogus data.

The fact that it is the client who requests the desired piece size prevents the "bogus data" misbehaviour. However, this scheme can still introduce new potential attacks to worry about, such as clients requesting extremely large piece sizes, forcing seeders to do a lot of useless work building the piece.

aschmahmann commented 3 years ago

@adlrocha yep I took a brief look at the RFCs in the repo, although I may not have fully grasped all of them 😅. My understanding of rfcBBL207 is that it's pretty different from this one: it is basically asking for an IPLD selector version of Bitswap's HAVEs, where I say "please tell me if you have this CID and some part of the graph underneath it" as a way of not wasting connections.

If this is a concern it might be useful to utilize #25 to get something like a graph manifest from many peers and then use that to inform scheduling of which peers to ask for which blocks without needing many HAVE queries.

such as clients requesting extremely large piece sizes, forcing seeders to do a lot of useless work building the piece.

The above approach might alleviate your concern here assuming that running the IPLD selector query isn't too expensive or could be limited.

PS: If GitHub ever introduces threads outside of commenting on a random line in a PR I'll be a happier person 😄

zacharywhitley commented 3 years ago

I think that there might be a use for LTHash in here somewhere. It's a homomorphic hash with the property that it's composable: the final hash can be computed over the chunk hashes regardless of how the file is chunked. I think this would address the issue of CID aliases, where a file has multiple aliases because different chunkers were used; with an LTHash the file hash can be computed consistently over multiple chunkings. I think you can also publish a list of chunk hashes and verify that it would produce the final hash, as long as the intermediate hashes verify correctly. It still doesn't solve the problem of someone wanting to look up by some other hash like sha256, but it might provide a way of referring to the complete file hash that is more resistant to Sybil attacks.

LTHash is the first practical composable hash that I know of, as SL2Hash had issues.

https://github.com/lukechampine/lthash
https://engineering.fb.com/2019/03/01/security/homomorphic-hashing/
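
To make the composability concrete, here is a toy, self-contained sketch of the idea. It is not the real lthash package (which uses a much larger state), and the per-byte (offset, byte) elements are purely to show chunking-independence; a real scheme would use larger fixed-size units.

package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// latticeHash is a toy additive lattice hash in the spirit of LtHash
// (far too small to be secure): each element's SHA-256 digest is added
// element-wise into a vector of counters. Because addition is commutative
// and associative, the final state is independent of how the input
// elements are grouped or ordered.
type latticeHash [16]uint16

func (l *latticeHash) add(element []byte) {
	d := sha256.Sum256(element)
	for i := range l {
		l[i] += binary.BigEndian.Uint16(d[2*i:])
	}
}

// addChunk adds one chunk of a file, one (offset, byte) element per byte,
// so any chunking of the same file yields the same multiset of elements
// and hence the same final state.
func (l *latticeHash) addChunk(offset uint64, chunk []byte) {
	for i, b := range chunk {
		var elem [9]byte
		binary.BigEndian.PutUint64(elem[:8], offset+uint64(i))
		elem[8] = b
		l.add(elem[:])
	}
}

// hashWithChunking hashes data split at the given boundary offsets.
func hashWithChunking(data []byte, boundaries []int) latticeHash {
	var l latticeHash
	prev := 0
	for _, end := range append(boundaries, len(data)) {
		l.addChunk(uint64(prev), data[prev:end])
		prev = end
	}
	return l
}

func main() {
	file := []byte("the same file, chunked two different ways")
	a := hashWithChunking(file, []int{5, 20})      // three chunks
	b := hashWithChunking(file, []int{10, 15, 30}) // four chunks
	fmt.Println(a == b)                            // true: same final hash
}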

Ericson2314 commented 3 years ago

I'm a bit confused why this proposal is trying to roll in the ideas from #25 all at once. I think it would be simpler to just start with the naive and trustless reverse-stream algo, and then combine it with #25. Basically, the core idea is that all hashing induces a "freestart Merkle DAG", and we are taking advantage of that for the first time.

I'm not against the #25-like bits, I just want to layer ideas as well as possible and reduce risk.

Ericson2314 commented 3 years ago

To flesh out the above, I made a PR-against-the-PR (https://github.com/protocol/beyond-bitswap/pull/30) showing what I think the first step, before the #25-like stuff, would look like.

aschmahmann commented 3 years ago

I'm a bit confused why this proposal is trying to roll in the ideas from #25 all at once.

Basically because without it, L2-10 is IMO pretty useless: instead of being able to download a file in parallel across many peers, you can (assuming SHA1/2) stream data from at most one peer, and at a rate of at most 1MiB per RTT (e.g. at a 100ms RTT that caps a single peer at ~10MiB/s). Here is how I see it:

You can do L2-09 without #25, but its utility (assuming you're using SHA1/2 and not a tree-based hash) IIUC is basically restricted to dealing with existing formats like Git that happen to sometimes have >1MiB blocks, where downloading really slowly is better than nothing. Having something like #25 is what makes this useful enough to even bother proposing L2-10.

Ericson2314 commented 3 years ago

IIUC is basically restricted to dealing with existing formats like Git that happen to sometimes have >1MiB blocks, where downloading really slowly is better than nothing.

I'll admit just getting something to work with Git is my main priority.


And what about the layering argument? The IPFS ecosystem as a whole is phenomenal at layering. I'm not sure whether the graphsync vs bitswap distinction is supposed to be permanent or is more to track the evolution of a protocol, but if it's the former, the 3-RFC split keeps that distinction.

Finally, per "Make it work, make it right, make it fast", it seems good to be able to sign off on the relatively easy parts while the fancier graphsync stuff is worked out. E.g. there are a gazillion variations on the latter two steps (in the 3-part plan) that people might want, as evidenced by the big discussion this PR has seen already, but the first step is such a constrained design space that I think there's not too much to disagree about.

hannahhoward commented 3 years ago

I want to put forward what I see as the simplest version of this proposal, which is just a mechanism to support block sizes > 1MB in bitswap. This avoids the question of DAG equivalence, new multicodecs, implicit IPLD Dag relationships, etc.

Currently, bitswap & graphsync have the following format for blocks at the protocol layer:

type Block struct {
   Prefix []byte // CID prefix: version, codec, and multihash info
   Data   []byte // the block's bytes
}

I believe if we now modify this to be something like:

type Block struct {
   Prefix       []byte // CID prefix: version, codec, and multihash info
   Data         []byte // this chunk's bytes
   IsChunked    bool   // true if this is one chunk of a larger block
   ShaSoFar     []byte // hash midstate covering all previous chunks (nil for the first chunk)
   RemainderSha []byte // hash covering the chunks still to come (nil for the last chunk)
}

If we additionally require that the chunks are sent in order, we have everything we need to support arbitrarily large blocks in bitswap and graphsync.

we'd maintain a map in bitswap / go-graphsync (in reality probably a small store on disk):

var inProgressBlocks map[cid.Cid]struct {
  bytesSoFar []byte // chunk data received and verified so far
  shaSoFar   []byte // hash midstate covering bytesSoFar
}

If I get back a chunked block and ShaSoFar is nil, I hash Data + RemainderSha based on the Prefix to get the original CID, and make sure there is no existing entry in the map AND that the CID matches a want. Then I make a new entry.

If I get back a chunked block with ShaSoFar != nil, I hash ShaSoFar + Data + RemainderSha based on the Prefix to get the original CID, and make sure there is an existing entry in the table AND that its ShaSoFar matches the one in the Block. I add the data and update the ShaSoFar to include it. If RemainderSha == nil, I verify the updated ShaSoFar matches the whole CID, remove the entry from the map, and save the assembled block.
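
The ShaSoFar midstate can be built on stock SHA-256: Go's crypto/sha256 hashes implement encoding.BinaryMarshaler / encoding.BinaryUnmarshaler, so a partial hash state can be exported and resumed. A minimal sketch of just that resumption step (all the Block and map plumbing above, and the RemainderSha handling, are elided):

package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"fmt"
)

func main() {
	block := []byte("an arbitrarily large block, streamed in two chunks")
	chunk1, chunk2 := block[:16], block[16:]

	// Sender: hash the first chunk and export the midstate; this is the
	// kind of value ShaSoFar could carry on the wire.
	h := sha256.New()
	h.Write(chunk1)
	shaSoFar, err := h.(encoding.BinaryMarshaler).MarshalBinary()
	if err != nil {
		panic(err)
	}

	// Receiver: resume from the midstate and hash the next chunk without
	// re-hashing the bytes already received and verified.
	h2 := sha256.New()
	if err := h2.(encoding.BinaryUnmarshaler).UnmarshalBinary(shaSoFar); err != nil {
		panic(err)
	}
	h2.Write(chunk2)

	// The resumed hash matches hashing the whole block at once, which is
	// what lets the receiver check the final state against the CID.
	want := sha256.Sum256(block)
	fmt.Println(bytes.Equal(h2.Sum(nil), want[:])) // true
}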

Potential Advantages/Disadvantages:

  1. Simpler in that we avoid defining new multicodecs, introducing any notion of an IPLD feedback loop into Bitswap, running Graphsync queries solely for this purpose, defining DAG equivalences, etc.
  2. It has the disadvantage that you don't break up the sub-block requests among multiple peers: if I ask for the CID of a 100MB block, I'm stuck getting it from that one peer.

I think all the other larger questions are important ones to address, but if the immediate need is serving Git objects, this strikes me as the fastest and simplest possible path.

Edit: Upon further reflection I believe I have simply reinvented @Ericson2314's https://github.com/protocol/beyond-bitswap/pull/30

(though perhaps fleshing out what it looks like at the protocol level is helpful?)

hannahhoward commented 3 years ago

I would like to see us define "DAG equivalence" at the IPLD level more clearly before we move towards trying to support looking up files in IPFS by the SHA of all their bytes. I believe what we are essentially talking about, for files at least, is equivalence in the context of IPLD's Flexible Byte Layout ADL: https://github.com/ipld/specs/blob/master/data-structures/flexible-byte-layout.md and now might be a good time to start talking to @warpfork et al. about how different layouts of the same bytes might be looked up by the SHA of all the bytes.