oconnor663 / bao

an implementation of BLAKE3 verified streaming

feat: async support #44

Open dignifiedquire opened 1 year ago

dignifiedquire commented 1 year ago

Took a first stab at async support, just encoding for now.

A big issue I am running into right now is that I want to use the Encoder to write to a network stream without buffering much, but unfortunately that requires Read + Seek, both of which are not available on an outgoing stream connection.
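For context, a minimal sketch of the constraint (assuming the crate's current synchronous `Encoder::new`/`finalize` API): the encoder's target must be `Read + Write + Seek` because it seeks back to fill in parent nodes, so an in-memory cursor compiles but a raw socket does not.

```rust
use std::io::{Cursor, Write};

fn main() -> std::io::Result<()> {
    // Cursor<Vec<u8>> implements Read + Write + Seek, so this compiles.
    let mut encoder = bao::encode::Encoder::new(Cursor::new(Vec::new()));
    encoder.write_all(b"some input")?;
    let _hash = encoder.finalize()?;

    // A TcpStream implements Read + Write but not Seek, so this would not
    // compile, which is exactly the problem for streaming to the network:
    // let socket = std::net::TcpStream::connect("example.com:443")?;
    // let encoder = bao::encode::Encoder::new(socket); // error: Seek not satisfied
    Ok(())
}
```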

Any thoughts on how/how hard it would be to build an encoder that does not require reading back any content from the underlying target?

Ref #43

oconnor663 commented 1 year ago

Apologies, I probably won't have time to read through all the code today.

Any thoughts on how/how hard it would be to build an encoder that does not require reading back any content from the underlying target?

The fundamental challenge here is that all the nodes in the Bao stream are laid out in pre-order. So the very first thing that needs to hit the wire is effectively the root hash of the entire input, and that means you need to process the entire input before even a single byte of encoded output is ready. (Well, the first 8 bytes are just the length, so it's actually the ninth byte that requires the entire input.)
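A quick way to see this, sketched with the crate's one-shot encode function (assuming the `bao::encode::encode` signature): the first 8 bytes of the combined encoding are the little-endian input length, and everything from byte 9 on starts with the root parent node, which depends on every input byte.

```rust
fn main() {
    // Any multi-chunk input (> 1024 bytes) has parent nodes in the tree.
    let input = vec![0u8; 10_000];
    let (encoded, _hash) = bao::encode::encode(&input);

    // Bytes 0..8: the input length, little-endian.
    let len = u64::from_le_bytes(encoded[..8].try_into().unwrap());
    assert_eq!(len, input.len() as u64);

    // Bytes 8..72: the root parent node (the two 32-byte child hashes of the
    // root), which can't be computed until the entire input has been hashed.
    let _root_parent = &encoded[8..72];
}
```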

The intended way to work around this is bao slice and bao decode-slice. The assumption there is that you run the entire bao encode locally and save the result on disk. If you save the "combined" encoded file, you can just serve that verbatim. If you save the "outboard" encoded file, to avoid the space overhead of an entire copy of the original input, then you can use bao slice together with the original and the outboard encoding to reproduce the "combined" encoding. That does work in a streaming-friendly way, and it doesn't require seek support in the output.
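To make that workflow concrete, here's a rough sketch (using the crate's `outboard`, `SliceExtractor`, and `SliceDecoder` types as I understand them; the slice offset and length here are arbitrary):

```rust
use std::io::{Cursor, Read};

fn main() -> std::io::Result<()> {
    let input = vec![0u8; 1 << 20];
    // Encode once locally, keeping only the ~6% outboard tree on disk.
    let (outboard, hash) = bao::encode::outboard(&input);

    // Extract a slice (here: the whole input) from the original bytes plus
    // the outboard tree. The extractor only reads; the output is write-only,
    // so it can be streamed straight to a socket without seeking.
    let mut extractor = bao::encode::SliceExtractor::new_outboard(
        Cursor::new(&input),
        Cursor::new(&outboard),
        0,                  // slice start
        input.len() as u64, // slice length
    );
    let mut slice = Vec::new();
    extractor.read_to_end(&mut slice)?;

    // The receiver verifies the slice incrementally against the root hash.
    let mut decoder =
        bao::decode::SliceDecoder::new(&*slice, &hash, 0, input.len() as u64);
    let mut decoded = Vec::new();
    decoder.read_to_end(&mut decoded)?;
    assert_eq!(decoded, input);
    Ok(())
}
```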

Currently Bao always saves the entire hash tree, which is ~6% the size of the original input. (So a "combined" encoding is 106% of the input size, and an "outboard" encoding is 6%.) But a big TODO for me is to support configurable "chunk group" sizes, where Bao would omit the lower levels of the tree and recompute those on the fly as needed. Then you could tune the size of the outboard tree, saving space on disk in exchange for needing to buffer more input during decoding and seeking.
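For intuition on where the ~6% comes from (my own back-of-the-envelope arithmetic, not a number from the docs): every pair of subtrees adds one 64-byte parent node (two 32-byte child hashes), so a tree over n 1024-byte chunks has n − 1 parents, for an overhead of 64(n − 1) / 1024n ≈ 64/1024 = 6.25%. With a chunk group size of g chunks, the lowest levels of the tree are omitted and the overhead drops to roughly 64 / 1024g, e.g. about 0.4% for g = 16.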

dvc94ch commented 1 year ago

So the pre-order/post-order thing is an artifact of doing a two-pass encoding of the Bao format using two buffers.

For my purposes I wrote a hasher that produces a Vec<Insertion> and writes the contents atomically into a kv store, so that pre/post-order traversal can be performed efficiently. The main reason for not using the bao crate was that decoding all slices of the file should be able to reproduce the hash tree without having to rehash the data. This allows a peer to provide the slices it has without having to first sync the entire file and rehash its contents.

See https://github.com/dvc94ch/blake-tree/blob/master/core/src/hasher.rs and https://github.com/dvc94ch/blake-tree/blob/master/core/src/tree.rs for the details
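The real implementation lives in the linked files; purely as a hypothetical illustration of the shape of that API (the `Insertion` type and its fields are invented here, and parents are hashed as plain concatenations, a simplification of BLAKE3's real parent-node compression):

```rust
// Hypothetical sketch: emit every node of a Merkle tree as an `Insertion`
// record, so a kv store can persist the tree and later serve nodes in any
// traversal order without rehashing the underlying data.
struct Insertion {
    level: u8,      // 0 = leaf chunks, increasing toward the root
    index: u64,     // position within the level
    hash: [u8; 32], // the node's hash, as stored in the kv store
}

fn insertions_for(data: &[u8]) -> Vec<Insertion> {
    const CHUNK: usize = 1024;
    // Leaf level: one hash per 1024-byte chunk.
    let mut level: Vec<[u8; 32]> = data
        .chunks(CHUNK)
        .map(|c| *blake3::hash(c).as_bytes())
        .collect();
    let mut out = Vec::new();
    let mut lvl = 0u8;
    loop {
        for (i, h) in level.iter().enumerate() {
            out.push(Insertion { level: lvl, index: i as u64, hash: *h });
        }
        if level.len() <= 1 {
            break;
        }
        // Parent level: hash each pair of children together. (Simplified:
        // BLAKE3 uses a dedicated parent compression, not a plain rehash.)
        level = level
            .chunks(2)
            .map(|pair| {
                let mut hasher = blake3::Hasher::new();
                for h in pair {
                    hasher.update(h);
                }
                *hasher.finalize().as_bytes()
            })
            .collect();
        lvl += 1;
    }
    out
}
```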

dvc94ch commented 1 year ago

But with regard to this PR, I have two questions.

  1. does making it async provide a measurable difference in any benchmark?
  2. is this work intended to be used in iroh? That would be an exciting break from existing IPFS orthodoxy.
dignifiedquire commented 1 year ago
  1. does making it async provide a measurable difference in any benchmark?

It's less about specific perf improvements and more about easier integration into existing pipelines/code.

  2. is this work intended to be used in iroh?

Eventually, yes, that is the goal, or at least something like this.