ipld / go-car

A content addressable archive utility

Feedback/proposal: separate concerns more and provide a safe wrapper around the car format #416

Open Jorropo opened 1 year ago

Jorropo commented 1 year ago

Note: this mostly excludes indexes from the picture because I haven't used them and haven't needed them, so I can't comment well on their API.

The APIs are either too low-level, requiring consumers to have a copy of the car spec on hand to use them, or they operate a level above the CAR format and require consumers to provide a bunch of features they might not have.

APIs that are too low-level to be used without a copy of the car spec:

The things above are useful, but I don't think they are enough to claim this library can be used to easily decode car files. It's like trying to use encoding/json but only being allowed to use json.Decoder.Token.

APIs that are too high-level and provide features and types that are not needed to interact with the car format:

Those are useful, but they are specialised helper functions: if I am not creating a car file from a random-access CID block interface (github.com/ipfs/go-ipld-format.NodeGetter), or if I am not using https://pkg.go.dev/github.com/ipfs/go-merkledag or https://pkg.go.dev/github.com/ipld/go-ipld-prime, I cannot use those.

Things I think are good:


Streaming a carv1 body from a stream of blocks.

This can't be found in either the v1 or v2 packages. You have to write this code:

util.LdWrite(writer, block.Cid().Bytes(), block.RawData())

This is impossible for any newcomer to figure out without a deep read and exploration of the car spec, or without looking at some code that already does this.

Note that it's also really easy to mess up: the ...[]byte signature might lead you to think you can do this, for example: util.LdWrite(writer, block.Cid().Bytes(), block.RawData(), block2.Cid().Bytes(), block2.RawData()). But no, this does not follow the car spec and will be silently serialised incorrectly.
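For reference, the correct pattern is one LdWrite call per block; a hedged sketch using the existing util package (it assumes the CARv1 header has already been written to w):

import (
  "io"

  blocks "github.com/ipfs/go-block-format"
  "github.com/ipld/go-car/util"
)

// writeBlocks appends each block as its own length-prefixed section.
// LdWrite concatenates all of its variadic arguments into a single
// section, so only the CID and data of the same block may be passed
// together in one call.
func writeBlocks(w io.Writer, blks []blocks.Block) error {
  for _, blk := range blks {
    if err := util.LdWrite(w, blk.Cid().Bytes(), blk.RawData()); err != nil {
      return err
    }
  }
  return nil
}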

I get why this API exists, and I can see edge cases where it would be useful, but I don't think it is acceptable as the only way to stream a stream of blocks.


Things I think would make this better:

Provide a util.LdWrite-free way to stream a car body. An API like this would be enough:

// BlockWriter streams blocks to an io.Writer.
type BlockWriter struct{/* ... */}

func NewBlockWriter(w io.Writer, roots []cid.Cid, opts ...WriteOption) (*BlockWriter, error) {/* ... */}

func (bw *BlockWriter) Write(b blocks.Block) error {/* ... */}

// WriteFromReader allows for zero copy through [io.ReaderFrom] or [io.WriterTo].
func (bw *BlockWriter) WriteFromReader(c cid.Cid, r io.Reader) error {/* ... */}
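
For illustration, consuming this proposed API could look like the following (everything here is hypothetical; these names exist only in the proposal above):

bw, err := NewBlockWriter(w, []cid.Cid{root}) // hypothetical constructor from the proposal
if err != nil {
  return err
}
for _, blk := range blks {
  // one safe call per block, no knowledge of the car framing needed
  if err := bw.Write(blk); err != nil {
    return err
  }
}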

I would also move the helpers and lower-level functions away into different packages. Given the current state, creating a new package like github.com/ipld/go-car/simple bundling easy, safe wrappers around the car format sounds simpler (no need to have a tool rewrite consumers to a new import path).


Somewhat out-of-scope notes:

It is impossible to do anything allocation-free. A random example about reading blocks: it would be nice if the BlockReader object had a Peek() (cid []byte, block []byte, err error) method. The difference is that it would use bufio.Reader.Peek and return slices into bufio.Reader's internal buffer, which allows reading a block without allocation.
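
Sketched as a signature (hypothetical; BlockReader has no such method today):

// Peek returns the CID and data bytes of the next section without
// copying. The returned slices alias bufio.Reader's internal buffer
// and are only valid until the next call that advances the reader.
func (r *BlockReader) Peek() (cid []byte, block []byte, err error)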


Just so you know, I'll make those changes to github.com/ipfs/boxo/car and provide a lighter API (just expose BlockReader and BlockWriter).

willscott commented 1 year ago

In trying to understand the proposal here I'm struggling a bit to understand the consumer that we're building this for. I think the interface you propose above is the direct one used at the moment in Kubo, but that has ended up being pretty unique.

In almost all of our other imports/exports of car files, there is a trust boundary and so we end up with a traversal / block-level validation.

Things like TraverseV1 provide this, and are safer than allowing writing of blocks in the way you propose. @rvagg can maybe chime in, but I think having writing exposed through IPLD DAGs rather than directly was a design choice to make it harder to end up with a proliferation of un-ordered car files in the wild.

Your argument that "they are specialised" is something I'm interpreting as pointing to this use of ipld-prime or of ipld-format. I think this is the first time I've heard it proposed that one would be constructing multi-block car files without using IPLD. Both the prime and format libraries are in kubo, and at least one is used everywhere else we engage with cars that I'm aware of.

willscott commented 1 year ago

you end with

Just so you know I'll make thoses changes to github.com/ipfs/boxo/car and provide a lighter API

does this mean this decision of work has already been made and prioritized?

Jorropo commented 1 year ago

does this mean this decision of work has already been made and prioritized?

@willscott no, that was unclear of me: I'll open a pull request and we'll review it at some point in the future.

Jorropo commented 1 year ago

The core of the argument is that the car format is a stream of blocks. I do not believe it is defined anywhere in the spec that a car file must represent a traversal; I don't see any selector field in the header either. It actually contains:

The requirement for the blocks in a CAR to form coherent DAGs is not strict, so the CAR format may also be used to store arbitrary IPLD blocks.

And I am surprised that go-car does not have safe support for writing arbitrary IPLD block streams, given that's what the spec actually defines.

Your argument that "they are specialised", is something I'm interpreting as pointing to this use of ipld-prime or of ipld format.

There are two sides to it:

  1. You could use go-car without using ipld-prime or ipld-format.
  2. Even if you use ipld-prime or ipld-format, you might want to drive your own traversal loop instead of reading blocks from a blockstore.
    1. The defaults are not fast.

1.

That just seems like a wrong premise. I don't know what to answer except that I did it, and it was great: https://github.com/Jorropo/linux2ipfs. This really isn't harder than using ipld-prime or ipld-format, assuming you only want to support one single content-addressed format. Neither ipld-prime nor ipld-format is optimised for performance or data-oriented design; if someone cares about performance, this is a really good route to take.

2.

There are realistic use cases where the consumer is already doing a traversal; having go-car traverse the same data again is then at best very wasteful, at worst impossible.

For example linux2ipfs: it parallelizes and pipelines recursively traversing the directory, computing the deltas, adding and updating new files and directories, and uploading data. If I wanted to do that safely using go-car, I would have to store all of my blocks in a blockstore first, doubling the writes and re-reading a bunch of data I already have. I also could not send cars in parallel with adding my data, because I can't start the traversal without knowing the root. The current implementation is able to write car files on the fly.

(note: it then does double buffering behind that, but ideally I would remove this; it's needed for compat with estuary, and I wanted to remove it but never got around to updating estuary's API. There is still a speedup from streaming stuff into the buffer)

The add loop is a pipelined, parallelised mess (which isn't parallelised enough yet). It is in no way realistic for that to be supported in go-car.

Another example I can come up with is a tool that downloads data over graphsync and then outputs a car file. The graphsync client already does a traversal and takes care to verify it, so why would I run the same traversal again?

2.1

TraverseV1 executes WalkMatching, which will fetch blocks one round trip at a time. This prevents using HDDs to their fullest potential, and excludes any realistic attempt at fetching this over bitswap.


I'm not asking for go-car to support all kinds of traversal options; I'm just saying that we could have a simple API and trust consumers to write whatever traversal pattern they want: simple loops, recursion, parallelised pipelines, and so on.

Note that this feature is already available, it's just using the dangerous util.LdWrite. I think almost all occurrences would be greatly improved by having a BlockWriter: https://grep.app/search?q=.LdWrite%28&case=true&filter[lang][0]=Go

willscott commented 1 year ago

1

I'd probably advocate to add it in the v2 module as that's where we've made interface additions over the last year.

2

I do also want to point out that the blockstore interface over a car is the other viable alternative that might meet your needs, e.g. the example in: https://github.com/ipld/go-car/blob/master/cmd/car/filter.go#L64-L101
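
For reference, a minimal sketch of that route, loosely following the linked filter.go example (the file name and root are placeholders, and I'm assuming the context-taking blockstore interface):

import (
  "context"

  blocks "github.com/ipfs/go-block-format"
  "github.com/ipfs/go-cid"
  "github.com/ipld/go-car/v2/blockstore"
)

// copyToCar writes blocks into a new CAR via the blockstore interface;
// Finalize rewrites the header and index once all puts are done.
func copyToCar(ctx context.Context, root cid.Cid, blks []blocks.Block) error {
  bs, err := blockstore.OpenReadWrite("out.car", []cid.Cid{root})
  if err != nil {
    return err
  }
  for _, blk := range blks {
    if err := bs.Put(ctx, blk); err != nil {
      return err
    }
  }
  return bs.Finalize()
}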

2.1

The ipld-prime traversal no longer has this limitation due to the preload structure introduced somewhat recently to allow for efficient bitswap.

Jorropo commented 1 year ago

1.

I'd probably advocate to add it in the v2 module as that's where we've made interface additions over the last year.

SGTM :slightly_smiling_face:

2.

  • In the same vein as what you found with estuary, there are orderings for car files that are incrementally verifiable (root first). go-car currently pushes pretty hard for this ordering, which is the one we'd like to see used for filecoin deals as much as possible.
  • Having actively promoted tools that lead to a proliferation of non-canonical cars creates complexity for other teams in the ecosystem - .storage and saturn to name a couple.
    • That inter-relation is why we've had the car tooling under IPLD and have IPLD stewards/team historically directing that design.
    • cc @BigLep on this point - you've been closer to the ipld maintenance recently than I have, so may have a better sense of how we see this side of it today.

I understand the importance of incrementally verifiable car files and the push for their use in Filecoin deals. However, it's worth noting that, as you most likely already know, incremental verification and pipelined incremental construction are inherently incompatible due to the nature of hashes. Incremental verification requires starting at the root and going down, but incremental construction finishes at the root, since you need to know the child hashes to build the root (and this applies through the whole tree).

In my specific use case with linux2ipfs and my own cluster of IPFS nodes, the API is trusted between me and myself and doesn't require incremental verification. My ideal solution would be to stream unordered blocks over an HTTP stream between linux2ipfs and my trusted cluster.

Regarding the complexity for other teams in the ecosystem, have you considered specifications :sparkles:? I believe that clearer specifications could help address these concerns. The IPLD specs differ from what has been discussed here:

The requirement for the blocks in a CAR to form coherent DAGs is not strict, so the CAR format may also be used to store arbitrary IPLD blocks.

This text precisely explains that any ordering, completeness, or similar requirement on the content of the car is not an RFC 2119 MUST.

For instance, there was an issue last year with estuary failing Filecoin deals due to differing directory name sorting methods. It turns out all unixfs implementations up until this point only ever produced lexicographically sorted directories, while linux2ipfs was doing a mix of lexicographical and numerical sorting. The problem has been resolved, but it highlights the issues with relying on undefined behavior in the implementation.

I think using specifications to enforce technical requirements is much better than artificially limiting APIs based on what a select group of wizards :mage: hold as the source of truth in their minds.

A possible solution could be a new spec that utilizes a new bit in the V2 bitfield to indicate whether a car is ordered, along with a field for the ipld selector, to allow for incrementally verifiable partial cars (like those produced by lassie) without having to transfer this OOB. This way, unordered cars could be streamed to Kubo, while other tools could easily filter and error if the ordered bit is missing. This would allow early error detection if a linux2ipfs or other unordered car is uploaded inappropriately.

2.1

Thx for pointing out the recent update to the ipld-prime traversal with the preload structure. I wasn't aware of this improvement.

Off-topic notes:

Estuary does not incrementally verify car files; it just requires knowing the root and does a complete traversal after the fact (I use a dirty trick to make it accept partial-DAG traversals by building a fake DAG that only contains the blocks of the partial DAG in the precise car being uploaded, using some dirty codec hacking). And since the header comes first in a car, I do double buffering to pipeline adding while still updating the root: I write to the back buffer while the front buffer is being sent to Estuary, swapping them once the back buffer goes above 31GiB.

I've considered using an MD4 or MD5 seed protobuf block, then hot-patching it with the final fake root in that car using a length extension attack. This would allow sending the car header with an MD4 multihash before I even know what the final hash for this 31GiB car is, and then "updating" the data pointed to by the MD4 digest in the final block. I didn't, because I didn't want to be that guy once this gets fixed (it might never even have worked, I never tried it).

BigLep commented 1 year ago

2023-05-01 conversation between @Jorropo and @BigLep :

Theme 1 will be addressed by adding a new CAR spec for ordered CARs.

Theme 2 is handled by some APIs that @rvagg shared that already exist in ipld/go-car.

rvagg commented 1 year ago

I drafted a long response to this (longer than this one!) but I'm going to ditch that and say a couple of things, focusing on the CAR-creation process, and will ignore the rest of the original post because it doesn't seem actionable.


As I've said in at least one other discussion - my preference would be to further evolve go-car (and much of the rest of our stack) toward the go-ipld-prime perspective of IPLD. As it stands, much of the functional undergirding of our stack is already there, but we have layering of abstractions in order to be backward compatible with ancient APIs that we don't want to go away. Hence the v2/storage package, which eschews (almost?) all legacy components and is designed to plug in to go-ipld-prime native code. And this is how I prefer to build CARs, and it's how we're building them with Lassie, in multiple places. We use v2/storage in conjunction with go-ipld-prime's traversal engine to build strictly deterministic and verifiable CARs without having to do much work ourselves.

A distilled form in code that I'm currently working on for a test case that has it all in one place is here (this commit may or may not persist, but if not, a later reader will hopefully find it in pkg/retriever/httpretriever_test.go in a function called traverseCar): https://github.com/filecoin-project/lassie/blob/bf7ac2e17df6891fe771c5bf55dd16c739e00fa2/pkg/retriever/httpretriever_test.go#L504-L540

Similar code can be found in go-car in TraverseV1, and also in SelectiveCar in the root (v0) package, which is where TraverseV1 came from; but in the Lassie case we're generating specifically a V1, using a strictly ordered (deterministic) traversal, with no duplicate blocks, to an io.Writer.


WRT BlockWriter in the original post, that seems like a very reasonable design. My personal preference would be to not see blocks.Block, but I recognise that a lot of code exists, and is continuing to be written, with dependencies on this (legacy) abstraction. So it's probably reasonable, and it is true that it's not exactly easy to use the existing APIs to do a simple job like turning a list of blocks.Block into a streaming CAR.

Although again, you could do it with v2/storage and control the traversal/ordering yourself, just using store.Put(), something like this:

import (
  "context"
  "io"

  blocks "github.com/ipfs/go-block-format"
  "github.com/ipfs/go-cid"
  car "github.com/ipld/go-car/v2"
  "github.com/ipld/go-car/v2/storage"
)

func writeCar(root cid.Cid, blks []blocks.Block, w io.Writer) error {
  carWriter, err := storage.NewWritable(w, []cid.Cid{root}, car.WriteAsCarV1(true), car.AllowDuplicatePuts(false))
  if err != nil {
    return err
  }
  for _, blk := range blks {
    // Put takes a context plus the CID (as a key string) and the raw block bytes.
    if err := carWriter.Put(context.TODO(), blk.Cid().KeyString(), blk.RawData()); err != nil {
      return err
    }
  }
  return nil
}

Lastly, I'll note that WriteFromReader is probably not going to create the happy-times you want; for the same reason I didn't implement PutStream() which is an optional go-ipld-prime storage primitive. This is because a CAR section needs the CID length + byte length to build its prefix, so you're going to collect the bytes before you write the section anyway. You could implement it, but it'd be sugar rather than "zero copy".
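
In other words, any such helper would at best reduce to buffering; a sketch using the v0 util package for illustration:

import (
  "io"

  "github.com/ipfs/go-cid"
  "github.com/ipld/go-car/util"
)

// The section prefix is varint(len(CID bytes) + len(data)), so all the
// data has to be collected before the prefix can be computed.
func writeFromReader(w io.Writer, c cid.Cid, r io.Reader) error {
  data, err := io.ReadAll(r) // buffer everything first
  if err != nil {
    return err
  }
  return util.LdWrite(w, c.Bytes(), data)
}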