wnfs-wg / car-mirror-spec

"RSync for DAGs" — efficient content-addressed graph transmissions without duplicates

CAR file headers (may) need to be specified #6

Open matheus23 opened 1 year ago

matheus23 commented 1 year ago

The block-sending side in CAR Mirror sends CAR files. However, some implementations require a header to have at least one root CID (iroh-car being one of them). It's somewhat unclear whether CAR files need to contain all their roots.

From the spec:

I see two possibilities:

Should we specify this at all? We could also just... not specify this, and implementations will need to decide themselves what to put in there, but be lenient in what they allow. (unfortunately iroh-car currently errors out if the header roots are empty)
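To make "lenient in what they allow" concrete, here is a minimal sketch of a receiver-side check. It assumes the header has already been decoded into a list of root CIDs (as raw bytes). The function name `inspect_header` and the `require_root` flag are hypothetical and do not come from iroh-car or any other CAR library.

```python
from typing import Iterable, List


def inspect_header(
    header_roots: List[bytes],
    block_cids: Iterable[bytes],
    require_root: bool = False,
) -> List[bytes]:
    """Hypothetical receiver-side check on an already-decoded CAR header.

    Returns the header roots that do NOT appear among the blocks of the CAR.
    The result is informational only: a lenient receiver does not reject a
    CAR because a root is absent from the payload.
    """
    if require_root and not header_roots:
        # Strict mode mirrors the behaviour described above:
        # an empty roots list is treated as an error.
        raise ValueError("CAR header has no roots")
    present = set(block_cids)
    return [root for root in header_roots if root not in present]
```

With the default `require_root=False`, an empty roots list is accepted and simply yields an empty result.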

expede commented 1 year ago

Should we specify this at all? We could also just... not specify this

I'm in favour of leaving it to the implementations, but noting the situation in the spec. It's at a different layer from CAR Mirror. Any app that tries to use an arbitrary CAR library will run into this problem.

Put the first block's CID that's transferred in the CAR file into the header. That's currently the case in the WIP rs-car-mirror implementation.

I personally like this version. As you note, it's a bit of a hack.
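As a rough illustration of the "first block's CID in the header" option, the sketch below hand-encodes a CARv1 header (a varint length prefix followed by the DAG-CBOR map `{roots, version}`). It is not taken from rs-car-mirror. The helper names are hypothetical, and it assumes every block is addressed by a CIDv1 with the raw codec and a SHA-256 multihash; a real sender would reuse the CIDs it already has.

```python
import hashlib
from typing import List


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used for CAR length prefixes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def cbor_head(major: int, n: int) -> bytes:
    """CBOR initial byte(s) for a major type with argument n (n < 2**32)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")


def raw_cid(data: bytes) -> bytes:
    """Assumption: CIDv1 (0x01), raw codec (0x55), sha2-256 (0x12, 32 bytes)."""
    return bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()


def car_v1_header(roots: List[bytes]) -> bytes:
    """Encode {"roots": [...], "version": 1} with its varint length prefix."""
    body = cbor_head(5, 2)                      # map with two entries
    body += cbor_head(3, 5) + b"roots"          # shorter key sorts first
    body += cbor_head(4, len(roots))            # array of CIDs
    for cid in roots:
        # DAG-CBOR link: tag 42 around a byte string with a 0x00 prefix
        body += cbor_head(6, 42) + cbor_head(2, len(cid) + 1) + b"\x00" + cid
    body += cbor_head(3, 7) + b"version"
    body += cbor_head(0, 1)                     # version = 1
    return uvarint(len(body)) + body


def header_for_stream(blocks: List[bytes]) -> bytes:
    """Hypothetical policy: use the first transferred block's CID as the root."""
    if not blocks:
        return car_v1_header([])                # nothing to point at
    return car_v1_header([raw_cid(blocks[0])])
```

Only the first block is inspected, so the header can be written before the rest of the blocks are known. The root it names is just an entry point into the payload, not necessarily the rootmost node of the DAG.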

This issue is about the space of all possible CAR libraries, but do we know if there's a strong rationale for making this a hard requirement in iroh-car? Could we PR a change to remove this requirement?

some implementations require a header to have at least one root CID (iroh-car being one of them

Yeah, having some entry point is a good best practice IMO. That said, it's certainly not possible to enforce that this be the rootmost entry in the CAR. If we go with the option that reifies a common root between the blocks, you don't gain anything more than the CID of the CAR itself.

It's somewhat unclear whether CAR files need to contain all their roots.

My guess about why this happened is that requiring all of the roots in the CAR file is extremely brittle. It is desirable so that you can jump around the structure easily, but you'll rediscover this structure as you process the CAR.

Obviously you won't be able to add all of the roots in a lot of streaming CAR cases, too.