Closed willscott closed 1 year ago
cc @lidel @hannahhoward @bajtos
This is a bad idea.
Thank you @aschmahmann for a quick response and the pointer to the previous discussion about what qualifies as a codec.
The particular thing this nominative type is being used for (again not defined here, but something I understand from reading some Notion docs and Slack messages) is fixing a deficiency in the CAR format by mixing the control plane and data plane which has received heavy pushback from stakeholders. Seems like this effectively is a change to the CAR format anyhow since it'll be expected across CAR implementations so why not actually change the format and go through the process of working with stakeholders of that format.
In our use case, we want to append an extra metadata block at the end of the CARv1 file to provide a kind of checksum: byte length of the CAR file, hash of the CAR bytes, and signature of the previous two fields. Do you consider such metadata as a control plane too?
Yeah, I'm not really a fan either, for the same reasons @aschmahmann outlined (and as per all the previous discussions on this stuff).
Would it not be enough to just put a dag-cbor block at the end with a strongly typed schema that you can check?
(as dag-json)
{
"car-metadata/v1": {
"properties": {
"byte-length": 18390,
"multihash": {
"bytes": {
"/": "..."
}
}
}
"signature": {
"bytes": {
"/": "..."
}
}
},
}
(schema)
type CarMetadata union {
Metadata "car-metadata/v1"
} representation keyed
type Metadata struct {
properties CarProperties
signature Signature
}
type Properties struct {
byteLength Int (rename "byte-length")
multihash Bytes
}
type Signature ...
This was one of the original design goals of IPLD schemas - they should be fast and efficient enough to do these kinds of checks.
There's also the "messaging" hack I prototyped for CARv2: https://github.com/ipld/go-car/pull/322 & https://github.com/ipld/js-car/pull/89 after a similar request for this kind of thing came up from the web3.storage team who wanted to put some metadata inside of their CARs. It was never used, and it fits in before the data payload, which may not be ideal - but since a CARv2 wraps a CARv1, given a 2-step process it might make even more sense than this since you're providing metadata about a thing within the thing itself, which is not so ideal since the metadata needs to be excluded from the thing somehow—and we'd have to assume that there'd be some layer that would strip this cleanly so it doesn't end up making a mess of other layers of the stack where such a codec code would make a mess of things.
I'm sure @lidel can pull up some record of the discussions about putting a tombstone at the end of trustless gateway CAR responses, those are probably relevant here and maybe back on the table?
If this becomes part of the protocol, then having a new codec code in CIDs is going to cause a bit of havoc for users—doing a simple curl
of the CAR will leave you with a CAR that has a block you can't decode without having this codec in your software's registry. A similar problem applies for a dag-cbor (or other common codec) tombstone block, but it's a bit more manageable - you can at least get your content out without your parser/checker borking at the block, you could easily import it into a Kubo node, parse it in your browser, without having to have this codec registered; you just have a weird block to deal with, which could be a problem in some scenarios.
But having said all that, if we do go with a tombstone, then a couple of extra thoughts:
curl
problem; lassie could strip it out and give you the relevant data in some side channel).curl
problem.I think the next step is to propose a spec+code change in go-car
on the specific semantics for reading/writing a car stream against a network with support for metadata blocks.
I would appreciate pointers on other spec impact or parts of this project that should also be considered as we proceed
Back to the basic request here of adding a new serialization/codec to the table - we can't do that where you're asking for a duplicate of an existing serialization method. We've got ample history of rejecting such additions where someone just wants a code to represent a codec+schema, which is what this is. So as long as this is, at base, cbor, json, or some other existing unschema'd codec, then it can't go in. Schema signalling via codec code is a door that we've tried hard to keep closed, pushing people to keep that level of signalling out of the CID. The demand for such signalling tells us that there is a hole in the stack that people want utilities for, but that's where discussions about CIDv2 (and other methods) come in; jamming it into CIDv1 is going to make things much harder to manage.
don't we do that with cbor vs dag-cbor and json vs dag-json today?
would it be more acceptable to call this a 'transport' signalling value, like the other transport signalling values already present in the table?
In https://github.com/ipfs/specs/pull/431, we pivoted to a different solution that does not require any new multicodec.
I think this PR can be closed now.
When transporting car data, it can be useful to insert a block of data with key/value metadata signalling inline with the car data blocks.
This PR allocates a codec
car-metadata
to signal such data as blocks within the stream of car blocks.