Open achingbrain opened 3 years ago
The intention of prepare()
was to ensure properly formed but minimally supported data model forms that match the DAG-PB logical format schema: https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-pb.md#logical-format
That schema also matches the new Go codec, https://github.com/ipld/go-codec-dagpb, so they're doing the same thing in accepting a missing Tsize
both from decode and from encode.
But we could change that. I don't have any strong opinions other than consistency. We get to choose the lens that we want to view the PB as, and we landed on that schema in the spec after iterating around the various forms of it.
Two options for dealing with it can be found in what we did with Hash
and Links
:
Hash
is mandatory in both encode and decode. Even though it's possible in the PB to omit it, we reject links that don't have a Hash
because that's just not a link and as far as I'm aware nothing in the wild should be producing links with no Hash
field (maybe that's incorrect and we'll need to backtrack on that decision!).Links
must always appear in the data model, even though it's optional in PB - there's no distinction between empty array and no array in the PB so we get to choose what it should be so we've said that it must appear in the PB and we make it appear out of thin air when it's not present in the bytes, because it's just nicer to be able to have the assurance that Links
is always a thing in the data model rather than having to check whether it's null
or not.So we could choose to do something similar with Tsize
. Is there a chance that there's lots of blocks in the wild with a missing Tsize
? Because the problem is that then there'll be round-trip consistency issues when you take a block without a Tsize
, assign it one and re-encode.
A half-way hack is to make prepare()
just insert one but still have encode()
(edit:) NOT reject if there isn't one, that's kind of what prepare()
is there for. But we lose some symmetry with Go unless we encode these things in the spec.
Maybe a more focused response is - what are the chances that if we remove optional
from this that we'll find blocks in the wild that have no Tsize
at all? Do we have any code in Go or JS currently that might be producing such blocks? You're much more familiar with DAG-PB in the wild than me.
If we're looking at a near-0% chance, then we could just remove optional
and then prepare()
here can default to 0
and the encode()
will be consistent with what the Go codec can do (once the schema is updated there).
If there's only a slightly above 0% chance, maybe we could forgo the round-trip consistency and say that we interpret a missing one as 0
and if you're dealing with a block then you just don't get the ability to go back to those "incorrect" bytes because we're going to stick a 0
in there for you. Round-trip consistency doesn't usually matter a whole lot in practice, especially if you're dealing with edge-case data.
maybe we should export a util
that can make working with Tsize
easier, like what's brought up in https://github.com/ipfs/js-ipfs-repo-migrations/pull/98, prepare
and validate
could be moved into those as well (tho that'd be a breaking change)
const { totalSize } = require('@ipld/dag-pb/util')
// totalSize(links: PBLink[]): number
const size = totalSize(node.links)
I'm not sure what else would be useful but there's probably some other patterns that could be generalised
Is there a chance that there's lots of blocks in the wild with a missing Tsize?
AFAIK IPFS implementations have only ever created links with Tsize
fields but the schema allows it so I guess it could be encountered.
I think the TODO here is to enumerate, or just PR, utility functions that make this all much easier. Such as https://github.com/ipld/js-dag-pb/issues/5#ref-pullrequest-892377406
It looks like this module can generate links without a
Tsize
property. The protobuf schema allows for this (along with not having a Name or a Hash) but we're previously decided that implementations should generate a value forTsize
.Would you accept a PR making this field mandatory?
It means you won't be able to do things like:
..but IPFS will not do that for the reasons in the linked thread above.