ipld / js-dag-pb

An implementation of the DAG-PB spec for JavaScript (for use with multiformats or @ipld/block)
Other
12 stars 5 forks source link

Links without Tsize #5

Open achingbrain opened 3 years ago

achingbrain commented 3 years ago

It looks like this module can generate links without a Tsize property. The protobuf schema allows for this (along with not having a Name or a Hash) but we're previously decided that implementations should generate a value for Tsize.

Would you accept a PR making this field mandatory?

It means you won't be able to do things like:

const node = prepare({
  Links: [
    new CID('Qmfoo')
  ]
})

..but IPFS will not do that for the reasons in the linked thread above.

rvagg commented 3 years ago

The intention of prepare() was to ensure properly formed but minimally supported data model forms that match the DAG-PB logical format schema: https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-pb.md#logical-format

That schema also matches the new Go codec, https://github.com/ipld/go-codec-dagpb, so they're doing the same thing in accepting a missing Tsize both from decode and from encode.

But we could change that. I don't have any strong opinions other than consistency. We get to choose the lens that we want to view the PB as, and we landed on that schema in the spec after iterating around the various forms of it.

Two options for dealing with it can be found in what we did with Hash and Links:

So we could choose to do something similar with Tsize. Is there a chance that there's lots of blocks in the wild with a missing Tsize? Because the problem is that then there'll be round-trip consistency issues when you take a block without a Tsize, assign it one and re-encode.

A half-way hack is to make prepare() just insert one but still have encode() (edit:) NOT reject if there isn't one, that's kind of what prepare() is there for. But we lose some symmetry with Go unless we encode these things in the spec.

rvagg commented 3 years ago

Maybe a more focused response is - what are the chances that if we remove optional from this that we'll find blocks in the wild that have no Tsize at all? Do we have any code in Go or JS currently that might be producing such blocks? You're much more familiar with DAG-PB in the wild than me.

If we're looking at a near-0% chance, then we could just remove optional and then prepare() here can default to 0 and the encode() will be consistent with what the Go codec can do (once the schema is updated there).

If there's only a slightly above 0% chance, maybe we could forgo the round-trip consistency and say that we interpret a missing one as 0 and if you're dealing with a block then you just don't get the ability to go back to those "incorrect" bytes because we're going to stick a 0 in there for you. Round-trip consistency doesn't usually matter a whole lot in practice, especially if you're dealing with edge-case data.

rvagg commented 3 years ago

maybe we should export a util that can make working with Tsize easier, like what's brought up in https://github.com/ipfs/js-ipfs-repo-migrations/pull/98, prepare and validate could be moved into those as well (tho that'd be a breaking change)

const { totalSize } = require('@ipld/dag-pb/util')

// totalSize(links: PBLink[]): number

const size = totalSize(node.links)

I'm not sure what else would be useful but there's probably some other patterns that could be generalised

achingbrain commented 3 years ago

Is there a chance that there's lots of blocks in the wild with a missing Tsize?

AFAIK IPFS implementations have only ever created links with Tsize fields but the schema allows it so I guess it could be encountered.

rvagg commented 3 years ago

I think the TODO here is to enumerate, or just PR, utility functions that make this all much easier. Such as https://github.com/ipld/js-dag-pb/issues/5#ref-pullrequest-892377406