Decide how to handle -1 in Prefix

Stebalien commented 7 years ago

Currently, Prefix.MhLength is an int and -1 can be (and is) used to mean "default length". Unfortunately, this means:

cid1.Bytes() == cid2.Bytes() does not imply cid1.Prefix() == cid2.Prefix().
Prefix.Bytes() is broken.

Solutions:

Make it a uint64, provide some convenience constructors constructors (e.g. func V1Prefix(codec uint64, mhType uint64) Prefix). This will break things.
Fix Prefix.Bytes() and provide an Equals method (less convenient in the long run).

Thoughts?

kevina commented 6 years ago

@Stebalien

I think the real problem is we are using Prefix for two different things which I think should be separated.

(1) To get information about an existing CD (2) To create a new CID

I think we should separate these two concerns. This is espacally relevant now as I am trying to think how to add support for automatically creating Identity hashes. I am tempted to hack on support with the existing Prefix structure but will only make things worse.

Instead I propose we use two different structures one to get information about a Cid and another to be used to actually generate a Cid, the first will have a Bytes() method but no Sum() method, and a unsigned MhLength; the latter will not have a Bytes() method, but will have a Sum() method, and a signed MhLength. The latter will also be extended to support creating identity hashes.

The only thing is what to call to two structures, should we use CidInfo and Prefix (where the first is for getting info on a Cid and the later for creating them); or maybe Prefix an CidOpts; or maybe rename both to CidInfo and CidOpts; or maybe something else as I am horrible and thinking of good names for things.

Thoughts?

Stebalien commented 6 years ago

I agree, Prefix is trying to do too many things. However, is Prefix really useful except as a way to create new CIDs? As far as I know, that's the only way we really use it. We may be able to just deprecate Prefix and then create a new Format type (or something like that).

kevina commented 6 years ago

Unless I am missing something obvious (always possible) getting the prefix of a Cid() is the only way to get the Version of the cid, although we could add a Version() method fairly easily.

Other than that I am not sure and would need to check everywhere Prefix is used.

schomatis commented 6 years ago

For me, the most confusing part is that I don't find the Prefix in the CID specification so I don't know what to expect of this structure. From the code it seems that Prefix has pretty much the same information as Cid minus the digest. It seems to be used for this double function as @kevina mentioned,

(1) To get information about an existing CD (2) To create a new CID

to the point where the ProtoNode structure has both a Prefix and a Cid (where some information is taken from the first structure and other from the second).

So I think the priority is to first decide at the specification level what role does the prefix play so the user has a clear mental model of it, and later discuss how the code should implement it (if at all) to avoid potential sources of confusion like

cid1.Bytes() == cid2.Bytes() does not imply cid1.Prefix() == cid2.Prefix().

The -1 in the Mhlength is just an example of this, where the Prefix duplicates the MhLength of the Cid's Multihash but declares it as a signed integer and employs a -1 signal that is not part of the multihash specification (which I personally had much trouble spotting the difference while working in https://github.com/ipfs/go-ipfs/issues/5019).

With this I don't want to undermine the Go implementation but (from what I'm understanding from the IPFS model) the specification should always precede the implementation (to avoid having go-cid being the reference implementation and being just one implementation).

schomatis commented 6 years ago

FWIW, the JS implementation of the prefix is worth looking at, where it is just a derivation (as a function, not a separate structure) of the CID.

kevina commented 6 years ago

So I think the priority is to first decide at the specification level what role does the prefix play so the user has a clear mental model of it, and later discuss how the code should implement it (if at all) to avoid potential sources of confusion like

I strongly disagree with this. The specification is about the format, not the API. Prefix is an implementation detail.

kevina commented 6 years ago

If part of the confusion is that the Prefix structure has a Bytes() method I think we should just remove it as I am not sure how much utility it has.

kevina commented 6 years ago

And maybe we should just eliminate the use of the word Prefix for this.

I like using CidInfo for a struct that is just about the information in a Cid (and doesn't have a Bytes() method) and Format for a struct used to create new Cid's. This way the user doesn't try to match up the word Prefix which how it is used in the spec.

schomatis commented 6 years ago

The specification is about the format, not the API. Prefix is an implementation detail.

That's fair, we may have different expectations of what a specification does. In my experience specifications (e.g., RFCs, what IPFS specs aspire to) tell me more, not just the format but the functions that interact with it, the main components involved, how they interact, the terminology they use, etc. They don't draw the blueprints down to the last level because that's were implementation specific details come up but it's certainly not a black box in terms of what to expect from it.

Anyway, thanks for the feedback, this is an interesting discussion (that exceeds this issue) about what to expect from IPFS specifications (I'll try to submit an issue at the specs repo about it).

ipfs / go-cid

Decide how to handle -1 in Prefix #23