Open Stebalien opened 7 years ago
@Stebalien
I think the real problem is we are using Prefix for two different things which I think should be separated.
(1) To get information about an existing CD (2) To create a new CID
I think we should separate these two concerns. This is espacally relevant now as I am trying to think how to add support for automatically creating Identity hashes. I am tempted to hack on support with the existing Prefix structure but will only make things worse.
Instead I propose we use two different structures one to get information about a Cid and another to be used to actually generate a Cid, the first will have a Bytes() method but no Sum() method, and a unsigned MhLength; the latter will not have a Bytes() method, but will have a Sum() method, and a signed MhLength. The latter will also be extended to support creating identity hashes.
The only thing is what to call to two structures, should we use CidInfo
and Prefix
(where the first is for getting info on a Cid and the later for creating them); or maybe Prefix
an CidOpts
; or maybe rename both to CidInfo
and CidOpts
; or maybe something else as I am horrible and thinking of good names for things.
Thoughts?
I agree, Prefix is trying to do too many things. However, is Prefix
really useful except as a way to create new CIDs? As far as I know, that's the only way we really use it. We may be able to just deprecate Prefix and then create a new Format
type (or something like that).
Unless I am missing something obvious (always possible) getting the prefix of a Cid() is the only way to get the Version of the cid, although we could add a Version() method fairly easily.
Other than that I am not sure and would need to check everywhere Prefix is used.
For me, the most confusing part is that I don't find the Prefix
in the CID specification so I don't know what to expect of this structure. From the code it seems that Prefix
has pretty much the same information as Cid
minus the digest. It seems to be used for this double function as @kevina mentioned,
(1) To get information about an existing CD (2) To create a new CID
to the point where the ProtoNode
structure has both a Prefix
and a Cid
(where some information is taken from the first structure and other from the second).
So I think the priority is to first decide at the specification level what role does the prefix play so the user has a clear mental model of it, and later discuss how the code should implement it (if at all) to avoid potential sources of confusion like
cid1.Bytes() == cid2.Bytes()
does not implycid1.Prefix() == cid2.Prefix()
.
The -1
in the Mhlength
is just an example of this, where the Prefix
duplicates the MhLength
of the Cid
's Multihash
but declares it as a signed integer and employs a -1
signal that is not part of the multihash specification (which I personally had much trouble spotting the difference while working in https://github.com/ipfs/go-ipfs/issues/5019).
With this I don't want to undermine the Go implementation but (from what I'm understanding from the IPFS model) the specification should always precede the implementation (to avoid having go-cid
being the reference implementation and being just one implementation).
FWIW, the JS implementation of the prefix
is worth looking at, where it is just a derivation (as a function, not a separate structure) of the CID.
So I think the priority is to first decide at the specification level what role does the prefix play so the user has a clear mental model of it, and later discuss how the code should implement it (if at all) to avoid potential sources of confusion like
I strongly disagree with this. The specification is about the format, not the API. Prefix is an implementation detail.
If part of the confusion is that the Prefix
structure has a Bytes()
method I think we should just remove it as I am not sure how much utility it has.
And maybe we should just eliminate the use of the word Prefix
for this.
I like using CidInfo
for a struct that is just about the information in a Cid (and doesn't have a Bytes() method) and Format
for a struct used to create new Cid's. This way the user doesn't try to match up the word Prefix
which how it is used in the spec.
The specification is about the format, not the API. Prefix is an implementation detail.
That's fair, we may have different expectations of what a specification does. In my experience specifications (e.g., RFCs, what IPFS specs aspire to) tell me more, not just the format but the functions that interact with it, the main components involved, how they interact, the terminology they use, etc. They don't draw the blueprints down to the last level because that's were implementation specific details come up but it's certainly not a black box in terms of what to expect from it.
Anyway, thanks for the feedback, this is an interesting discussion (that exceeds this issue) about what to expect from IPFS specifications (I'll try to submit an issue at the specs repo about it).
Currently,
Prefix.MhLength
is anint
and-1
can be (and is) used to mean "default length". Unfortunately, this means:cid1.Bytes() == cid2.Bytes()
does not implycid1.Prefix() == cid2.Prefix()
.Prefix.Bytes()
is broken.Solutions:
func V1Prefix(codec uint64, mhType uint64) Prefix
). This will break things.Prefix.Bytes()
and provide anEquals
method (less convenient in the long run).Thoughts?