multiformats / cid

Self-describing content-addressed identifiers for distributed systems
Other
426 stars 79 forks source link

Implementations: should a v0 be equal to v1 #41

Closed vmx closed 4 years ago

vmx commented 4 years ago

This issue was raised in rust-cid, but is a general topic for all implementations.

If you convert a V0 to a V1 CID and then compare those two with an "equal" operation, should it return true?

If we want to make it return true that would mean for specific languages:

I think making this change makes sense, as then "equal" CIDs would be the ones that point to the same content.

aschmahmann commented 4 years ago

I'd need to do a little thinking about other ramifications, but my gut reaction is that even without judging if this is a good idea or not this is a breaking change to js and Go CID libraries which could lead to a bunch of ecosystem breakage for pretty minimal gain.

IMO In order to show this is worth it we'd have to see some evidence of people already using cid.Equals incorrectly (i.e. assuming it works as the proposal suggests) since that would mean we're actually making things easier to understand for people instead of just forcing tricky library migrations and/or introducing subtle bugs to existing applications and libraries.

Note: we could always make another function has the newly proposed behavior (e.g. Equivalent, DataEquals, or something better 😁)

Stebalien commented 4 years ago

This would be nice in some cases (would make GC simpler, for one...), but I think it would be an even bigger foot gun at this point. Using cid.String() as a key would work differently than just using cid.

I think making this change makes sense, as then "equal" CIDs would be the ones that point to the same content.

Note: this won't be the case regardless due to the codec itself. If content equality is desired, the multihash must be used directly.

vmx commented 4 years ago

I conclude that for "equals" for CIDs means "byte identical objects" => A CIDv0 is always != CIDv1.

If you want to not care about the version, you can always implement that yourself. Likely you want to compare only the Multihashes anyway, therefore you could make a comparison like cid0.hash() == cid1.hash().