w3c-ccg / multihash

An IETF Internet Draft for the Multihash data format
https://w3c-ccg.github.io/multihash/index.xml

multihash: URI scheme #20

Open melvincarvalho opened 1 year ago

melvincarvalho commented 1 year ago

Initially raised in this thread:

https://lists.w3.org/Archives/Public/public-credentials/2022Aug/0025.html

I wonder if there would be value in a multihash: URI scheme

I'm just checking what it would look like. I would assume multihash:// rather than mh://, which seems less descriptive.

Reason for raising this is that I'd like to test an implementation. And if we can loosely agree on what a scheme might look like, it would help with future proofing.

Appreciate the authors are busy with other things, so I'll start implementing on the assumption that there's no objection to multihash://. However if you had a moment to reply or thumbs up (or down!) this comment, that would be helpful.

msporny commented 1 year ago

The argument we had against multihash when it was first proposed was that these values are expected to be used in constrained environments and that they'd rather have mh instead of multihash... to save bytes... and no, I'm not kidding... :). There are a significant number of registered IANA schemes that use 2-3 characters -- irc, im, tel, sip, and most notably, ni. It was difficult to argue against it at the time. I wonder if we can reserve two scheme names for the same scheme and get both "mh" and "multihash"?

melvincarvalho commented 1 year ago

Totally fine with mh, I'd like to save bytes too!

ChristopherA commented 1 year ago

I don’t like multi* as it is yet one more encoding & tagging format, and requires you to register with a non-standards based org. I also feel architecturally the tagging happens in the wrong layer, it should be tagged in binary. Thus I prefer CBOR to avoid the whole problem.

Since CBOR is the heart of all of our objects, when we need to encode optimally for text or QR codes (for which base64 and hex do not work well; see https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-003-uri-binary-compatibility.md) we can do so. Or not, and encode to some other format without the baggage of knowing tags at that layer.

melvincarvalho commented 1 year ago

Hi @ChristopherA. Thanks for sharing the pointers, I enjoyed the technical information in the video.

So it seems that the CBOR structure you use can similarly encode hashes (and other things) in a QR code friendly way. It seems to have good technical utility. I assume that together with the ur: URI scheme it would be a drop in replacement for an mh: URI scheme or similar.

It did strike me that multihash seems quite focused on hashing content, and UR seemed to be more generic and for many types of structured data up to 3k, which is pretty cool!

I think what differs is that there is an existing deployment base, and network effect around multihash. I'm not familiar with how well deployed UR is. So the motivation of this issue was to address a gap in the multihash / multibase / multiformats ecosystem, so that it can operate more easily in mixed URI scenarios.

In this case, both ur and mh could be used together by software, each with different trade-offs.

So, I think it still makes sense to address the multihash gap (lack of stable URI scheme) though some implementations might prefer ur, as you point out.

IS4Code commented 1 year ago

I wonder: would something like mf:<multibase><multicodec><multihash> be better suited to have as a broad URI for everything representable with multiformats? There would be no need for separate URI schemes for e.g. multihash and multiaddr, and it would be forward compatible.
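
As a rough illustration of the mf:<multibase><multicodec><multihash> idea, here is a minimal Python sketch. Nothing in it is specified in this thread: the function names mf_uri and parse_mf_uri are hypothetical, and the choices of base16 multibase (prefix "f"), the "raw" multicodec (0x55) and sha2-256 (code 0x12, length 0x20) are assumptions picked to keep the example short. It also assumes every code fits in a single byte, whereas multiformats codes are varints in general.

```python
import hashlib

RAW_CODEC = 0x55      # multicodec code for "raw" (assumed choice for this sketch)
SHA2_256 = 0x12       # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length in bytes (32)


def mf_uri(content: bytes) -> str:
    """Build a hypothetical mf: URI as <multibase><multicodec><multihash>."""
    multihash = bytes([SHA2_256, SHA2_256_LEN]) + hashlib.sha256(content).digest()
    payload = bytes([RAW_CODEC]) + multihash
    # "f" is the multibase prefix for lowercase base16.
    return "mf:" + "f" + payload.hex()


def parse_mf_uri(uri: str) -> dict:
    """Split a hypothetical mf: URI back into its parts (base16 only)."""
    if not uri.startswith("mf:f"):
        raise ValueError("this sketch only handles the base16 ('f') multibase")
    payload = bytes.fromhex(uri[len("mf:f"):])
    codec, hash_fn, length = payload[0], payload[1], payload[2]
    digest = payload[3:]
    if len(digest) != length:
        raise ValueError("digest length does not match the multihash header")
    return {"codec": codec, "hash_fn": hash_fn, "digest": digest}


if __name__ == "__main__":
    uri = mf_uri(b"hello")
    print(uri)
    print(parse_mf_uri(uri))
```

Because each layer announces itself with a prefix, a parser can reject unknown bases or codecs early instead of guessing at the payload.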

melvincarvalho commented 1 year ago

@IllidanS4 that could work

Useful site here: https://cid.ipfs.tech/#QmXv7TpGgdhyjp7L1QkSFNjdq9zwoG1YHi4nykYt16Ve1r

So what seems missing is a version number there (CIDv0 and CIDv1)

Ideally could we find something which is a sweet spot:

  1. relatively stable
  2. compatible with an existing deployment base, in particular, CIDv0
  3. simple, lacking bloat, so that we don't need to define lots of documents and schemes

It's treading a fine line between being too generic and too specific

mf: mb: mh: could all work. It could be something similar to ipfs but not necessarily closely coupled to the system.

melvincarvalho commented 1 year ago

So reading through the various threads and reading:

https://www.ietf.org/id/draft-multiformats-multibase-05.html

Note: This draft expires Aug 05 (today) ^^

I would suggest a scheme that perhaps everyone could be happy with:

mh://

Where the hash is a multihash and by default is base58 encoded to be compatible with existing deployments in IPFS CIDv0 etc.

This has similarities to the ni:// naming scheme structure, and can be aligned with IPFS implementations, but not closely coupled to it.
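
To make the proposal concrete, here is a minimal Python sketch of how such a URI might be produced. It rests on assumptions not settled in this thread: that the base58-encoded multihash goes directly after mh://, and that sha2-256 (code 0x12, length 0x20) is the hash function. The helper names base58btc_encode, sha256_multihash and mh_uri are hypothetical.

```python
import hashlib

# Bitcoin-style base58 alphabet (base58btc), as used by IPFS CIDv0.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58btc, preserving leading zero bytes as '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def sha256_multihash(content: bytes) -> bytes:
    """Prefix a sha2-256 digest with its multihash code and length."""
    digest = hashlib.sha256(content).digest()
    return bytes([0x12, 0x20]) + digest


def mh_uri(content: bytes) -> str:
    """Build a hypothetical mh:// URI; the layout is an assumption."""
    return "mh://" + base58btc_encode(sha256_multihash(content))


if __name__ == "__main__":
    print(mh_uri(b"hello world"))
```

A sha2-256 multihash encoded this way is 46 characters long and begins with "Qm", the same shape as a CIDv0 string, which is where the compatibility with existing deployments would come from.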

Furthermore, useful bits of RFC 6920 could be reused, e.g. the .well-known location

https://www.rfc-editor.org/rfc/rfc6920.html#section-4

In this way something that is already a multihash could be used with mixed URI software, and have some points of extension, such as to git hashes, DIDs etc.

Thoughts?

peacekeeper commented 1 year ago

What would be the purpose of a mh:// URI scheme? I'm familiar with multihash in certain DID and VC related JSON-LD property values, and in other URI schemes such as ipfs://, but I haven't seen an explanation in this issue how a generic mh:// would work and how it would be used.

What's the media type of the representation you get when you dereference a mh:// URI?

My assumption is that it would work in a similar way as data://, but it would be nice to see this explained somewhere.

melvincarvalho commented 1 year ago

@peacekeeper mh:// is a naming system like magnet: or ni://, but ipfs:// is a bigger system that involves a P2P network to locate and distribute such content, using a DHT. I believe there are some mechanisms for ipfs to store the mime type; in general, application/octet-stream is used when the mime type is not known

You might not want to use ipfs and have it closely coupled with the hash naming system

I'm primarily interested in what the naming will look like so that I can write code and not have to change it later. The key questions are:

Other details I do not have a strong view on, and can be decided with consensus. This issue is about gathering thoughts before putting together a quick spec. As it's been open 6 months now there's been time for folks to weigh in, but also the new IETF group may have some thoughts, so perhaps best to wait till the summer to decide

peacekeeper commented 1 year ago

Thanks @melvincarvalho for the explanation, makes sense!