Codecs for ENS Contenthash: URI [0xF2] and Data URL [0xF3]

ENS (Ethereum Name Service) encodes contenthash() using multicodec. The purpose of a contenthash() is to describe the web contents for a corresponding ENS name.

Currently, ENS supports IPFS, IPNS, Swarm, Arweave, Onion, etc.

Example using IPFS:

ENS: vitalik.eth
ENS Web Gateway using Contenthash: https://vitalik.eth.limo/
contenthash() = 0xe30101701220484da2f7f497cac307e2026282263630b8dd4c448c3436470f5b850b432ba868
Decoded: ipfs://k2jmtxt5zh5vu5y8r7em2che3d4ghyftfr6h1yofdhibxai88k1wj5uw
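0xE3 corresponds to the multicodec ipfs namespace (varint-encoded as e3 01); the following bytes are a CIDv1 (01 = version, 70 = dag-pb, 12 20 = sha2-256 multihash with a 32-byte digest)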
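
To make the byte layout of the example concrete, here is a minimal sketch that splits the contenthash into its varint fields. `readVarint` and `inspectContenthash` are hypothetical helper names (not part of any ENS library), and the sketch assumes the layout namespace + CID version + codec + multihash with no validation beyond that.

```javascript
// Read one unsigned varint (LEB128-style, as used by multiformats)
// from `bytes`, starting at `offset`.
function readVarint(bytes, offset = 0) {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (pos < bytes.length) {
    const byte = bytes[pos++];
    value += (byte & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    if ((byte & 0x80) === 0) return { value, next: pos };
    shift += 7;
  }
  throw new Error("truncated varint");
}

// Hypothetical helper: split a contenthash into namespace and CID fields.
function inspectContenthash(hex) {
  const bytes = Buffer.from(hex.replace(/^0x/, ""), "hex");
  const namespace = readVarint(bytes, 0);
  const version = readVarint(bytes, namespace.next);
  const codec = readVarint(bytes, version.next);
  const hashFn = readVarint(bytes, codec.next);
  const hashLen = readVarint(bytes, hashFn.next);
  return {
    namespace: namespace.value,   // 0xe3 = ipfs
    cidVersion: version.value,
    codec: codec.value,           // 0x70 = dag-pb
    hashFn: hashFn.value,         // 0x12 = sha2-256
    hashLen: hashLen.value,
    digestHex: bytes.subarray(hashLen.next).toString("hex"),
  };
}

console.log(
  inspectContenthash(
    "0xe30101701220484da2f7f497cac307e2026282263630b8dd4c448c3436470f5b850b432ba868"
  )
);
```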

We would like to support the following two new codecs:

0xF2 — URI
- Encoded: 0xf268747470733a2f2f656e732e646f6d61696e732f
- Format: <codec><uri: utf8-string>
- Decoded: https://ens.domains/
0xF3 — Data URL
- Encoded: 0xf309746578742f68746d6c3c68746d6c3e68656c6c6f3c2f68746d6c3e
- Format: <codec><len(mime): uint8><mime: ascii-string><data: uint8[]>
- Decoded:
  - mime = text/html (9ch)
  - data = <html>hello</html> (encoding depends on mime)
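
A minimal sketch of the two formats as written above, assuming the codec is a single leading byte exactly as in the encoded examples. `encodeUri`, `encodeDataUrl` and `decodeProposed` are hypothetical names for illustration. Note that multicodec codes are normally serialized as unsigned varints, so a code of 0xf2 would usually appear on the wire as f2 01 (the same way 0xe3 appears as e3 01 in the IPFS example); this sketch deliberately follows the single-byte form shown here.

```javascript
const toHex = (bytes) => "0x" + Buffer.from(bytes).toString("hex");
const fromHex = (hex) => Buffer.from(hex.replace(/^0x/, ""), "hex");

// <codec><uri: utf8-string>
function encodeUri(uri) {
  return toHex(Buffer.concat([Buffer.from([0xf2]), Buffer.from(uri, "utf8")]));
}

// <codec><len(mime): uint8><mime: ascii-string><data: uint8[]>
function encodeDataUrl(mime, data) {
  const mimeBytes = Buffer.from(mime, "ascii");
  if (mimeBytes.length > 255) throw new Error("mime type longer than 255 bytes");
  return toHex(
    Buffer.concat([Buffer.from([0xf3, mimeBytes.length]), mimeBytes, Buffer.from(data)])
  );
}

// Decode either format back into its parts.
function decodeProposed(hex) {
  const bytes = fromHex(hex);
  if (bytes[0] === 0xf2) {
    return { codec: "uri", uri: bytes.subarray(1).toString("utf8") };
  }
  if (bytes[0] === 0xf3) {
    const mimeLen = bytes[1];
    return {
      codec: "data-url",
      mime: bytes.subarray(2, 2 + mimeLen).toString("ascii"),
      data: bytes.subarray(2 + mimeLen), // interpretation depends on mime
    };
  }
  throw new Error("unknown codec");
}

console.log(encodeUri("https://ens.domains/"));
console.log(encodeDataUrl("text/html", "<html>hello</html>"));
```

Decoding is left to the caller for `data`, since (as noted above) its encoding depends on the mime type.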

I think this seems reasonable, though novel. I'm not so sure about introducing a new tag, data, for this though. Would namespace be OK for that as well? Even that doesn't map super cleanly onto what you're doing here.

Do you think you'll want more of these in the future? I wonder, if we can't figure out a better tag, whether this should just be an entirely new classification.

@vmx, what do you think?

I wonder if URI could use a Multiaddress instead. Would that be an option? (I know too little about the Eth/ENS ecosystem.)

namespace works. I'd be happy to change it to whatever you suggest.

IMO, the closest codec is json which oddly uses tag:ipld.

I picked tag:data as unlike most codecs, data-uri is both a codec and the data itself.

I think tag:multiaddr for uri suggests too much internal encoding, as we want something maximally general (a literal UTF-8 string) where the content is ultimately validated by the client (since URL standards are ever-evolving)

Keeping it simple makes sense.

Apologies for the long text, I'm going to be OOO for a couple days and wanted to make sure to leave some context. cc @lidel who has been involved in the ENS work and interop here since long before me 😅.

TLDR:

- As per usual, unless a codec makes very little sense, is duplicative, or seems to trivially open the door for a whole bunch more codecs, I'm generally +1 on applications, although sometimes I recommend moving to a higher byte range in the table.
- DataURL in particular seems like something the ENS and IPFS communities could work on. If the mime-type issue is causing big enough problems that identity raw CIDs are insufficient, then it seems like the same problem will come up for data that's too big to reasonably put in a DataURL.

Some thoughts:

URI

I wonder if URI could use a Multiaddress instead

Probably not multiaddress itself, but harmonization with something like multipath https://github.com/multiformats/multiformats/pull/55 would likely make this work and be pretty sensible. It would likely also let us use the 0x2f as an escape hatch for people generally wanting to use/experiment with strings rather than code numbers which is what this roughly does (otherwise, the codes like for http could potentially be used instead).

FWIW libp2p has recently proposed going the other way as well (i.e. representing multiaddrs as URIs https://github.com/multiformats/multiaddr/pull/171).

I don't in principle have an objection to a URI-based namespace. The two-byte range is probably fine, although URIs could probably tolerate even three bytes due to the size of the data.

Perhaps more of an ENS-related comment, but want to call out:

- There is some redundancy here, because for any namespace (IPFS, Swarm, etc.) you could encode under the URI namespace or under the individual namespace. Not necessarily a big issue here, but certainly a change implementations will need to take care of.
- Related to ^: it seems like this could have always been the case. I'm not sure of the historical context here, but it's probably worth validating with folks who did this in earlier rounds that this makes sense. Totally fair + reasonable to say we want to save some bytes with known namespaces and then have the utf-8 URI escape hatch (although I don't know if "contenthash" is a reasonable name for this kind of thing 🙃).

Data URL

Seems fine, although maybe the three byte range (along with arweave, skynet, etc.) makes more sense here given these will likely be larger anyhow.

A few comments / thoughts:

- Given the above, technically this already works as a Data URI (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs), right? If so, I assume the idea is to save space by not needing to do base64 encoding.
- While saving some bytes here seems fine, this seems non-optimal in that it isn't as compact as it could be (e.g. mime types are still expressed in text), isn't flexible enough to include any other metadata, and we couldn't work around it within the existing namespaces (the URI namespace adds a sort of escape hatch here as long as you assume names won't collide).
- In my bias as someone who works on the IPFS project, IMO this could've/should've been resolved by having the tooling for this in IPFS (in UnixFS, CID, or another IPLD format), and this seems like as good a time as any to resolve it, independently of what happens in this PR (although it may justify bumping to 3 bytes)
  - A CID with the identity multihash and raw codec (or sometimes codecs like JSON or CBOR) would've been sufficient except for the need of a mimetype
  - Technically this could be resolved in a few different ways, one is https://github.com/ipfs/specs/issues/257, note: the latest request here came from the ENS community as well so definitely seems like a good opportunity to chat anyhow
  - Given the very large number of ENS contenthash records that are IPFS-based, this seems like something we could/should fix, or work around within ENS (a fix in either ENS or the "contenthash" namespace would do)
  - I understand this isn't really the place for an ENSIP comment and with my "multiformats hat" I don't have objection, but if you want to chat would definitely be happy to

IMO, the closest codec is json which oddly uses tag:ipld.

everything is IPLD 😄

🙏 everyone, I'm one of the authors of that data:uri ENSIP draft proposal, https://discuss.ens.domains/t/draft-ensip-17-datauri-format-in-contenthash/18048, which uses a simple hex("data:") namespace format.

We did our homework before sending the draft to the ENS forum, asking for an exception for the hex("data:") prefix for the reasons below.

a) mime/content type support in CIDv1 has been pending for a long time (?wen cidv2?)

https://github.com/multiformats/multicodec/pull/159 https://github.com/multiformats/multicodec/issues/4

b) ENS already supports the string(data:uri) format in avatar records, so a contenthash holding plaintext bytes(data:uri) under the hex("data:") namespace is fully RFC 2397 compliant and won't collide with CIDv1 namespaces. https://datatracker.ietf.org/doc/html/rfc2397

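// contenthash is a lowercase hex string without the "0x" prefix
// hex("data:") === "646174613a"
const hex = (s) => Buffer.from(s, "utf8").toString("hex");

if (contenthash.startsWith("e301")) {
    // ipfs
} else if (contenthash.startsWith("e501")) {
    // ipns
}
// else... other contenthash namespaces...
else if (contenthash.startsWith(hex("data:"))) {
    // datauri
}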
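
The same prefix check can also be written as a small lookup table, which may be easier to extend as namespaces are added. This is only a sketch: `namespaceOf` and the `PREFIXES` table are hypothetical names, and only the three prefixes mentioned above are listed.

```javascript
// Hex-encode a UTF-8 string, e.g. hex("data:") === "646174613a"
const hex = (s) => Buffer.from(s, "utf8").toString("hex");

// Hypothetical prefix table; other contenthash namespaces would be added here.
const PREFIXES = [
  ["e301", "ipfs"],
  ["e501", "ipns"],
  [hex("data:"), "datauri"],
];

function namespaceOf(contenthash) {
  const h = contenthash.toLowerCase().replace(/^0x/, "");
  const hit = PREFIXES.find(([prefix]) => h.startsWith(prefix));
  return hit ? hit[1] : "unknown";
}

console.log(namespaceOf("0xe301015500143c68313e48656c6c6f20576f726c643c2f68313e"));
console.log(namespaceOf("0x" + hex("data:text/html,<html>hello</html>")));
```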

ENS is not ready for such changes with new ENSIP specs: all contenthash values MUST follow the namespace+CIDv1 format. So we're back to square one, using raw data in CIDv1 with the IPFS namespace.

Our current working spec for an on-chain raw IPFS+CIDv1 generator, without content/mime types:

import { encode, decode } from "@ensdomains/content-hash";
import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as cbor from '@ipld/dag-cbor' // needed for the dag-cbor link example further down
import * as json from 'multiformats/codecs/json'
import * as raw from 'multiformats/codecs/raw'
const utf8 = new TextEncoder()

const json_data = {"hello":"world"}
const json_cid = CID.create(1, json.code, identity.digest(json.encode(json_data)))

JSON/cidv1 >> 01800400117b2268656c6c6f223a22776f726c64227d
https://ipfs.io/ipfs/bagaaiaarpmrgqzlmnrxseorco5xxe3deej6q
ENS contenthash with IPFS namespace: 0xe30101800400117b2268656c6c6f223a22776f726c64227d
eth.limo tests:
https://e3010180040011.7b2268656c6c6f223a22776f726c64227d.ipfs2.eth.limo
https://bagaaiaarpmrgqzlmnrxseorco5xxe3deej6q.ipfs2.eth.limo/

const html_data = "<h1>Hello World</h1>";
const html_cid = CID.create(1, raw.code, identity.digest(utf8.encode(html_data)))

HTML/cidv1 >> 015500143c68313e48656c6c6f20576f726c643c2f68313e
https://ipfs.io/ipfs/bafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcpq
ENS contenthash with IPFS namespace: 0xe301015500143c68313e48656c6c6f20576f726c643c2f68313e
eth.limo tests:
https://e30101550014.3c68313e48656c6c6f20576f726c643c2f68313e.ipfs2.eth.limo/
https://bafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcpq.ipfs2.eth.limo/

This all works OK using json/raw data. The only downside: there's no content type in CIDv1, so we have to parse/guess magic bytes in the raw data on the client side OR ask IPFS gateways to resolve that.
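
For illustration, client-side guessing could look roughly like the sketch below. `guessContentType` is a hypothetical helper, and the handful of signatures it checks is an assumption chosen for the example; a real client would need a much larger table and would still get ambiguous inputs wrong, which is the limitation being described.

```javascript
// Hypothetical sniffing helper: guess a content type from raw bytes.
function guessContentType(bytes) {
  const buf = Buffer.from(bytes);
  const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (buf.subarray(0, 8).equals(pngSignature)) return "image/png";
  if (buf.subarray(0, 5).toString("latin1") === "%PDF-") return "application/pdf";

  // Fall back to text heuristics.
  const text = buf.toString("utf8").trimStart();
  if (text.startsWith("<")) return "text/html";
  if (text.startsWith("{") || text.startsWith("[")) {
    try {
      JSON.parse(text);
      return "application/json";
    } catch {
      // not valid JSON, fall through
    }
  }
  return "application/octet-stream";
}

const utf8 = new TextEncoder();
console.log(guessContentType(utf8.encode("<h1>Hello World</h1>")));
console.log(guessContentType(utf8.encode('{"hello":"world"}')));
```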

We can even use dag-cbor to link multiple files/IPFS CIDs, but public IPFS gateways don't support an index file or ipfs __redirect for this. For now we have to happily decode that on our "smart" clients.

const blog = CID.parse("bafybeidnycldkehcy6xixzqg72vad6pitav4lk5np3ev6tr6titlkvfpvi")
let link = { json: json_cid, "/": html_cid, "index.html": html_cid, blog: blog }
let cbor_link = CID.create(1, cbor.code, identity.digest(cbor.encode(link)))

Back to @adraffy's f3 namespace, I'd suggest this format..

const data_uri = "data:text/html,<html>hello</html>";
const data_cid = CID.create(1, raw.code, identity.digest(utf8.encode(data_uri)))

RAW CIDv1 with full data uri string: 01550021646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e

01 - 55 - 00 - 21 - 646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e
v1 - codec/raw - hash/none - varint.encode(datauri.length) - utf8 datauri
https://ipfs.io/ipfs/bafkqailemf2gcotumv4hil3iorwwylb4nb2g23b6nbswy3dphqxwq5dnnq7a
ENS contenthash with data-uri "f3" namespace: 0xf30101550021646174613a746578742f68746d6c2c3c68746d6c3e68656c6c6f3c2f68746d6c3e
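
The byte breakdown above can be reproduced without any library. This is a sketch using only Node's `Buffer`; `encodeVarint` and `identityRawCid` are hypothetical helper names, and it assumes the namespace code 0xf3 is written as a varint (f3 01), as in the contenthash shown above.

```javascript
// Encode a non-negative integer as an unsigned varint.
function encodeVarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80); // low 7 bits, continuation bit set
    n = Math.floor(n / 128);
  }
  out.push(n);
  return Buffer.from(out);
}

// CIDv1 (0x01) + raw codec (0x55) + identity hash (0x00) + varint(length) + data
function identityRawCid(text) {
  const data = Buffer.from(text, "utf8");
  return Buffer.concat([Buffer.from([0x01, 0x55, 0x00]), encodeVarint(data.length), data]);
}

const cid = identityRawCid("data:text/html,<html>hello</html>");
const contenthash = "0x" + Buffer.concat([encodeVarint(0xf3), cid]).toString("hex");

console.log(cid.toString("hex"));
console.log(contenthash);
```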

@aschmahmann and @0xc0de4c0ffee thanks for the feedback.

As for codec numbers, I'd be happy with any assignment. Initially picked lower numbers since these two codecs seem useful beyond ENS.

Yes, you could put both ipfs://... and data:... into uri; however, there is a difference w/r/t how they are handled and interpreted. These details were not included as they are ENS application-specific, but possibly the codec names should reflect that, eg. Redirect URI.

From the ENS + web content perspective:

- the intention of ipfs is that the content is on IPFS, and the server knows how to decipher the CID and serve directory-like DAGs from a single root hash, using whatever IPFS gateway (likely its own node) to fetch the content
- the intention of url is that the server blindly responds with an HTTP 307 redirect, with no processing
  - for ENS/identity, the original address (https://raffy.eth.limo/) would disappear
  - for many browsers, ipfs: would fail without a specific handler for that scheme
  - https://ipfs.io/ipfs/... would work but force an explicit gateway
  - typical use-case: redirect to an existing web2 website
  - alternative use-case: redirect to a custom URL scheme, eg. itms-apps:, spotify:, etc.
  - inefficient but valid use-case: redirect to a (base64-encoded) inline asset, eg. an image
- the intention of data-uri is that the server serves the content as a static file
  - for ENS/identity, the original URL would be preserved, as well as the path/query/fragment
  - eg. text/html with an embedded <script> can parse the window.location
  - eg. application/pdf with #page=7 can jump
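
A rough sketch of how a gateway might act on the three intentions above. `planResponse` and the shape of the `record` object are assumptions made for illustration, not how eth.limo or any existing gateway is implemented.

```javascript
// Hypothetical: map a decoded contenthash record to a response plan.
function planResponse(record) {
  switch (record.codec) {
    case "ipfs":
      // Resolve the CID via an IPFS gateway/node and serve the DAG contents.
      return { status: 200, source: "ipfs-gateway", cid: record.cid };
    case "uri":
      // Blind redirect, no processing: the original ENS address disappears.
      return { status: 307, headers: { Location: record.uri }, body: null };
    case "data-url":
      // Serve inline bytes as a static file: the original URL is preserved.
      return { status: 200, headers: { "Content-Type": record.mime }, body: record.data };
    default:
      throw new Error(`unsupported codec: ${record.codec}`);
  }
}

console.log(planResponse({ codec: "uri", uri: "https://ens.domains/" }));
console.log(
  planResponse({ codec: "data-url", mime: "text/html", data: Buffer.from("<html>hello</html>") })
);
```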

You are correct about the base64 overhead concern, but there are also URL length limits (vs body limits)
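
As a back-of-the-envelope illustration of the base64 overhead, the sketch below compares the two encodings for an arbitrary 300-byte binary payload. The payload, the image/png mime type, and the single-byte codec prefix are all assumptions picked for the arithmetic.

```javascript
// Arbitrary 300-byte binary payload (contents don't matter for the size comparison).
const payload = Buffer.alloc(300, 0xab);
const mime = "image/png";

// Option 1: base64 data: URI stored under the uri codec -> <codec><utf8 string>
const asDataUri = `data:${mime};base64,${payload.toString("base64")}`;
const viaUriCodec = 1 + Buffer.byteLength(asDataUri, "utf8");

// Option 2: data-url codec -> <codec><len(mime)><mime><raw bytes>
const viaDataUrlCodec = 1 + 1 + Buffer.byteLength(mime, "ascii") + payload.length;

console.log({ viaUriCodec, viaDataUrlCodec, saved: viaUriCodec - viaDataUrlCodec });
```

Base64 expands every 3 bytes into 4 characters, so the gap grows with payload size.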

Coffee, I put your response on the ENS forum

multiformats / multicodec

Codecs for ENS Contenthash: URI [0xF2] and Data URL [0xF3] #353
