multiformats / multiaddr

Composable and future-proof network addresses
https://multiformats.io/multiaddr
MIT License
424 stars 85 forks source link

EIP1577 - Multiaddr support for ENS #79

Closed daviddias closed 5 years ago

daviddias commented 5 years ago

Conversation happening at https://ethereum-magicians.org/t/eip1577-multiaddr-support-for-ens/1969

raulk commented 5 years ago

Was reviewing EIP-1577. My comments:

  1. I sense that multicodec and multiaddr are being mixed in a way I don't quite follow.
  2. I was confused by the separate ensdomains/multicodec repo.
  3. The EIP says:

The encoding of the value depends on the content type specified by the protoCode; for instance, types in the range 0x00-0xf0 are encoded using multihash, meaning their value is formatted as follows [...]

I am assuming they refer to the ensdomains/multicodec codelist, but I don't find anything that points to specific ranges being reserved for specific purposes.

  1. In the fallback section, it's not clear what "the multiaddr interface" refers to. The spec talks about multicodec and that's the first time that multiaddr is mentioned.

During my research process I noticed our prose concerning the interrelationship of multicodec, multiaddr and multihash could be a bit off. Or at least it feels imprecise (to this reader).

For example, the multicodec README says:

A chunk of data identified by multicodec will look like this:

<multicodec><encoded-data> (1)

It is worth noting that multicodec works very well in conjunction with multihash and multiaddr, as you can prefix those values with a multicodec to tell what they are.

Then, in the multihash README, the values are described as:

<varint hash function code><varint digest size in bytes><hash function output> (2)

However, based on my understanding:

<multicodec> (1) == <varint hash function code> (2)
<encoded-data> (1) == <varint digest size in bytes><hash function output> (2)

If my interpretation is correct, wouldn't it be more precise to state that multiaddr and multihash embed multicodec codepoints? The term "prefix" is misleading.

Stebalien commented 5 years ago

I sense that multicodec and multiaddr are being mixed in a way I don't quite follow.

I believe the confusion stems from the fact that multiaddrs are paths and /ipfs/Qm..., /ipns/Qm... are paths (/ipfs/Qm... is even a valid multiaddr). Worse, multiaddr has the word "address" in it but we don't use them to address content.

If my interpretation is correct, wouldn't it be more precise to state that multiaddr and multihash embed multicodec codepoints? The term "prefix" is misleading.

I'm not sure I follow. We say "prefix" because a mulithash is <varint codec><length><data> and a multiaddr is <varint multicodec><stuff>. In both cases, we prefix some data with a multicodec to create either a multihash or a multiaddr.


Note: this also came up here: https://github.com/multiformats/multiaddr/issues/73

raulk commented 5 years ago

In both cases, we prefix some data with a multicodec to create either a multihash or a multiaddr.

@Stebalien I'm reading the README as a spec -- maybe I shouldn't. But with those lenses, if multihash is defined <varint codec><length><data> and multiaddr as <varint multicodec><stuff>, multicodec does not prefix multihash and multiaddr, multicodec is the prefix. Personally I tripped over that, thinking that multicodec somehow encapsulates the other two.

In the case of multiaddr, given that the <varint multicodec><stuff> groups are repetitive, saying that codec is the prefix falls short. Yes, all composed multiaddrs will start with a multicodec, but there will be more instances in a single composed multiaddr, right?

Stebalien commented 5 years ago

Ah, I see. You're right, we don't prefix the multihash, we prefix the length-delimited hash digest.

Yes, all composed multiaddrs will start with a multicodec, but there will be more instances in a single composed multiaddr, right?

So... this is one of the issues with multiaddrs. Without understanding the what each protocol code means, I can't break a multiaddr into a sequence of (codec, value) pairs. Really, it's better to think about multiaddrs as being recursively defined. That is, to parse a binary multiaddr, you:

  1. Read off the multicodec.
  2. Lookup the protocol definition for that codec.
  3. Pass everything else to that codec.
  4. The codec should consume as much as it wants.
  5. The codec should then recursively parse everything it doesn't want as a multiaddr.

Really, this also applies to strings as a single multiaddr "component" could either be:

raulk commented 5 years ago

@Stebalien have we considered introducing boundary markers for values? This has the ability to solve the nesting and the value identification problem at once. Using square brackets as the textual representation:

Stebalien commented 5 years ago

We've discussed it for embedding path values (/unix/[/...]/...) but not in other cases.

The real issue with brackets is that they kind of break the path abstraction (coming from the plan9 "everything lives in the filesystem namespace" camp).

If you're interested in some ramblings on this subject... https://gist.github.com/4764975c3b5ea33202324d8e9ec0985d (not posting a PR/issue as I don't want to add too much confusion to the discussion unless we decide to go with it).

(note: defining these recursively is really just my opinion; by spec, multiaddrs are still defined as a list).

ghost commented 5 years ago

They're just linking IPFS content in EIP-1577, and the same earlier in EIP-1062. So the answer is easy, they should either:

They're doing basically the ENS version of dnslink.

Both of EIP-1577 and EIP-1062 have been merged as a draft on the standards track, so I'm not sure whether they can be corrected or whether there needs to be a new EIP draft. @raulk you spent a ton of time in Ethereum, what do you think?


As for multiaddr, there's nothing to do about these EIPs. I'll pull your thoughts above into a new design discussion.

ghost commented 5 years ago

For the record, links for the EIPs:

Stebalien commented 5 years ago

We've gone with a multicodec. That is, we've defined swarm-ns, ipld-ns, and ipfs-nsmulticodecs to be used in "compressed" paths of the form...`, etc. Unfortunately, they really didn't want to use text paths (we tried).

ghost commented 5 years ago

@Stebalien was that in an EIP too?

ghost commented 5 years ago

Ooh - I see. EIP-1577 defined the multicodec usage, and the *-ns codecs are in multiformats/multicodec.

I suppose byte length matters regarding gas cost.

ghost commented 5 years ago

It's still unfortunate that EIP-1577 has multiaddr in its title...

ghost commented 5 years ago

Nevermind - I was reading the first version of the draft only. The current version at https://eips.ethereum.org/EIPS/eip-1577 doesn't mention multiaddr at all.

All good!

pldespaigne commented 5 years ago

For everyone wanting to use EIP1577 there is now a js implementation here : content-hash