Propose `httpath` as an official extension to `multiaddr` format

masih commented 1 year ago

httpath is used by IPNI to allow an additional path in an HTTP URL expressed in Multiaddr format. I propose making it an official extension to Multiaddr and moving the implementation to the relevant repos in the libp2p / multibase orgs.

rvagg commented 1 year ago

I've proposed 0x01e1: https://github.com/multiformats/multicodec/pull/324

It's currently using a "private" code:

> Codes in this range are reserved for internal use by applications and will never be assigned any meaning as part of the Multicodec specification.

I guess there's a decision to be had here about how you handle a transition period and what kind of period that is. Do you perpetually support 0x300200 because it's in the wild now? Or do you write that off as "experimental" and not support it going forward? I'd +1 the latter approach because I don't think this is even documented yet, but once https://github.com/ipni/specs/pull/18 goes live, you will have documented it.

masih commented 1 year ago

Thank you for getting this sorted @rvagg 🚀 My vote is also to break things early. I don't think it's widely used out there.

WDYT @gammazero ?

MarcoPolo commented 1 year ago

I would avoid using httpath as part of the multiaddr. I've thought a lot about this, and I've even implemented the percentencode https://github.com/multiformats/go-multiaddr/pull/193. But what I realized was that if you have application logic (capability info, protocol details) in the multiaddr then you'll have a lot of duplicated information in the multiaddr.

Take for example the multiaddr /dns/ipni-provider.example/https/httpath/%2Fmy%2Fprefix. What if the node also provided an IP address to use to avoid DNS resolution? Then we'd have to duplicate this information with /ip4/1.2.3.4/tcp/1234/https/httpath/%2Fmy%2Fprefix.
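
The duplication is easy to see when the addresses are built programmatically. The following is a rough Go sketch that works on the string form only. `withHTTPath` is a hypothetical helper, not an API of go-multiaddr, and it assumes the path is percent-encoded as a single segment, as in the examples above.

```go
package main

import (
	"fmt"
	"net/url"
)

// withHTTPath appends an httpath component to base. The path is escaped as a
// single segment so its slashes cannot be confused with multiaddr separators.
func withHTTPath(base, path string) string {
	return base + "/httpath/" + url.PathEscape(path)
}

func main() {
	bases := []string{
		"/dns/ipni-provider.example/https",
		"/ip4/1.2.3.4/tcp/1234/https",
	}
	// The same prefix has to be repeated on every way of reaching the node.
	for _, b := range bases {
		fmt.Println(withHTTPath(b, "/my/prefix"))
	}
}
```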

Now, what happens when a node advertises a libp2p QUIC address as well? e.g. /dns/ipni-provider.example/udp/1234/quic-v1. Where does the httpath go? /dns/ipni-provider.example/udp/1234/quic-v1/http doesn't make sense because HTTP is not the transport here, QUIC is. /dns/ipni-provider.example/udp/1234/quic-v1/httpath/... doesn't make sense either unless httpath is specific for only IPNI protocols. See Interpreting Multiaddrs. Really this should be something like /dns/ipni-provider.example/udp/1234/quic-v1/ipni-v1/httpath/.... Although now you have the same issue as before, we need to append /ipni-v1/httpath/... to every multiaddr.

I think there are two separate things we're trying to solve:

1. How do we reach this node on the open internet? This is the role of the multiaddr. It's like directions to find the node.
2. What is this node capable of? This is where you learn that a node can speak ipni/v1 and maybe that its preferred prefix is foo/bar.

Trying to do both things in a Multiaddr seems to introduce an MxN problem when you really only need an M+N solution.

willscott commented 1 year ago

I think of the http prefix as part of the "where is this thing" - that if i have a middle box translating the mount point where my endpoint is in a sub-directory rather than the root of a web server, i now have this http-level path semantic to follow.

I think it isn't unreasonable to argue that this parallels the type of path following we already allow when we use multiaddrs of the form /ip4/.../p2p/QmR/p2p-circuit/p2p/QmA to follow proxying through another node.

I don't think we really want to say "you can't pass around a multiaddr as a full address" - to need both a multiaddr and some other struct indicating where ipni is seems unwieldy, and would mean we re-spec how to follow http paths in each HTTP-compatible protocol.

You are right that we are using different semantics in this context of multiaddr, and i think that's maybe okay? This isn't the multiaddr of the node; that would indeed be /dns/ipni-provider.example/udp/1234/quic-v1. Rather, this is the multiaddr of the IPNI service. We're passing these around in a service specific context, and know how to differentiate that service multiaddr from the overall node (we call them "publisher") multiaddr.

I would intuitively think of it as /httpath/.../ipni-v1 here because the ipni semantics exist at that location in the webserver. I wonder if part of the reason we find this awkward is because we don't have HTTP semantics in the libp2p stack - if i could translate a multiaddr into an http.Client to make requests, that would be a layer above the specific application semantics that could prefix requests with the base httpath.

MarcoPolo commented 1 year ago

> I think of the http prefix as part of the "where is this thing"

Yes. I agree, I'm saying this is an application level concern rather than a node-routing concern (where do I find this node?) and multiaddrs have primarily been concerned with how do I get to the node rather than "how do I ask the node for a specific thing".

> I think it isn't unreasonable to argue that this parallels the type of path following we already allow when we use multiaddrs of the form /ip4/.../p2p/QmR/p2p-circuit/p2p/QmA to follow proxying through another node.

p2p-circuit still tells you how to find the node. It doesn't have the MxN problem because we only support one relay. If you look at webrtc, we end up with /ip4/.../p2p/QmR/p2p-circuit/p2p/QmA and /ip4/.../p2p/QmR/p2p-circuit/p2p/QmA/webrtc for each relay address. Here is where we need the capabilities info so that webrtc is marked as a capability of the node rather than a separate multiaddr. i.e. This node supports a direct webrtc connection if you could bootstrap it with some other connection.

> We're passing these around in a service specific context, and know how to differentiate that service multiaddr from the overall node (we call them "publisher") multiaddr.

Ah okay. So this is used in a specific application context. I think that makes this better. I would push for trying to make it clearer somehow this isn't the same as the node's multiaddr. Maybe skip the part that describes how to reach the node? That would solve the MxN problem here as well. Something like /ipni-v1/httpath/... (The httpath acting as a parameter to the ipni-v1 component, which is the protocol being used).

> I wonder if part of the reason we find this awkward is because we don't have HTTP semantics in the libp2p stack - if i could translate a multiaddr into an http.Client to make requests, that would be a layer above the specific application semantics that could prefix requests with the base httpath.

I think so. I'm pushing on this this week so expect an update in libp2p/specs. That's also why I'm trying to caution against adding the protocol specific stuff to the node's multiaddr since soon you'll be able to speak HTTP on every multiaddr, so having this only on multiaddrs that end in /http won't make sense.

There's also going to be a built-in way of specifying custom mappings for protocols. Libp2p will know if the ipni-v1 protocol is actually mounted on /my-prefix or .libp2p/ipni-v1 or somewhere else. I think that should solve the prefix use case?

lidel commented 1 year ago

+1 on being careful with leaking HTTP semantics onto protocol-agnostic Multiaddr spec. This may come with a risk of ballooning the number of multiaddr codes we will be asked to register and support in the future.

After all, paths are not special: a path is only one of many attributes that may impact an HTTP endpoint's behavior. If we allow HTTP paths, the unanswered question is: how does one add basic auth or bearer token headers? Or a cookie? Or an Accept header for content type version negotiation?

There are prior discussions in

which.. went nowhere. So far, every time someone tries to clean this up in a generic way, they read the above threads, see the interop problems and the spec work that needs to happen, and end up using plain old https:// URLs :upside_down_face:

Adding a top level /httpath may be enough to solve 80% of HTTP addressing needs, but is it worth the cost when the option to use https:// URLs exists?

I've seen how much trouble new codes create across the stack, I am unsure if it is net positive. Once the code exists, people will start using it, leaking more HTTP onto multiaddrs, and then they will ask for support in non-IPNI cases, which creates even more maintenance work across libraries and apps.

> We're passing these around in a service specific context, and know how to differentiate that service multiaddr from the overall node (we call them "publisher") multiaddr.

Using Multiaddr for representing a specific path location of a non-libp2p HTTP service feels very odd.

It was probably discussed before, but if this is specific to IPNI, and in a controlled environment, then why not use https:// URL? What is the benefit of using Multiaddr in your case? Compact binary representation? The path will be a string anyway.

MarcoPolo commented 1 year ago

I've written down what I believe are the libp2p stack's expectations of the /http component: https://github.com/libp2p/specs/pull/550

willscott commented 1 year ago

> why not use https:// URL? What is the benefit of using Multiaddr in your case?

we support both an HTTP and a libp2p fetch of data. we have a single multiaddr currently that signals if the fetch is over graphsync vs over http.

MarcoPolo commented 1 year ago

I changed some of my thoughts here.

/httpath (or /httppath, my preference) is a reasonable component to describe HTTP Resources.
- This works on top of any transport (including, but not limited to /http). e.g. /ip4/1.2.3.4/udp/4001/quic-v1/httpath/foo%2fbar is valid.
- To @lidel's point, this isn't a slippery slope of adding all HTTP request semantics to multiaddrs. This is only scoped to identify resources, not how requests are made. This is in line with RFC 9110:

  > One design goal of HTTP is to separate resource identification from request semantics

I still think this opens the door to an MxN combination of M multiaddrs and N resources, but that's something application authors can work around or accept (Maybe they have a single dnsaddr).

The things that changed my mind are:

- HTTP resources are independent of request semantics and any specific user protocol.
- Seeing many folks wanting to address HTTP resources in multiaddrs.
- HTTP resources are part of HTTP semantics which can be defined on top of any transport, so it's okay to ask for an HTTP resource on top of a libp2p stream (by running HTTP over a libp2p stream).

gammazero commented 4 months ago

This is now an official part of multiaddr: https://github.com/multiformats/go-multiaddr/pull/246

ipni / go-libipni

Propose `httpath` as an official extension to `multiaddr` format #42