Addressing HTTP servers over Unix domain sockets #577

rkjnsn commented 3 years ago

It is often desirable to run various HTTP servers that are only locally connectable. These could be local daemons that expose an HTTP API and/or web GUI, a local dev instance of a web server, et cetera.

For these use cases, using Unix domain sockets provides two major advantages over TCP on localhost:

  1. Namespacing. If two users on a system are running the same service, TCP requires them both to pick, configure, and remember different port numbers. With Unix domain sockets, each socket can live in the respective user's runtime directory and be named after the service.
  2. Access control. Even if the service is diligent about binding only to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism. Unix domain sockets, on the other hand, can take advantage of the access control functionality provided by the filesystem, and thus can easily be restricted to a single user or set of users. In the event that a service wants to allow multiple users to connect and discriminate between them, several operating systems provide a means of querying the UID of the connecting process, again without requiring its own authentication scheme.
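To illustrate the UID query mentioned in point 2, here is a minimal sketch that assumes Linux, where the `SO_PEERCRED` socket option returns the peer's pid, uid and gid. Other systems expose similar information through different calls (for example `getpeereid` on the BSDs and macOS), so treat this as one possible approach rather than a portable recipe. The helper name `peer_credentials` is made up for the example.

```python
import os
import socket
import struct

def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket.

    Assumption: Linux, where SO_PEERCRED yields a struct ucred made of
    a pid_t followed by a uid_t and a gid_t.
    """
    fmt = "iII"
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)

# A connected pair inside one process stands in for a real client and server.
server_side, client_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(server_side)
print("peer uid matches ours:", uid == os.geteuid())
server_side.close()
client_side.close()
```

A real service would call the helper on each accepted connection and decide what to allow based on the returned UID.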

Indeed, due to these advantages, many servers/services already provide options for listening via a Unix domain socket rather than a local TCP port. Unfortunately, there is not currently an agreed-upon way to address such a service in a URL. As a result, clients that choose to support it end up creating their own bespoke approach (e.g., a special command-line flag, or a custom URL format), while others choose not to support it so as not to bring their URL parsing out-of-spec (among other potential concerns).

Here are some of the various URL formats I've seen used or suggested:

annevk commented 3 years ago

It seems you don't need just addressing for this, but some kind of protocol as well. I recommend using https://wicg.io/ to see if there's interest to turn this into something more concrete.

rkjnsn commented 3 years ago

I'm not sure I understand why any additional protocol would be necessary. It's just HTTP over a stream socket. The server accepts connections and speaks HTTP just like it would for a TCP socket. Indeed, I can set up such a server today, and it works fine provided that the client provides a way to specify the socket, e.g., curl --unix-socket /path/to/socket.sock http://localhost/resource.
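For anyone who wants to try this without curl, here is a minimal sketch using only the Python standard library. It starts an HTTP server on a Unix domain socket and fetches a resource from it over that socket. The class names `Handler` and `UnixHTTPConnection`, the helper `fetch`, and the socket file name are invented for the example; the point is only that the HTTP exchange itself is unchanged.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # The default logger expects a (host, port) client address,
        # which a Unix socket peer does not have, so logging is skipped.
        pass

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a socket path instead of host:port."""

    def __init__(self, socket_path, host="localhost"):
        super().__init__(host)  # host is only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

def fetch(socket_path, resource):
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", resource)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sock")
    with socketserver.UnixStreamServer(path, Handler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        status, body = fetch(path, "/resource")
        server.shutdown()

print(status, body, end="")
```

Note that the socket path has to be passed to the client out of band, which is exactly the gap a URL syntax would close.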

avakar commented 3 years ago

I don't even understand how this is not a thing yet. Especially now that Windows started supporting AF_UNIX sockets natively, it seems to be the best, cross-platform way to connect web and native apps without consuming a TCP port.

annevk commented 2 years ago

Let me take a step back, what exactly is the ask from the URL Standard here?

rkjnsn commented 2 years ago

The ask is for the URL standard to specify a syntax for referring to a page served via HTTP over a UNIX domain socket. Currently, applications that want to support connecting to an HTTP service have to pick from one of the following three:

  1. Provide a bespoke mechanism for specifying the server's socket outside of the URL, such as curl's --unix-socket command-line argument.
  2. Accept a custom URL format outside of the URL standard for addressing resources served via HTTP over UNIX domain socket.
  3. Forgo the functionality altogether if 1 is impractical and 2 is undesired.

None of these are ideal. Deciding on a standardized URL syntax allows different implementations to implement the functionality in a common, standards-compliant way.

annevk commented 2 years ago

I see, https://wicg.io/ is the place for that. The URL standard defines the generic syntax. If you want to define the syntax for a particular URL scheme as well as behavior, you would do that in something that builds upon the URL standard. E.g., https://fetch.spec.whatwg.org/#data-urls for data: URLs.

rkjnsn commented 2 years ago

Let me rephrase: the specific ask for the URL standard is to provide an allowance in the URL syntax for specifying a UNIX domain socket, either in lieu of the port (e.g., http://localhost:[/path/to/socket.sock]/resource) or in lieu of the hostname (e.g., http://[/path/to/socket.sock]/resource), both of which are currently invalid according to the URL standard.
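As a rough illustration of how existing parsers treat these two forms, the sketch below feeds them to Python's `urllib.parse.urlsplit`. That parser follows the RFC-style generic syntax rather than the URL Standard, so it is only a stand-in, but it shows the underlying problem: the authority ends at the first "/", which leaves an unmatched "[" behind.

```python
from urllib.parse import urlsplit

candidates = [
    "http://localhost:[/path/to/socket.sock]/resource",
    "http://[/path/to/socket.sock]/resource",
]

def is_rejected(url):
    """Return True if urlsplit refuses to parse the URL."""
    try:
        urlsplit(url)
    except ValueError:
        return True
    return False

results = {url: is_rejected(url) for url in candidates}
for url, rejected in results.items():
    print("rejected" if rejected else "accepted", url)
```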

annevk commented 2 years ago

I recommend using something like unix:/path/to/socket.sock?url=http://localhost/resource. We can't change the URL syntax for each new protocol that comes along.
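A client could take such a URL apart with an ordinary URL parser. The following is a minimal sketch using Python's `urllib.parse`; the function name `split_unix_wrapper` is invented here, and the `unix:` scheme with a `url` parameter is only the suggestion from this comment, not a registered scheme.

```python
from urllib.parse import parse_qs, urlsplit

def split_unix_wrapper(url):
    """Split 'unix:<socket path>?url=<inner URL>' into its two parts."""
    parts = urlsplit(url)
    if parts.scheme != "unix":
        raise ValueError("expected a unix: URL")
    inner = parse_qs(parts.query).get("url")
    if not inner:
        raise ValueError("missing url= parameter")
    return parts.path, inner[0]

socket_path, inner_url = split_unix_wrapper(
    "unix:/path/to/socket.sock?url=http://localhost/resource"
)
print(socket_path)
print(inner_url)
```

If the inner URL carries its own query string or fragment, it would have to be percent-encoded before being embedded, otherwise its "&" and "#" characters would be read as part of the outer URL.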

cyanogilvie commented 2 years ago

It's the same protocol over a stream socket, just a different address (i.e., the authority part). Ok, so it's a different protocol in the sense of IP, but so are IPPROTO_IP and IPPROTO_IPV6, and the URL standard doesn't treat those as different. The relevant comparison, I think, is address families for stream sockets, like AF_INET, AF_INET6 and AF_UNIX. Once the stream socket has been established (as specified by the authority part of the URL), HTTP software shouldn't care or even know how the stream is transported.

Most invented, non-standard approaches for HTTP-over-unix-sockets seem to gravitate to something like a different scheme (since the authority part can't really be disambiguated from a hostname if relative socket paths are allowed from what I can see), like http+unix or https+unix, and then percent-encoding the socket into the authority part, and then everything works naturally from there from what I can see.
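A minimal sketch of that convention, using Python's `urllib.parse`. The helper names are made up, and `http+unix` / `https+unix` are the informal scheme names mentioned above, not registered schemes.

```python
from urllib.parse import quote, unquote, urlsplit

def build_http_unix_url(socket_path, resource="/"):
    # safe="" makes quote() encode "/" too, so the whole socket path
    # fits into the authority component.
    return "http+unix://" + quote(socket_path, safe="") + resource

def parse_http_unix_url(url):
    parts = urlsplit(url)
    if parts.scheme not in ("http+unix", "https+unix"):
        raise ValueError("not an http+unix or https+unix URL")
    # netloc is used instead of hostname because hostname is lowercased,
    # and socket paths are case-sensitive.
    return unquote(parts.netloc), parts.path or "/"

url = build_http_unix_url("/var/run/app.sock", "/resource")
print(url)
print(parse_http_unix_url(url))
```

The same functions accept a relative socket path such as `app.sock`, which is why the authority alone cannot tell a socket from a hostname and a distinct scheme is needed.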

I've also seen (and used) enclosing the socket path in [] in the authority part and keeping the scheme as http or https, but I think that namespace clashes with IPv6 style numeric addresses like [::1]:80. RFC 3986 (in section 3.2.2) kind of leaves space for this by anticipating future formats within the [], and providing a version prefix to disambiguate them. Overall I like this approach the best: it extends into the error space so it doesn't change the interpretation of any valid existing URL, lives in an extension space envisioned by the standard, minimally extends just the appropriate part of the standard (authority part), keeps the schemes http and https to mean "this is a resource we talk to this authority using the http(s) protocol for", and so preserves compatibility for software that uses the scheme to know what protocol to speak with the authority over the socket.
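For reference, the extension point in RFC 3986 section 3.2.2 is the IPvFuture rule, `"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )`. The sketch below turns that rule into a Python regular expression to check what fits inside the brackets as the grammar stands. The candidate strings such as `v1.uds:...` are invented for illustration.

```python
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")

def is_ipvfuture(text):
    """Return True if text matches the RFC 3986 IPvFuture rule."""
    return IPVFUTURE.fullmatch(text) is not None

for candidate in ["v1.uds:tmp:mysock", "v1.uds:/tmp/mysock", "::1"]:
    print(candidate, is_ipvfuture(candidate))
```

The rule admits neither "/" nor percent-encoded octets, so a socket path placed there would need either a different escaping convention or a change to the grammar.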

annevk commented 2 years ago

Changing the syntax of URLs is not really something we're willing to do. That has a substantive cost on the overall ecosystem. The benefits would have to be tremendous.

michael-o commented 2 years ago

Syntax in mod_proxy:

In 2.4.7 and later, support for using a Unix Domain Socket is available by using a target which prepends unix:/path/lis.sock|. For example, to proxy HTTP and target the UDS at /home/www.socket, you would use unix:/home/www.socket|http://localhost/whatever/.
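Splitting a target written in this style is straightforward. The following is a small sketch, not Apache's actual implementation; the function name `split_proxy_target` is made up.

```python
def split_proxy_target(target):
    """Split 'unix:<socket path>|<url>' into (socket_path, url).

    Targets without the unix: prefix are returned as (None, target).
    """
    prefix = "unix:"
    if not target.startswith(prefix) or "|" not in target:
        return None, target
    socket_path, url = target[len(prefix):].split("|", 1)
    return socket_path, url

print(split_proxy_target("unix:/home/www.socket|http://localhost/whatever/"))
print(split_proxy_target("http://localhost/whatever/"))
```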

karwa commented 2 years ago

The strongest argument I can think of for this is: http(s) URLs have special parsing quirks which don't apply if the scheme is http+unix. So for a perfect 1:1 behaviour match, UDSs would need to use an actual http URL, not a custom scheme (similar to IP addresses).

That said, I'm also not a fan of adding yet another kind of host (file paths). My preference would be to use a combination of:


This is a perfectly valid HTTP URL, and should be capable of representing any HTTP request target.

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

(Note: this would also mean that all UDS URLs have the same origin, although that could be remedied by adding a discriminator to the fake hostname to make your own zones of trust, e.g. 123.uds.localhost)

rkjnsn commented 2 years ago

I'm not sure using the fragment is really tenable for these use cases (and local web dev, especially). Many web applications use the fragment for their own purposes in JavaScript, whereas the host (at least in my experience) tends to be handled more opaquely.

What would be the main drawback for allowing additional characters within [] for the host portion of an HTTP URL?

karwa commented 2 years ago

Ah yes, you're right, it wouldn't work for local web development. I was thinking more about generic HTTP servers.

The main drawbacks IMO are:

cyanogilvie commented 2 years ago

Yes, I think the place for the UDS socket is in the authority portion - that's the bit that has the responsibility for describing the endpoint of the stream socket to talk to for this resource. Putting it elsewhere feels like an abuse and likely to cause unforeseen problems (HTTP client software will certainly have the host portion of the URL available in the portion of the code that establishes the stream socket, but may not have the fragment).

I think the namespace collision with IPv6 literals and syntax validation for UDS paths can be solved by:

It's up to the host to decode and translate the path into whatever native scheme that OS uses (just as it is for the path portion of the URI).

For me the motivation for supporting HTTP over UDS goes way beyond web browsers (and I would see that as a minor use case for this) - for better or worse HTTP has become a lingua franca protocol for anything that wants to communicate on the Internet (consider websockets for some of the forces that drive this), and that is increasingly machine to machine. For example: we run an online marketplace that serves about 10 million requests a day over HTTP (excluding static resources offloaded to a CDN), but each of those involves several HTTP interactions with other services to construct the response: Elasticsearch queries, S3 to fetch image sources that are resized, etc, a whole host of REST services for shipping estimates, geocoding, ratings and reviews, federated authentication providers etc. So, by volume, the overwhelming majority of HTTP requests our webservers are party to are between them and other servers, and aren't transporting web pages.

As the trend toward microservices and containerization continues this will only increase, and it's particularly there that I see HTTP-over-UDS being useful:

The other trend is for UIs to be implemented in HTML rather than some OS-native widget set (Android, iOS, GTK, QT, MacOS native controls, Windows native controls, etc), even when the application is entirely local on the user's device. There are very good reasons for this:

In this use case the hierarchical namespace issue is important and addresses a major downside to this pattern - choosing a port from the flat, system-wide shared namespace (ok, so the listening socket can specify 0 and have the OS pick a random unused port on some systems, but that's a bit ugly). Much nicer to use ~/.sockets/<app>/<pid>, and more discoverable. Another reason to use UDS in this case is that the user for the client side of the socket can be obtained from the OS in a way that only trusts the OS, solving the other issue with this pattern - knowing which user we're interacting with. If these issues were solved by HTTP-over-UDS, do you think something like Prusaslicer would use that (HTML, Javascript, webGL) rather than wxWidgets for its UI portability requirements? That would make porting to mobile devices like tablets much easier too.
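To make the contrast concrete, here is a minimal sketch of both ways of picking a listening address. A temporary directory stands in for `~/.sockets`, and the application name `myapp` is invented for the example.

```python
import os
import socket
import tempfile

# TCP: bind to port 0 and let the OS pick a free port, then ask which
# one was assigned. The number still has to reach the client somehow.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("127.0.0.1", 0))
assigned_port = tcp.getsockname()[1]
tcp.close()

# UDS: the address is a predictable path named after the app and pid.
with tempfile.TemporaryDirectory() as base:  # stand-in for ~/.sockets
    app_dir = os.path.join(base, "myapp")
    os.makedirs(app_dir, mode=0o700)  # directory only the owner can enter
    socket_path = os.path.join(app_dir, str(os.getpid()))
    uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    uds.bind(socket_path)
    bound_name = uds.getsockname()
    uds.close()

print("TCP port chosen by the OS:", assigned_port)
print("UDS path ends with:", os.path.join("myapp", str(os.getpid())))
```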

Finally, consider things like headless Chrome in an automated CI/CD pipeline - the software managing the tests being run on the deployment candidate version could start a number of headless chrome instances and run tests in parallel, easily addressing the websocket each provides with a UDS path like /tmp/chrome/<pid> rather than somehow managing port assignments.

The tech already exists to make these obvious next steps in application provisioning and inter-service communication happen (even Windows supports Local sockets aka UDS), and the change to existing HTTP client software should be small and limited in scope (URL parsing, name resolution and stream socket establishment steps), but it can't happen unless there is a standardised way to address these sockets.

annevk commented 2 years ago

What exactly is wrong with https://github.com/whatwg/url/issues/577#issuecomment-951616248? @karwa uds.localhost can resolve locally.

mnot commented 2 years ago

Alternatively, you could try to get uds or socket as reserved TLDs, but I'm not sure how you'd go about doing that.

You ask the IETF, just like .onion did. Admittedly, there are some politics involved, but it's possible, and this is a pretty clearly technical use case. The backstop would be to use a subdomain of .arpa.

Personally, I'd go with something like:


Yes, the escaping is ugly, but it's much cleaner than overloading IPV6 in URLs. Alternatively, you might be able to get away with:


agowa commented 2 years ago

@mnot any update on this? Was it implemented? Should this ticket be reopened? I'm also interested in this.

mnot commented 2 years ago

I just left a comment with some context; I don't know that anything else has happened.

thx1111 commented 2 years ago

I haven't read anything here that seems to justify breaking with the familiar pattern, "<protocol>://<domain>/<path>", or injecting a lot of special characters into the URL, or mimicking an IPv6 address. The protocol is simply "http". The domain is right there in the name, "Unix Domain Socket". Like any other top level domain - net, com, org - the domain is simply "unix". I don't know any reason that a web browser application cannot parse the domain from a URL, recognize a nonstandard domain name, and invoke a special handler for a non-network socket. The difficulty seems to be in distinguishing the path to the socket from the path to the resource file.

The "HTTP with socket path as the port" option, above, makes the most sense. And since a special handler must already be invoked for this "unix domain", I expect that colons - ":" - can continue to be used as the "port" separator for the socket path.

Altogether, that suggests a straightforward URL, as in: "http://unix:/var/run/server/ht.socket:/path/to/resource.html".

Is there any reason that those repeating ":/" character sequences would pose a problem in a URL?

This approach would not impose any limitation on the use of ":" in the resource path name, since a "unix domain" must be followed by a socket path, and that path will always be delimited by ":/". Any subsequent colons must then be part of the resource path name.

And, of course, this URL format still supports specifying any arbitrary protocol, served through a unix domain socket. And there is nothing redundant or misleading in the URL, as would be the case with any format requiring the name "localhost" or involving special parameter passing.

michael-o commented 2 years ago


rkjnsn commented 2 years ago

@michael-o, that doesn't provide any means to specify the resource path, as it is putting the path to the socket where the resource path should go.

agowa commented 2 years ago

(@thx1111 ":/" would be a valid filename on jfs...)

I think there are only two acceptable solutions:

I'd prefer the first one.

michael-o commented 2 years ago

@michael-o, that doesn't provide any means to specify the resource path, as it is putting the path to the socket where the resource path should go.

True, my bad. One could use exclamation mark just like Java uses to address resources inside a JAR with a URL.

agowa commented 2 years ago

Exclamation mark would still be a valid char that could be within a path...

http://[v1.uds:/tmp/mysock]/foo/bar is probably really the best one that is unambiguous. Even though [] are valid characters 🤔


Ok, never mind...

rkjnsn commented 2 years ago

The double slash to separate the socket path from the resource path is an interesting idea. I think square brackets are fine, though. Having ] in a socket path is uncommon, so saying that any such character in the socket path needs to be escaped / URL encoded shouldn't be very onerous.
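A minimal sketch of that escaping rule, assuming the hypothetical bracketed-host syntax discussed in this thread. The helper names are made up. `quote` with `safe="/"` leaves slashes readable and percent-encodes "]", "[", "%" and "\" along with everything else outside the unreserved set.

```python
from urllib.parse import quote, unquote

def bracketed_host(socket_path):
    """Encode a socket path for use inside [] in the host position."""
    return "[" + quote(socket_path, safe="/") + "]"

def socket_path_from_host(host):
    """Reverse bracketed_host()."""
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("host is not bracketed")
    return unquote(host[1:-1])

host = bracketed_host("/tmp/odd]name.sock")
print(host)
print(socket_path_from_host(host))
```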

agowa commented 2 years ago

But then we'd need either to URL-encode the whole socket path or to also escape \ within the socket path. \ is also a valid character on non-Windows systems...

avakar commented 2 years ago

Altogether, that suggests a straightforward URL, as in: "http://unix:/var/run/server/ht.socket:/path/to/resource.html".

@thx1111 This does not conform to RFC 3986, port must be comprised of DIGITs.
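To see what a generic parser makes of that URL today, here is a short sketch with Python's `urllib.parse.urlsplit`. It is only one parser among many, so other implementations may behave differently.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://unix:/var/run/server/ht.socket:/path/to/resource.html")

print("authority:", parts.netloc)
print("host:     ", parts.hostname)
print("path:     ", parts.path)
```

The authority stops at the first "/", so this parser sees the host "unix" with an empty port, and the socket path ends up at the front of the resource path.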

thx1111 commented 2 years ago

Network Working Group RFC 3986 URI Generic Syntax January 2005 3.2.3. Port
The port subcomponent of authority is designated by an optional port number in decimal following the host and delimited from it by a single colon (":") character.

 port        = *DIGIT

@avakar Thanks for introducing a specific reference. Of course, we will keep in mind that the point of this conversation is to approach a recommendation for a specific revision, exactly to RFC 3986, https://www.rfc-editor.org/rfc/rfc3986.

Interesting, but maybe understandable, given the "Network" orientation of the Working Group, that the group utterly failed to consider application of the "URI Generic Syntax" to "AF_UNIX". Actually, for the Linux kernel, we see:

ADDRESS_FAMILIES(7) ... DESCRIPTION The domain argument of the socket(2) specifies a communication domain; this selects the protocol family which will be used for communication. These families are defined in <sys/socket.h>. The formats currently understood by the Linux kernel include: ...

There are 41 different address families listed there, which includes AF_INET, AF_INET6, and AF_UNIX/AF_LOCAL. It also includes AF_BLUETOOTH, which also lacks a standardized URI syntax, as far as I know.

In particular, though, RFC 8820, URI Design and Ownership, 2020, https://datatracker.ietf.org/doc/html/rfc8820, updates 3986 and addresses the issue of updates to the URI scheme:

[RFC 8820] 1.1. Intended Audience This document's guidelines and requirements target the authors of specifications that constrain the syntax or structure of URIs or parts of them. Two classes of such specifications are called out specifically:

  • Protocol Extensions ("Extensions") - specifications that offer new capabilities that could apply to any identifier or to a large subset of possible identifiers, e.g., a new signature mechanism for "http" URIs, metadata for any URI, or a new format. ...

2. Best Current Practices for Standardizing Structured URIs This section updates [RFC3986] by advising Specifications how they should define structure and semantics within URIs. Best practices differ, depending on the URI component in question, as described below.

2.1. URI Schemes Applications and Extensions can require the use of one or more specific URI schemes; for example, it is perfectly acceptable to require that an Application support "http" and "https" URIs. However, Applications ought not preclude the use of other URI schemes in the future, unless they are clearly only usable with the nominated schemes.

A Specification that defines substructure for URI schemes overall (e.g., a prefix or suffix for URI scheme names) MUST do so by modifying [BCP35] (an exceptional circumstance).

"BCP35" is also known as RFC 7595, Guidelines and Registration Procedures for URI Schemes, https://datatracker.ietf.org/doc/html/rfc7595.

[RFC 8820 continued] 2.2. URI Authorities Scheme definitions define the presence, format, and semantics of an authority component in URIs; all other Specifications MUST NOT constrain or define the structure or the semantics for URI authorities, unless they update the scheme registration itself or the structures it relies upon (e.g., DNS name syntax, as defined in Section 3.5 of [RFC1034]).

For example, an Extension or Application cannot say that the "foo" prefix in "https://foo_app.example.com" is meaningful or triggers special handling in URIs, unless they update either the "http" URI scheme or the DNS hostname syntax.

Applications can nominate or constrain the port they use, when applicable. For example, BarApp could run over port nnnn (provided that it is properly registered). ...

As a reminder, Section 3.2 of RFC 3986, https://www.rfc-editor.org/rfc/rfc3986#section-3.2, defines the "URI Authority", referred to there in Section 2.2 of RFC 8820:

[RFC 3986] 3.2. Authority ... The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.

authority   = [ userinfo "@" ] host [ ":" port ]

URI producers and normalizers should omit the ":" delimiter that separates host from port if the port component is empty. Some schemes do not allow the userinfo and/or port subcomponents.

If a URI contains an authority component, then the path component must either be empty or begin with a slash ("/") character. Non-validating parsers (those that merely separate a URI reference into its major components) will often ignore the subcomponent structure of authority, treating it as an opaque string from the double-slash to the first terminating delimiter, until such time as the URI is dereferenced.

We can see, then, that defining "a prefix or suffix for URI scheme names", which means modifying BCP35/RFC 7595, will be much less desirable than only modifying RFC 3986 itself. Something to keep in mind.


[RFC 3986] 3.1. Scheme Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-"). ...

So, while "http+unix:" is a valid RFC 3986 "scheme", the desired protocol there is simply "http", and there is no reason to define an entire new group of RFC 7595 "schemes" which have nothing to do with the protocol itself. That would entail not just "http+unix:", but also "https+unix:", "ftp+unix:", "smtp+unix:", "submissions+unix:", etc., etc., and on and on.
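The scheme rule quoted above translates directly into a regular expression. A small sketch (the function name is made up) confirming that the "+unix" names are syntactically valid scheme names, whatever one thinks of registering them:

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

def is_valid_scheme(name):
    """Return True if name is a syntactically valid RFC 3986 scheme."""
    return SCHEME.fullmatch(name) is not None

for name in ["http", "http+unix", "https+unix", "+unix", "http_unix"]:
    print(name, is_valid_scheme(name))
```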

@agowa338 ":/" would be a valid filename on jfs...

"/.../ht.socket:/path/to/resource.html" would be a valid filename in many file systems. But note that RFC 3986 prohibits use of the ":", "as the first segment of a relative-path reference":

4.1. URI Reference ... A URI-reference is either a URI or a relative reference. If the URI-reference's prefix does not match the syntax of a scheme followed by its colon separator, then the URI-reference is a relative reference. ... 4.2. Relative Reference ... A relative reference that begins with two slash characters is termed a network-path reference; such references are rarely used. A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.

A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.

Generally, though, we are all talking about modifying RFC 3986, Section 3.2, "Authority" - preferably in the least intrusive or disruptive manner. We may note that RFC 3986, Section 3.2.2, "Host", already requires that "The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, ...". As a courtesy to existing URI parsers, we should avoid a completely different use of square brackets, specifically, to surround a unix domain socket path, where an "IP-literal" should normally be expected, as described in Section 3.2.2, "Host".

In particular, we are talking about modifying RFC 3986, Section 3.2.3, "Port". Again, remember that the "port" element of the "authority" component of the URI already makes use of the ":" delimiter:

authority   = [ userinfo "@" ] host [ ":" port ]

And remember, those square brackets there are not literal. They are just indicating optional elements of the "authority".

We note that the ":" is already also used as a delimiter in the userinfo element of the authority:

userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

Section 3.2.1, "User Information" goes on to explain why nothing else follows the ":" , preceding the "@", in the userinfo element.

And we note the general description of the "authority" component from RFC 3986, Section 3.2:

The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.

So rather than introducing some new type of delimiter to the "authority" component of the URI, instead, Section 3.2 is modified to say "The authority component ... is terminated by the next colon (":"), slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.", and Section 3.2.3 is modified to allow for a unix domain socket path, which is simply a path terminated by a ":", as:

port        = *DIGIT / socketpath
socketpath  =  path ":"

Here, it must be understood that "the next colon" is the next colon excluding the optional colons either at the end of the userinfo, or the beginning of the port, "subcomponents". Similarly, "the next slash" is also the next slash after the "authority" component, excluding any slash in "socketpath". Also, "terminated ... by the end of the URI" would include a "unix domain" URI having a "socketpath" with no trailing ":".

"path" itself is defined in the subsequent Section 3.3, "Path". And, remember the context here. This is just the optional "port" element of the "authority", which must be preceded by a ":". So we may say that "The port subcomponent of authority is identified by a path encapsulated within colons." This is simply :path:. URI parsers already are required to know how to parse a URI "path". Here, the parser just has to learn how to distinguish a path encapsulated within colons.

It is true that this approach would prohibit a socket path having a directory or file name ending with ":", but, as we have seen in Section 4.2, the path segment already has a limitation with respect to use of the ":".
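A rough sketch of how a parser might apply the proposed rule, to make the splitting concrete. This is only one reading of the proposal: it ignores userinfo, query and fragment, treats the first ":/" after the socket path as the boundary, and uses an invented function name.

```python
def split_colon_delimited(url):
    """Split 'scheme://unix:<socket path>:<resource path>' into its parts."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("not an absolute URL with an authority")
    host, sep, after = rest.partition(":")
    if host != "unix" or not after.startswith("/"):
        raise ValueError("not a 'unix' host followed by a socket path")
    end = after.find(":/")
    if end == -1:
        # No resource path: the socket path runs to the end of the URI,
        # minus the optional trailing ":".
        return scheme, after.removesuffix(":"), ""
    return scheme, after[:end], after[end + 1:]

print(split_colon_delimited(
    "http://unix:/var/run/server/ht.socket:/path/to/resource.html"
))
```

Colons later in the resource path are left alone, since only the first ":/" after the socket path is treated as the delimiter.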

The only other issue to be addressed is with respect to the special "host" component of the "authority". Guidance is provided in:

[RFC 3986] 1.1. Overview of URIs ... This specification does not place any limits on the nature of a resource, the reasons why an application might seek to refer to a resource, or the kinds of systems that might use URIs for the sake of identifying resources. This specification does not require that a URI persists in identifying the same resource over time, though that is a common goal of all URI schemes. Nevertheless, nothing in this specification prevents an application from limiting itself to particular types of resources, or to a subset of URIs that maintains characteristics desired by that application.

and also, in:

[RFC 3986] 3.2.2. Host The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name. ...

 host        = IP-literal / IPv4address / reg-name

Since we are specifically addressing http, and since web servers - and mail servers too, for that matter - already know how to serve from unix domain sockets, the practical issue here only has to do with web browsers properly handling a URI in the unix domain. Web browsers are specific applications which, consistent with RFC 3986, are free to address a specific type of URI. In this case, we would like that to include any URI in the "unix domain". "Resolving" the "unix domain", then, since it is a local socket family, and not a network socket family, must be the responsibility of the browser application itself, and not something dependent upon the resolver libraries or the DNS protocol. That's the whole point of this exercise - "local" IPC, not "network". Thus, the web browser itself must recognize a "URI authority" having the "host registered name" unix, and then simply open a local unix socket, and not just annoyingly complain about failing to find a network domain named "unix" - ERR_NAME_NOT_RESOLVED.

Thus, the ABNF rule for "host" is also modified to allow for the "unix" domain, keeping in mind the "first-match-wins" algorithm:

 host        = IP-literal / IPv4address / "unix" / reg-name

Otherwise, it is best not to "reinvent the wheel", and necessitate complicated - pointlessly complicated - parsing schemes.

agowa commented 2 years ago

First of all I like your approach.

But maybe to explain the reasoning behind this thing:

So, while "http+unix:" is a valid RFC 3986 "scheme", the desired protocol there is simply "http", and there is no reason to define an entire new group of RFC 7595 "schemes" which have nothing to do with the protocol itself. That would entail not just "http+unix:", but also "https+unix:", "ftp+unix:", "smtp+unix:", "submissions+unix:", etc., etc., and on and on.

there could as well be a good use case for adding lower-layer protocol information to the RFC 3986 scheme. The "+" could be defined as a delimiter for lower-level protocols (but that wouldn't free us from having to change the authority parsing to allow for a more generic representation). So standardizing the "+" delimiter this way would also allow us to do things like "https+ipv6://www.google.com". Or now with the introduction of QUIC potentially also "https+tcp://www.google.com".

Now why am I taking time to explain this? Well, your approach is mostly good. But this part host = IP-literal / IPv4address / "unix" / reg-name won't work well in practice. Well it would be better than now. But it still doesn't take into account features like SNI... Therefore instead of your suggestion of having "unix" in the host part and the socket path in the port field, I'd suggest having the DNS/SNI name in the host part, the socket in the port field, and the information about which lower-layer protocol should be used added to the scheme part. That way we could also standardize that the scheme part influences the parsing of the port field. (Thinking of this it would also allow other stuff that would come in handy for development and testing like: https+ipv4://www.example.com:, basically getting rid of host file overwrites for mocking within CI pipelines. But whether this is desirable is another topic of its own.)

thx1111 commented 2 years ago

@agowa338 So standardizing the "+" delimiter this way would also allow us to do things like "https+ipv6://www.google.com".

The "+" delimiter is already a standard, in RFC 3986 Section 3.1, "Scheme". And, there is nothing preventing anyone from defining new schemes using the "+" delimiter. It is just that RFC 8820, Section 2.1, "URI Schemes", seems to strongly discourage the introduction of new "schemes". Again, as above:

[RFC 8820] A Specification that defines substructure for URI schemes overall (e.g., a prefix or suffix for URI scheme names) MUST do so by modifying [BCP35] (an exceptional circumstance).

And, I believe the comment there, "an exceptional circumstance", is well founded.

So, first an RFC which updates BCP35/RFC 7595 to define and include the new scheme would have to be accepted, and then, an RFC which updates RFC 3986, Section 3.2, "Authority", to define how the "authority" component of the URI would be parsed to make sense of this new scheme, must be accepted.

But, there is no "compelling interest" justifying the definition of a new URI scheme in RFC 7595 that would pass the "Strict Scrutiny" test. There are other ways to use only the existing URI schemes to convey the desired information, using the RFC 3986 "authority" component.

RFC 3986, Section 3.1, "Scheme", concludes with:

Individual schemes are not specified by this document. The process for registration of new URI schemes is defined separately by [BCP35]. The scheme registry maintains the mapping between scheme names and their specifications. Advice for designers of new URI schemes can be found in [RFC2718]. URI scheme specifications must define their own syntax so that all strings matching their scheme-specific syntax will also match the grammar, as described in Section 4.3.

When presented with a URI that violates one or more scheme-specific restrictions, the scheme-specific resolution process should flag the reference as an error rather than ignore the unused parts; doing so reduces the number of equivalent URIs and helps detect abuses of the generic syntax, which might indicate that the URI has been constructed to mislead the user (Section 7.6).

Of course, anyone is welcome to try introducing new "schemes" to the scheme registry, but really, you probably do not want to go there.

Essentially, RFC 3986 defines a set of Rules for URIs that enable different people to write compatible parsers for any standard URI "scheme". You may want to read through RFC 3986 Section 2.2, "Reserved Characters", to see what sorts of options are available for parsing various components of a URI. In part, this says:

[RFC 3986]

 reserved    = gen-delims / sub-delims

 gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

 sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI.
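As a small illustration of the two reserved sets quoted above, here is a minimal Python sketch that classifies single characters. The character sets are copied from RFC 3986 Section 2.2; the function name `classify` is only a label chosen for this example.

```python
# Character sets copied from RFC 3986, Section 2.2.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")


def classify(ch: str) -> str:
    """Return which RFC 3986 reserved set a single character belongs to."""
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    return "not reserved"


# A socket path contains "/" (a gen-delim), which is why it collides
# with the generic syntax when placed inside the authority.
for ch in "/:+;":
    print(repr(ch), classify(ch))
```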

It might be argued that using one of the "sub-delims" to terminate a unix domain socket path might make it easier to distinguish the "authority" component from the resource "path" component, or to designate an entirely new kind of "subcomponent" within the "authority" component, to specify a unix domain "socketpath", as distinct from a "port". But that seemed to me unnecessarily intrusive, as opposed to simply making use of the "gen-delims". In contrast, historic use of the ":" as a delimiter in the URI and use again of the ":" in the expression of an IPv6 address has led to the awkward necessity for enclosing an IPv6 address in square brackets when used in a URI. Alternatively, if any of the RFC 3986 "sub-delims" were to be defined in place of the URI ":" delimiter, then the use of square brackets around an IPv6 address would not be necessary, and IPv6 addresses in URIs would be just a little quicker to type.

Strictly speaking, as used in RFC 3986, a "delimiter" exists in a kind of "delimiter hierarchy", in which the delimiter must be a specific "allowed character". There are delimiters for the URI itself:

[RFC 3986]

  3. Syntax Components The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.

    URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

    hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty

and another set of delimiters for each of those five "components" of the URI. This is rather awkwardly, and confusingly, described subsequently in that Section 2.2:

A component's ABNF syntax rule will not use the reserved or gen-delims rule names directly; instead, each syntax rule lists the characters allowed within that component (i.e., not delimiting it), and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component.

Here, by my reading, the "reserved or gen-delims rule names" and the "reserved set" are simply references to the set of "gen-delims" defined earlier. And then, the authors make a distinction - badly - between the "allowed delimiting characters" in the rule being expressed and the "allowed component characters" of each non-delimiter "component" or "subcomponent" of that rule.

Presumably, this "backdoor" way of describing an ABNF syntax rule is for the benefit of anyone writing a parser for the URI generally, and for each component and subcomponent specifically.
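For anyone who wants to see the generic split in action, RFC 3986 Appendix B gives a regular expression that breaks a URI reference into its five components. Below is a minimal Python sketch using that expression. The helper name `split_uri` and both sample URIs are assumptions made for illustration; the second one is a hypothetical attempt to put a socket path where the port would go.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Split a URI reference into the five generic components."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.example.com:8080/a/b?x=1#top"))
# A socket path placed after "host:" is cut off at its first "/":
print(split_uri("http://localhost:/run/app.sock/index.html"))
```

In the second call the authority ends up as "localhost:" and the socket path lands in the path component, which is the parsing problem that any "socket path in the port field" proposal has to deal with.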

And, it's useful to have a general grasp of RFC 5234, Augmented BNF for Syntax Specifications: ABNF, https://www.rfc-editor.org/rfc/rfc5234.html, particularly Section 3, "Operators".

You may also want to look at RFC 6874, Representing IPv6 Zone Identifiers in Address Literals and Uniform Resource Identifiers, https://www.rfc-editor.org/rfc/rfc6874, for an example of an RFC which modifies and updates RFC 3986.

@agowa338 But it still doesn't take into account features like SNI...

Server Name Indication is already standardized, in RFC 6066, Transport Layer Security (TLS) Extensions: Extension Definitions, https://datatracker.ietf.org/doc/html/rfc6066#section-3.

A better example of further extensions to RFC 3986 could be in web browser support for other, non IP, address families and their socket protocols. For instance, by defining, say, a "bluetooth" domain, prefixed with some "friendly Bluetooth name", as a "host" extension, or by extending the "host" address parser to distinguish a 48 bit colon separated bluetooth address from 128 bit colon separated IPv6 addresses, and then extending the "port" option to specify any of the many bluetooth protocols, it would be possible for a web browser to also directly access a remote bluetooth server through the local bluetooth socket. But then, bluetooth already has "Bluetooth Network Encapsulation Protocol", BNEP, allowing IP to be used instead. Again, just as someone would only choose to access http from a unix socket specifically to avoid using IP, usually for security reasons, someone might choose to access http directly from a bluetooth socket with the same purpose, and possibly for the same reason. But I don't expect that there is anyone really motivated to do that. Still, there's always the possibility.

thx1111 commented 2 years ago

By the way, for a tool to monitor the data exchanged at a unix socket, on Linux, there is sockdump.py, from https://github.com/mechpen/sockdump, using an embedded "extended Berkeley Packet Filter" program.

And, to view the http server output from a unix socket on any browser, this single-line shell script, adapted from https://gist.github.com/Boldewyn/4311962, may be useful: sudo curl --unix-socket /run/caddy/serve.sock -H "host:" http://a | your-favorite-browser-here $(base64 -w0 | cat <(echo -n 'data:text/html;charset=UTF-8;base64,') -)

The server domain there, "a", is an arbitrary random character. It is ignored by the server, but curl will fail without it. Additional header options in curl may or may not be needed by the server.

Any local hyperlinks in the page, back to the unix socket, will not work, of course, but otherwise, the page will be rendered properly.
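For comparison with the curl one-liner, here is a minimal Python sketch of the same idea using only the standard library: an `http.client.HTTPConnection` subclass whose `connect()` opens an AF_UNIX socket instead of a TCP connection. The class name `UnixHTTPConnection` and the helper `fetch` are hypothetical names for this example, and the `host` argument is only used to fill the Host header.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, host: str = "localhost", timeout: float = 10.0):
        # "host" only fills the Host header; it is never resolved.
        super().__init__(host, timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch(socket_path: str, target: str = "/", host: str = "localhost"):
    """Return (status, body) for a GET request sent over the given socket."""
    conn = UnixHTTPConnection(socket_path, host=host)
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

With the socket from the command above, a call would look like `fetch("/run/caddy/serve.sock", "/", host="a")`, assuming the calling user has permission to connect to that socket.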

hathiphant commented 1 year ago

Isn't it simpler to consider a UNIX socket as a connection detail, and then simply integrate it into the configuration of the connection proxy? This would leave the HTTP URI totally unchanged.

It would probably need a simple note to detail rules for HTTP over UNIX socket and specify some modifications to Proxy auto-config, adding a new return type "UNIX" with the path to UNIX socket as host value.

A modification like that would be far less intrusive than changing the URL, but would provide the functionality in a specification and could thereby improve compatibility.

My two cents,

lcampbel commented 1 year ago

What about


An HTTP client library would strip .uds.localhost from the host portion and pass the remainder in the host header (and SNI, if using TLS). I think most URL parsers would be happy with this. localhost is a reserved toplevel domain (RFC 2606) so this won't ever conflict with a real hostname. It doesn't require introducing new schemes, or any new syntax (such as hijacking the port number field). And using localhost is kind of a nice hint that this actually refers to something on the local host.
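The host-stripping step described here can be sketched in a few lines of Python. This covers only the part stated in the comment (remove `.uds.localhost`, use the remainder as the Host header and SNI name); how the socket path itself is carried is not reproduced here, and the function name `host_header_for` is hypothetical.

```python
UDS_SUFFIX = ".uds.localhost"


def host_header_for(url_host: str):
    """Return the Host/SNI name for a '.uds.localhost' host, else None.

    None means "not a socket URL; resolve the host normally".
    """
    host = url_host.lower().rstrip(".")
    if host.endswith(UDS_SUFFIX) and len(host) > len(UDS_SUFFIX):
        return host[: -len(UDS_SUFFIX)]
    return None


print(host_header_for("myservice.uds.localhost"))
print(host_header_for("www.example.com"))
```

A client following this idea would skip name resolution entirely whenever the function returns a name, and connect to the local socket instead.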

agowa commented 1 year ago

I still think we need to extend the PEG with a way to specify the lower-layer protocols (i.e., chain multiple schemes together), especially since HTTP can now also run over UDP and more and more stuff uses HTTP as a transport/tunneling protocol...

Edit: moved proposal to update parsing into separate ticket

@lcampbel, your examples would have compatibility issues in the real world, as some servers have (not quite RFC compliant) usage of double slashes in the URL. I already had the unpleasant opportunity to debug such an issue in an API. Requests just failed without the additional slash. Also, some people use ".localhost" for their localhost development environment. I've seen that with some k8s developers with a clone of the environment running locally and ".localhost" they used for the parts of the web app that would normally have been public (in the prod deployment). Everything below it represented the different subdomains of it (mainly because *.localhost. resolves to and ::1 on almost all systems, regardless of how many subdomains one provides, and without the need for editing the hosts file or deploying a locally running additional DNS resolver with a special zone file)...

lcampbel commented 1 year ago

I've never seen subdomains of localhost resolving to anything. It certainly doesn't happen on vanilla macOS or Ubuntu. Sure, you could put an entry foo.localhost in your /etc/hosts, but if you're doing that you could just avoid putting uds.localhost. The problem I have with adding new syntax is that it requires all URL parsers to be updated, which is impractical.

agowa commented 1 year ago

@lcampbel, just try it. This is using systemd-resolved (what RedHat and Fedora use, for example): image

No host file entry for it, it "just works" as long as the tld is "localhost"...

(And btw, I moved the rest of my comment into a new ticket, 778)

lcampbel commented 1 year ago

Well, that's what Redhat does, I get it, but on Ubuntu (focal) and macOS (Monterey) I get "Name or service not known" or "Unknown host" respectively. But it really doesn't matter; an HTTP client that supported this proposal would not even bother trying to resolve the name if it ends in '.uds.localhost'; it would just be connecting directly to a local Unix domain socket.

agowa commented 1 year ago

Introducing the ".local" issue all over again. But even if we exclude that part, it still has the issue with the //, not to mention that you completely forgot about Windows systems and their paths (yes, windows has AF_UNIX too).

And your syntax wouldn't work for any of these:

And even using the UNIX-style syntax that Windows supports, /foo/bar.socket (which Windows interprets as CurrentDrive:\foo\bar.socket), won't reliably work, as it will not necessarily use the same "CurrentDrive" for all applications on the same computer; it depends entirely on which drive your current working directory is on. If it's D:\something then it evaluates to D:\foo\bar.socket, but if it is C:\something then you get C:\foo\bar.socket.
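The current-drive dependence can be reproduced on any platform with Python's `ntpath` module, which implements Windows path rules. This is only an approximation of what a Windows process does when it opens a drive-less path; the helper name `resolve_on_windows` is made up for the example.

```python
import ntpath  # Windows path rules, importable on any platform


def resolve_on_windows(cwd: str, socket_path: str) -> str:
    """Approximate how a drive-less path resolves for a given working directory."""
    return ntpath.normpath(ntpath.join(cwd, socket_path))


# Same socket path, two working directories, two different files.
print(resolve_on_windows(r"D:\something", "/foo/bar.socket"))
print(resolve_on_windows(r"C:\something", "/foo/bar.socket"))
```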

randomstuff commented 10 months ago


What you absolutely don't want is the ability for any web server in the wild to use your browser to issue arbitrary HTTP requests to arbitrary Unix sockets.

It is already quite difficult for people to grasp the notion that LAN-only services and localhost-only services can be attacked by remote web servers (CSRF, DNS rebinding attacks on LAN services or localhost services). If a web browser were to allow arbitrary websites to issue HTTP requests to arbitrary UNIX sockets, this would open up a wide range of attack opportunities (e.g., using DNS rebinding attacks to attack UNIX-socket-bound Docker servers), including attacks based on protocol confusion.

If you wanted such a feature to be mostly safe, you would have to actively opt-in:

Firefox currently allows using a SOCKS proxy over a UNIX socket (including multiple such proxies when using FoxyProxy). It would be possible to have a Unix-bound SOCKS proxy which would resolve some domain names to Unix sockets.

agowa commented 10 months ago

@randomstuff Just because it is addressable doesn't mean it is reachable. And after all, websites can already contain "file:///" URLs or similar.

karwa commented 10 months ago

You don't really want to put the UDS path in the URL's path, because somebody could write:

<a href="/help">...</a>

And that would overwrite the path to the UDS, meaning a broken link.
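The effect is easy to reproduce with standard relative-reference resolution. In this minimal Python sketch the page URL is a hypothetical one that carries a socket path in the URL path; only the `/help` link comes from the comment.

```python
from urllib.parse import urljoin

# Hypothetical page URL that carries the socket path in the URL path.
page = "http://localhost/run/app.sock/index.html"

# Resolving the link from the comment replaces the whole path,
# so the socket path is gone from the result.
resolved = urljoin(page, "/help")
print(resolved)  # http://localhost/help
```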

Instead, you really want this to be part of the hostname. Hostnames are intrinsically abstract already, so there is no fundamental reason they can't resolve to a local socket. In other words, @randomstuff 's project is doing the conceptually correct thing by providing a mapping from hostnames to sockets.

And perhaps most importantly, it shows that this need can be met without changing the URL standard.
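A client-side mapping from hostnames to sockets could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the mapping table, the hostname, and the function name `connect` are hypothetical and not part of any standard or existing client. The point is only that the URL stays an ordinary http URL and that unmapped hosts are untouched.

```python
import socket

# Hypothetical user-maintained mapping; nothing here comes from a standard.
SOCKET_MAP = {
    "docker.internal.example": "/var/run/docker.sock",
}


def connect(host: str, port: int = 80, socket_map=SOCKET_MAP) -> socket.socket:
    """Connect to host, using a Unix socket only if the user mapped it."""
    path = socket_map.get(host.lower())
    if path is not None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
        except OSError:
            sock.close()
            raise
        return sock
    # Unmapped hosts take the ordinary TCP path.
    return socket.create_connection((host, port))
```

Because the mapping lives in the user's configuration, a remote page cannot name a socket path directly; it can at most name a hostname that the user has already chosen to map.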

thx1111 commented 10 months ago

Reading back through this discussion, it has not at all been established that there is a consensus as to "where" the underlying issue should lie, and so, any "solution" offered can appear to simply "miss the point", depending upon your point of view. I find myself back-and-forth about the various approaches suggested, including my own.

I can summarize at least four alternatives proposed here to the issue of, to generalize, "Addressing Unix Domain Sockets".

1) RFC 3986 "Uniform Resource Identifier (URI): Generic Syntax" must be modified to allow addressing unix domain sockets.

2) The URI Schemes in BCP 35/RFC 7595 "Guidelines and Registration Procedures for URI Schemes" must define a new URI Scheme and Owner which specifically supports unix domain socket addressing. Review here: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

3) The existing http/https schemes defined in RFC 8615 "Well-Known Uniform Resource Identifiers (URIs)" must be expanded to explicitly support addressing unix domain sockets.

4) Ignore the URI standard RFCs and just write or modify an html display client to support unix domain socket addressing.

Without first saying which approach we are thinking about, the conversation can become kind of silly, since any solution which "works", works. Otherwise, it may be that I both enjoy, and cringe at, "bike shedding" as much as anyone else.

randomstuff commented 10 months ago

For context about the pitfalls of stuffing/smuggling a Unix socket path in a HTTP URI, the Node.js Requests and got libraries would allow stuffing a Unix domain socket path in a HTTP URI like so: http://unix:/var/run/docker.sock:/containers/json. It turned out this could be exploited by a remote web server to target a local Unix domain socket through a HTTP redirect. In got, this feature is now disabled by default and HTTP redirects to Unix sockets are now disabled.
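To make the format concrete, here is a minimal Python sketch that recognizes the `http://unix:SOCKET:PATH` shape shown above and refuses to follow a redirect that points at it. It is not the libraries' actual code; `split_unix_url` and `redirect_allowed` are hypothetical names, and the check is only an illustration of the "no redirects to Unix sockets" rule.

```python
from urllib.parse import urlsplit


def split_unix_url(url: str):
    """Return (socket_path, request_path) for 'http://unix:SOCKET:PATH', else None."""
    parts = urlsplit(url)
    # A generic parser sees the authority "unix:" and puts the rest in the path.
    if parts.scheme not in ("http", "https") or parts.netloc != "unix:":
        return None
    socket_path, sep, request_path = parts.path.partition(":")
    if not sep:
        return None
    return socket_path, request_path or "/"


def redirect_allowed(location: str) -> bool:
    """Refuse to follow a redirect whose target is a Unix socket URL."""
    return split_unix_url(location) is None


print(split_unix_url("http://unix:/var/run/docker.sock:/containers/json"))
print(redirect_allowed("http://unix:/var/run/docker.sock:/containers/json"))
```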

I would think that the ability to address arbitrary Unix domain sockets in HTTP(S) URIs is fraught with peril. If this were part of the URI standards, client applications and libraries would be expected to implement this feature and this would certainly end up generating a lot of vulnerabilities such as CVE-2022-33987: attacks on arbitrary Unix domain socket application through malicious redirects or more generally through malicious URIs.

What might be useful is:

kevincox commented 10 months ago

While you have a good point it is sort of a shame to block UNIX sockets due to this. The same problems exist for local services, LAN servers (like routers) and even cloud VM metadata servers are open to vulnerabilities due to this. Really every redirect target should be carefully considered, and every DNS lookup should have the resulting IP treated with scrutiny. Unfortunately that isn't the world that we live in, developers are careless and many (most?) popular HTTP libraries don't even expose the primitives to do this. I am not aware of even a single library that prevents this by default. In practice things like Origin headers and CORS are used to ensure that requests are coming from the right place and not tricked redirections. These hacks have worked OK, and particularly vulnerable services like browsers are more strict (such as preventing public sites from accessing your router's web UI in most cases).

However while this vulnerability is not specific to UNIX sockets it is maybe wise to avoid adding more surfaces that can be accessed via this common issue.

kevincox commented 10 months ago

the ability for the user to map domain names to Unix domain sockets in client applications

Isn't this just security through obscurity? Or is the idea that the service hosting the domain socket needs to opt-in. Presumably because it has some sort of heuristics to block misdirected requests.

randomstuff commented 10 months ago

Or is the idea that the service hosting the domain socket needs to opt-in.


One motivation of OP was access control:

Access control. Even if the service is diligent only to bind to localhost, TCP still allows any (non-sandboxed) process or user on the machine to connect. Any access control has to be implemented by the service itself, which often involves implementing (hopefully with sufficient security) its own password authentication mechanism.

However, in order to increase the security of some local application (reduction of the attack surface, rely on implicit authentication through UID and filesystem access control), this might end-up:

Some opt-in mechanism could mitigate these issues to some extent.
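The "implicit authentication through UID" mentioned above can be sketched on Linux with the `SO_PEERCRED` socket option, which returns the pid, uid, and gid of the peer process. This is a Linux-specific sketch; other systems expose peer credentials through different calls, and the helper name `peer_credentials` is made up for this example.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the process at the other end (Linux only)."""
    fmt = "iII"  # struct ucred: pid_t, uid_t, gid_t
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# socketpair() gives two connected AF_UNIX sockets within this process,
# so the peer is this process itself.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(a)
print(uid == os.geteuid())
a.close()
b.close()
```

A server would call the same helper on each accepted connection and compare the uid against its own allow list.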

kevincox commented 10 months ago

While this may increase the attack surface of some services it will also decrease the attack surface of others as the original message explains. So it is important to weigh the benefits as well as consider possible mitigations that can make the tradeoffs more favourable.

thx1111 commented 10 months ago

Given the ambiguity in addressing unix domain sockets, I am still inclined to fault the basic RFC 3986. So, here is a brief review, several rants, and another suggestion for unix domain socket addressing, simply using the square bracket "hack".

Assuming the general concept of "Uniform Resource Identifier" from Section 1.1.3., the basic structure is defined in Section 3 as having 5 components: scheme, authority, path, query, and fragment. First off, then, what type of URI component is a unix domain socket (UDS) address?

The original context here is "HTTP servers", and "http" is, itself, a type of "scheme". So, UDS as "scheme" is not my first choice.

Now, RFC 3986 uses the term "resource" without much constraint, saying 'This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI.' Effectively, a "resource" is whatever the user wants it to be. Is a UDS a "resource" itself? For the purpose here, "no". The "resource" implied by an HTTP server is some other specific data delivered using HTTP.

Then, is a UDS a type of "path", "query", or "fragment"?

From Section 3.3, "The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)." Since the UDS is not the "resource", and, since the "path" identifies a "resource", then the UDS cannot be a "path".

Similarly, from Sections 3.4. Query and 3.5 Fragment, both of these components are also references to the "resource". So the UDS is also not either a "query" or a "fragment".

And that leads to the inference that the UDS must be a kind of "authority". RFC 3986 actually subdivides the "authority" component itself into three parts, in Section 3.2.:

 authority   = [ userinfo "@" ] host [ ":" port ]

And here, the same analysis can be applied. Is the UDS a type of "userinfo"? Section 3.2.1. says, "The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource." Hmm - "scheme-specific information about how to gain authorization to access the resource" - "how to gain authorization". Does the UDS tell "how to gain authorization"? Sort of - maybe - not really - I'd say "no".

Is the UDS a type of "host"? From Section 3.2.2., "The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted-decimal form, or a registered name." Is, then, the UDS a type of "IP literal", "IPv4 address", or a "registered name"? Hmm - what is an "IP literal"? Again, from Section 3.2.2.:

 IP-literal = "[" ( IPv6address / IPvFuture  ) "]"

Since a UDS is not any of an "IPv6address / IPvFuture", an "IPv4 address", or a "registered name", then "no", a UDS is also not any type of "host".

And then, using RFC 3986, there is only one interpretation remaining. Is the UDS a type of "port"? From Section 3.2.3. Port:

 The port subcomponent of authority is designated by an optional port number in decimal following the
 host and delimited from it by a single colon (":") character.

  port        = *DIGIT

Well, clearly, and as has been mentioned previously in this discussion, the UDS is not a "DIGIT". And here is where I find fault with RFC 3986, in its limited scope when defining "port". Except that, Section 3.2.3. goes on to say, "The type of port designated by the port number (e.g., TCP, UDP, SCTP) is defined by the URI scheme." And that statement suggests asking "What sort of Communication Protocol is UDS?" Of course a UDS is not itself a kind of communication protocol, but the relationship should become apparent. It may be more illuminating to ask the converse, "What sort of Sockets are TCP, UDP, and SCTP?" And then, the Unix - in this case Linux - man pages offer some guidance.

 man 7 tcp:     tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
 man 7 udp:     udp_socket = socket(AF_INET, SOCK_DGRAM, 0);
 man 7 sctp:    sctp_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
                sctp_socket = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

And generally, "What is a 'socket'"? In part:

 man 2 socket:
        Name            Purpose                         Man page
        AF_UNIX         Local communication             unix(7)
        AF_LOCAL        Synonym for AF_UNIX
        AF_INET         IPv4 Internet protocols         ip(7)

        The  manifest  constants  used under 4.x BSD for protocol families are PF_UNIX, PF_INET, and so
        on, while AF_UNIX, AF_INET, and so on are used for address families.  However, already the BSD
        man page promises: "The protocol family generally is the same as the address family", and
        subsequent standards use  AF_*  everywhere.

and then:

 man 7 unix:    unix_socket = socket(AF_UNIX, type, 0);
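The same calls are available from Python's `socket` module, which makes the parallel between the address families easy to try out. A minimal sketch follows; the variable names mirror the man page excerpts above.

```python
import socket

# Same call, different address family: the address is a (host, port)
# tuple for AF_INET but a filesystem pathname string for AF_UNIX.
tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
udp_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
unix_socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, 0)

for s in (tcp_socket, udp_socket, unix_socket):
    print(s.family.name, s.type.name)
    s.close()
```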

Here is my first rant about RFC 3986. The "port" component of the defined URI has presumed an Address Family, here implying AF_INET exclusively, along with what is a merely incidental association with a port "number". There is no explanation or justification given for this presumption.

Alternatively, it might be supposed that this presumption of an Address Family is an erroneous interpretation by the reader of RFC 3986. It may instead be supposed that the "port" component of the URI is simply a general concept to be associated with any Address Family which might be included from the list given from man(2)socket.

And so, I believe that this is the interpretation, while not "official", yet, that must be taken with RFC 3986.

Then, "What is the 'port' subcomponent of authority of an Address Family AF_UNIX socket?"

Here, man(7)unix tells us, "Traditionally, UNIX domain sockets can be either unnamed, or bound to a filesystem pathname (marked as being of type socket)." In our case, we are looking for a URI, so "unnamed" is not useful. Instead, the man page offers "a filesystem pathname". That seems clear enough.

Therefore, an RFC 3986 URI "port" for an AF_UNIX socket might also be interpreted as simply "a filesystem pathname", instead of exclusively as a number.

Allowing that, then the remaining problem only involves appropriate delimiters, to allow correctly parsing the resulting URI for the AF_UNIX "port".

Referring again to Section 2.2.:

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

Incidentally, it may be noted that this RFC 3986 list of delimiters is missing the percent "%", from Section 2.1 Percent-Encoding, and the set of White Space characters generally. The reader is now well into the realm of "inferring", "guessing", and "interpreting", instead of specifically "defining".

Here is my second rant about RFC 3986, related to the use of delimiters. The Section 3. URI syntax explicitly defines the ":" as separating the "scheme" from the "authority". Subsequently, in Section 3.2., it says 'The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.' Taken together, this double slash actually provides no information whatsoever in the URI and only serves to "poison" the parsing of the URI, by requiring the parser to distinguish potentially between ":///...", "://...", and ":/...". For instance, the "file" scheme, RFC 8089, supports optionally leaving out this useless "//" altogether. RFC 3986 offers no explanation or justification for this use of the double slash "//". The delimiter might as well have been defined explicitly as "://". This makes any use of the slash "/" as a delimiter in the URI potentially problematic, where it is also used as an essential component of any unix "filesystem pathname", when referring to the proposed UDS AF_UNIX "port", as well as, already, referring to an actual "resource" by pathname.

A third rant regards Section 3.2.2 Host, which says:

 A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is
 distinguished by enclosing the IP literal within square brackets ("[" and "]").  This is the only place
 where square bracket characters are allowed in the URI syntax.

The only reason that these square brackets are needed is because of the repeated and overloaded use of the colon ":" as a delimiter in the "authority", in Section 3.2 preceding the "port", and in Section 3.2.1, potentially subdividing the "userinfo". Considering that RFC 3513 defines the use of colon ":" as the field delimiter in an IPv6 address, this should have glaringly suggested that the same ":" would be a bad choice for a delimiter in the RFC 3986 "authority" component and subcomponents of the URI. And there are plenty of alternative characters to choose, from the small ASCII character set, for use as delimiters in the "authority".

The use of the square brackets, then, is a "hack", consequent of a bad choice for delimiter in the "authority" component of the URI. Be that as it may, suppose that the prohibition "This is the only place where square bracket characters are allowed in the URI syntax", is ignored. Then, this same "hack" can be applied equally to the unfortunate choice of the slash "/" as a delimiter within the URI syntax with respect to the "port" subcomponent of the "authority", as with the "host" subcomponent.

I propose now another alternative to addressing unix domain sockets. By example, using the square bracket "hack", the result would allow, for instance, all of:


All of these examples otherwise strictly follow the RFC 3986 URI syntax.

That is the least intrusive "hack" to UDS addressing and merely extends an existing URI "hack". A "cleaner" revision to RFC 3986 would be to eliminate the use of either the colon ":" or the slash "/" as delimiters in the URI syntax delineating its components and subcomponents, except for the initial ":" separating the "scheme" and "authority". There are 11 other "sub-delims" defined in RFC 3986 that seem perfectly usable as delimiters in the URI "authority", which would obviate the need for using these square bracket "hacks" completely.

With reference to previous remarks about security issues, it may be noted that man(7)unix describes AF_UNIX as supporting communication "between processes on the same machine", so there would be no "remote access" possible, despite the http/https "scheme", if that constraint were followed. And, since the UDS "port" is just a Unix "filesystem pathname", there are many existing security measures available.
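One of those existing measures is plain file permissions on the socket's pathname. The Python sketch below binds a listening AF_UNIX socket whose file is created with mode 0600 by tightening the umask around `bind()`. It assumes Linux behaviour, where connecting requires write permission on the socket file; man(7)unix notes that some other systems ignore socket permissions, so a restrictive parent directory is the more portable safeguard. The function name and the temporary location are hypothetical.

```python
import os
import socket
import stat
import tempfile


def bind_private_socket(path: str) -> socket.socket:
    """Bind a listening AF_UNIX socket that only its owner can connect to."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # create the socket file as 0600
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)
    server.listen(1)
    return server


# Hypothetical location: a fresh private directory instead of a real runtime dir.
directory = tempfile.mkdtemp()
socket_path = os.path.join(directory, "service.sock")
server = bind_private_socket(socket_path)
info = os.stat(socket_path)
print(stat.S_ISSOCK(info.st_mode), oct(stat.S_IMODE(info.st_mode)))
server.close()
os.unlink(socket_path)
os.rmdir(directory)
```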

On the other hand, this suggested UDS AF_UNIX "port" addressing clearly does lend itself to replacing "localhost" with "some-remote-host", to access some UDS on, literally, a remote host. But then, any http/https "server" will be providing its own security measures, should it allow UDS addressing at all, so that's a different issue and not really a problem here. This does introduce another concept, access to a UDS by a local http/https server, as opposed to UDS access only by a local html display client.

There is still the question of whether the http/https schemes would need to be formally updated to acknowledge any kind of UDS AF_UNIX "port" addressing. Looking at RFC 9110, Sections 4.2.1. http URI Scheme and 4.2.2. https URI Scheme:

        The origin server for an "http[/https]" URI is identified by the authority component, which
        includes a host identifier ([URI], Section 3.2.2) and optional port number ([URI], Section

By my reading, "no". The http/https schemes simply refer to the RFC 3986 URI "optional port number" definition, and would therefore follow any update to RFC 3986 itself.

The much more difficult issue remains with any html display client, which must be taught to recognize any kind of UDS AF_UNIX "port" addressing. Again, strictly, that is a separate issue. But this does point out that the proposal here implies that there are two distinct "solution" arenas to confront: first, RFC 3986 itself, and second, the various de facto standard html display clients extant.

The Node.js security issue mentioned by @randomstuff is - well - a Node.js security issue, as was mentioned. It's not a server security issue and has nothing to do with UDS AF_UNIX "port" addressing per se. Of course, that also doesn't mean that html display client security issues go away. It's just a separate problem - though, it's still a problem. It is interesting that this raises the question of security in the "reverse" direction, from a remote "server" potentially accessing a local "client resource", through a UDS.

That is not something inherent in the original concept of http client/server communication, but a consequence of allowing the "client" to potentially act, itself, as a kind of "server", using some client facility, as with javascript, to access a local resource. The security model, then, requires simply that the client be smart enough not to do "anything stupid" at the behest of the server. Ha!

mnot commented 10 months ago

Lots of different proposals have been made above:

Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP. Backwards compatibility needs to be considered for a very large ecosystem, and incremental deployment needs to be considered. As Anne said above, these factors raise the bar considerably for any proposal, and so should be a last resort (there's currently an effort by IPv6 people to do a similar thing, and it's not going well for these reasons).

Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it. Again, a proposal in this area is likely to hit friction from other, unrelated communities (in this case, DNS).

Appending a suffix to the URL scheme implies that the suffix makes sense for other URL schemes. This means that wider review and discussion will need to take place to get it adopted.

That makes defining a new URL scheme the approach that's most likely to succeed. Such a scheme could define itself to use an authority that is not grounded in DNS, so it could be something like:


Defining it as a new scheme would also provide an opportunity to answer a lot of questions like "is HTTP/1 or HTTP/2 used"? "does it use TLS"? and so on.

But that's just my opinion.

If there's interest in solving this problem, I'd suggest that someone write a document outlining a proposal and bring it to the IETF HTTP WG - there is a larger diversity of HTTP implementers represented there who can provide feedback.