whatwg / url

URL Standard
https://url.spec.whatwg.org/
Other
527 stars 137 forks source link

URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard #810

Closed stefanbeigel closed 8 months ago

stefanbeigel commented 8 months ago

What is the issue with the URL Standard?

Hi,

I would like to share with you a common scenario:

  1. A request is recevied via NodeJs Express or Fastify server
  2. Request is forwarded to another service using an http client that uses the URL class to build the target URL using the service hostname + the incoming request.pathname

This scenario can lead to path traversal vulnerabilities as Express and Fastify do not evaluate ../ and ./ but the whatwg URL does. So the route checks of express / fastify match another path. This situation is not good at all, because the developer need to know about the different parsing / evaluation logic.

Example I have prepared a sample application with fastify. https://github.com/stefanbeigel/whatwg-fastify-path-traversal/blob/main/index.mjs Call the app with curl --path-as-is localhost:3000/abc/../foobar

Possible solutions

  1. Http server libraries parses the URL with the whatwg URL standard
  2. Whatwg URL drops the path shortening or gives an option to disable it

As this behavior was introduced by the URL class I created this issue, even you can argue that this is a problem of fastify / express / nodejs.

annevk commented 8 months ago

Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.

I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.

stefanbeigel commented 8 months ago

Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.

Are there any plans to also adapt server side parsing of the url in nodejs / the http server module to use the whatwg URL class? I know it might be better to ask this the nodejs team but I guess you are also in contact with them.

I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.

Well actually I would question why it's defined in the standard like this, why not leave the path as it is? With this the whatwg url standard automatically creates issues with other url parsers that do not evaluate ../

BR

annevk commented 8 months ago

Best to ask Node.js.

It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.

stefanbeigel commented 8 months ago

Hi

It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.

Well then why even follow the whatwg standard if it's mainly focused on browsers? Really wonder why this URL class has been made standard in NodeJs, will open a issue at NodeJs and link it here.

Btw I also opened an issue at fastify. https://github.com/fastify/fastify/issues/5204

annevk commented 8 months ago

Because having different parsers causes issues, I thought we already established that...

karwa commented 8 months ago

By the way, the treatment of . and .. components in this standard is consistent with previous RFCs, such as RFC-3986.

When describing URL normalization, RFC-3986 is explicit about what . and .. components are for:

6.2.2.3. Path Segment Normalization

The complete path segments "." and ".." are intended only for use within relative references (Section 4.1) and are removed as part of the reference resolution process (Section 5.2). However, some deployed implementations incorrectly assume that reference resolution is not necessary when the reference is already a URI and thus fail to remove dot-segments when they occur in non-relative paths. URI normalizers should remove dot-segments by applying the remove_dot_segments algorithm to the path, as described in Section 5.2.4.

It's been a while since I read the HTTP spec, but if I recall correctly, I believe the idea is that the server takes the path and query (together known as the "request target"), and the host (supplied via the Host: header), and reconstructs the URL using the processing in RFC-3986. That means it would be expected to remove dot segments, as this standard also does.

Furthermore, I think RFC-3986 is clear that . and .. components are intended only for hierarchical traversal within the path, and any servers which give them a different interpretation could reasonably be described as "incorrect". It would be very difficult for clients to robustly interact with such a server because the RFC gives tools and libraries explicit license and even encouragement to remove them ("URI normalizers should remove dot-segments").

So IMO, that part of the standard, and its embodiment in JavaScript's URL class, is consistent with RFC-3986. Which is nearly 20 years old and widely used even outside of browser contexts.