nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.25k stars 29.42k forks source link

Http server message.url does not follow the whatwg url standard and does not evaluate ../ #51311

Open stefanbeigel opened 9 months ago

stefanbeigel commented 9 months ago

Version

18.17.1

Platform

All

Subsystem

No response

What steps will reproduce the bug?

I would like to share with you a very common scenario:

  1. A request is recevied via NodeJs Express or Fastify server using http module.
  2. Request is forwarded to another service using an http client that uses the whatwg URL class to build the target URL using the service hostname + the incoming message url

Express and fastify route matching is done on the incoming message url, which is not whatwg compliant url, see also discussion here This scenario can lead to path traversal vulnerabilities as the whatwg URL class will evaluate ../ and ./ but the server has matched on another path.

This situation is not good at all, because the developer need to know about the different parsing logic. I created this issue at Nodejs as the root cause of this the url property of the incoming message. As you can see in the linked property the fastify maintainer says that

Fastify never aimed to be compatible with the WHATWG Url parsing standard.

Having different url representations in one framework/language is not good at all.

Example application

I have prepared a sample application using fastify: https://github.com/stefanbeigel/whatwg-fastify-path-traversal/blob/main/index.mjs Call the app with curl --path-as-is localhost:3000/abc/../foobar

How often does it reproduce? Is there a required condition?

-

What is the expected behavior? Why is that the expected behavior?

  1. Parse the incoming message url with WHATWG url class

OR

  1. Drop the whatwg standard for the URL class, as it mainly used for browsers.

What do you see instead?

-

Additional information

Fastify: Different URL parsing between fastify and URL whatwg standard WhatWG: URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard Discussion: http(s) server why is request.url actually not an url, but a pathname. And why is parsing it now so hard?

aduh95 commented 9 months ago

Can you provide a repro that does not involve downloading code from the internet, specifying what is the expected behavior and what you see instead?

https://github.com/nodejs/node/blob/7981e2ee642ae2b551201aa120560eb7d54fde03/.github/ISSUE_TEMPLATE/1-bug-report.yml#L31-L34

stefanbeigel commented 9 months ago

Hi @aduh95 created a minimal http server example:

import http from 'http';

http.createServer((request, response) => {
    let targetUrl = new URL(`https://abc.com${request.url}`);
    console.log(`incoming url: ${request.url}`);
    console.log(`target url: ${targetUrl}`);

    response.writeHead(200, {'Content-Type': 'text/plain'});
    response.end('Hello World\n');
}).listen(3000);

When calling the serer with:

curl --path-as-is "localhost:3000/abc/../foobar"

The output is

incoming url: /abc/../foobar
target url: https://abc.com/foobar

So the incoming url does not follow the whatwg standard as dot-segments are not removed. When using the Nodejs whatwg URL class this dot-segments are removed. This leads to the problem that most http frameworks like express and fastify don't match the right route, as they use message.url

As a solution I could image to add an option to http.createServer that the message.url is parsed with the URL class. For example:

    if(parseIncomingUrlwithWhatWg) {
      const newUrl = (new URL(`htp://localhost${request.url}`));
      request.url = newUrl.pathname + newUrl.search;
    }

Please also check out the WhatWg issue discussion

marco-ippolito commented 9 months ago

cc @anonrig

marco-ippolito commented 9 months ago

I think that changing the behavior would break the ecosystem so the only possibility would be in passing an option to http.createServer. PR welcomed

wesleytodd commented 7 months ago

We discussed this a bit today on the web server frameworks team. Here is a recap:

  1. The breaking nature of a change to req.url would make it really difficult to land
  2. The idea to make it an option to http.createServer would mean that libraries in your framework stack would need to know which mode they were operating in which might be difficult to achieve and is less than idea.
  3. Alternatives like a new property (bad example: req.newURL) would make it easier to land but less useful to the ecosystem
  4. Ideally we get to a place with the new http apis where we are only exposing a URL instance, that work is being tracked here: https://github.com/nodejs/web-server-frameworks/issues/71

So for me, I would say that maybe the current options on the table for this are worse than the unexpected behavior and that we should work hard toward the new http api's being proposed instead.

stefanbeigel commented 7 months ago

Thanks @wesleytodd for the recap and linking of the web-server-frameworks issue, good to know that it was even discussed there before. As you might have seen I also tried to add it to the http-server module, where I received different options. The original problem would only be solved if everyone (all web frameworks) would use the parsed url for routing and not the raw message header (path + query). So I guess the biggest concern is the performance drawback when using whatWG for parsing. Maybe it would be possible to parse the url (only path and query) more efficient. At the moment I have the feeling I can't really help there as you need to discuss this further in the web-server-frameworks round.

wesleytodd commented 7 months ago

I also tried to add it to the http-server module

I did see that and read through the comments, I agree they gave different input but also I agree mostly with what was said over there. Honestly thinking about it maybe this comment should have been on that PR to keep the conversation together, this was just the link shared in the meeting.

The original problem would only be solved if everyone (all web frameworks) would use the parsed url for routing and not the raw message header (path + query).

Yeah this is, IMO, a valid goal. But I think different frameworks would have different targets, for example Fastify highly values performance so might decide not to use the parsed URL. On the other hand, Express might decide to use the URL instance as performance is a secondary concern compared to user expectations around web platform support. I say this because ideally I think Node core should offer both in easily consumed ways.

So I guess the biggest concern is the performance drawback when using whatWG for parsing.

I mean that is some folks biggest concern, it is not mine (as I said above).

At the moment I have the feeling I can't really help there as you need to discuss this further in the web-server-frameworks round.

That is an open group, we would love to have you in those issues and helping us drive the decisison making on this topic. Feel free to hop in there!