Open stefanbeigel opened 9 months ago
Can you provide a repro that does not involve downloading code from the internet, specifying what is the expected behavior and what you see instead?
Hi @aduh95 created a minimal http server example:
import http from 'http';
http.createServer((request, response) => {
let targetUrl = new URL(`https://abc.com${request.url}`);
console.log(`incoming url: ${request.url}`);
console.log(`target url: ${targetUrl}`);
response.writeHead(200, {'Content-Type': 'text/plain'});
response.end('Hello World\n');
}).listen(3000);
When calling the serer with:
curl --path-as-is "localhost:3000/abc/../foobar"
The output is
incoming url: /abc/../foobar
target url: https://abc.com/foobar
So the incoming url does not follow the whatwg standard as dot-segments are not removed.
When using the Nodejs whatwg URL class this dot-segments are removed.
This leads to the problem that most http frameworks like express and fastify don't match the right route, as they use message.url
As a solution I could image to add an option to http.createServer that the message.url is parsed with the URL class. For example:
if(parseIncomingUrlwithWhatWg) {
const newUrl = (new URL(`htp://localhost${request.url}`));
request.url = newUrl.pathname + newUrl.search;
}
Please also check out the WhatWg issue discussion
cc @anonrig
I think that changing the behavior would break the ecosystem so the only possibility would be in passing an option to http.createServer
. PR welcomed
We discussed this a bit today on the web server frameworks team. Here is a recap:
req.url
would make it really difficult to landhttp.createServer
would mean that libraries in your framework stack would need to know which mode they were operating in which might be difficult to achieve and is less than idea.req.newURL
) would make it easier to land but less useful to the ecosystemSo for me, I would say that maybe the current options on the table for this are worse than the unexpected behavior and that we should work hard toward the new http api's being proposed instead.
Thanks @wesleytodd for the recap and linking of the web-server-frameworks issue, good to know that it was even discussed there before. As you might have seen I also tried to add it to the http-server module, where I received different options. The original problem would only be solved if everyone (all web frameworks) would use the parsed url for routing and not the raw message header (path + query). So I guess the biggest concern is the performance drawback when using whatWG for parsing. Maybe it would be possible to parse the url (only path and query) more efficient. At the moment I have the feeling I can't really help there as you need to discuss this further in the web-server-frameworks round.
I also tried to add it to the http-server module
I did see that and read through the comments, I agree they gave different input but also I agree mostly with what was said over there. Honestly thinking about it maybe this comment should have been on that PR to keep the conversation together, this was just the link shared in the meeting.
The original problem would only be solved if everyone (all web frameworks) would use the parsed url for routing and not the raw message header (path + query).
Yeah this is, IMO, a valid goal. But I think different frameworks would have different targets, for example Fastify highly values performance so might decide not to use the parsed URL. On the other hand, Express might decide to use the URL instance as performance is a secondary concern compared to user expectations around web platform support. I say this because ideally I think Node core should offer both in easily consumed ways.
So I guess the biggest concern is the performance drawback when using whatWG for parsing.
I mean that is some folks biggest concern, it is not mine (as I said above).
At the moment I have the feeling I can't really help there as you need to discuss this further in the web-server-frameworks round.
That is an open group, we would love to have you in those issues and helping us drive the decisison making on this topic. Feel free to hop in there!
Version
18.17.1
Platform
All
Subsystem
No response
What steps will reproduce the bug?
I would like to share with you a very common scenario:
Express and fastify route matching is done on the incoming message url, which is not whatwg compliant url, see also discussion here This scenario can lead to path traversal vulnerabilities as the whatwg URL class will evaluate ../ and ./ but the server has matched on another path.
This situation is not good at all, because the developer need to know about the different parsing logic. I created this issue at Nodejs as the root cause of this the url property of the incoming message. As you can see in the linked property the fastify maintainer says that
Having different url representations in one framework/language is not good at all.
Example application
I have prepared a sample application using fastify: https://github.com/stefanbeigel/whatwg-fastify-path-traversal/blob/main/index.mjs Call the app with
curl --path-as-is localhost:3000/abc/../foobar
How often does it reproduce? Is there a required condition?
-
What is the expected behavior? Why is that the expected behavior?
OR
What do you see instead?
-
Additional information
Fastify: Different URL parsing between fastify and URL whatwg standard WhatWG: URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard Discussion: http(s) server why is request.url actually not an url, but a pathname. And why is parsing it now so hard?