nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.52k stars 29.03k forks source link

`pathToFileURL` function in url fails to handle special characters properly #54515

Open EarlyRiser42 opened 3 weeks ago

EarlyRiser42 commented 3 weeks ago

Version

22.5.1

Platform

Linux earlyriser-virtual-machine 6.5.0-44-generic #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

$ node
Welcome to Node.js v22.5.1.
Type ".help" for more information.
> require('url').pathToFileURL('\\server\foobar[', { windows: true }).href;
'file:///C:/server%0Coobar['
> require('url').pathToFileURL('\\server\foobar[', { windows: false }).href;
'file:///Users/admin/%5Cserver%0Coobar['
> require('url').pathToFileURL('C:/path^', { windows: true }).href;
'file:///C:/path^'
> require('url').pathToFileURL('home/path^', { windows: false }).href;
'file:///Users/admin/home/path^'

How often does it reproduce? Is there a required condition?

everytime

What is the expected behavior? Why is that the expected behavior?

I think the expected behavior is:

> require('url').pathToFileURL('\\server\foobar[', { windows: true }).href;
'file:///C:/server%0Coobar%5E'
> require('url').pathToFileURL('\\server\foobar[', { windows: false }).href;
'file:///Users/admin/%5Cserver%0Coobar%5B'
> require('url').pathToFileURL('C:/path^', { windows: true }).href;
'file:///C:/path%5E'
> require('url').pathToFileURL('home/path^', { windows: false }).href;
'file:///Users/admin/home/path%5D'

because [ and ] are special characters used to specify IP addresses in URLs, and ^ is generally unsafe in URLs and should also be percent-encoded.

What do you see instead?

I see that [, ], and ^ are not encoded

Additional information

I think that setter and constructor both didn't encoded those special characters

aduh95 commented 2 weeks ago

I'm not sure any of the examples you show are really a problem, IIUC those characters would be problematic only in the origin; in the path it's perfectly fine.

However, a problematic one would be:

$ node -p 'url.pathToFileURL("\\\\server[\\foobar[", { windows: true }).href'
file:///foobar[
RedYetiDev commented 2 weeks ago

FWIW According to RFC1738, they should be encoded (see the bold):

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

EarlyRiser42 commented 2 weeks ago

When trying to access files using a file URL on Windows 11, such as C:Users\admin\Downloads\[path]\test[1].txt, I had to use the encoded version file:///C:/Users/admin/Downloads/%5Bpath%5D/test%5B1%5D.txt instead of file:///C:/Users/admin/Downloads/[path]/test[1].txt.

The same behavior occurred on Ubuntu 22.04 when accessing a file located at home/earlyriser/Downloads/[path]/test[1].txt. I had to use the URL file:///home/earlyriser/Downloads/%5Bpath%5D/test%5B1%5D.txt.

This behavior was also consistent with UNC paths. However, my PR failed four tests because these tests assume that the [ and ] characters do not need to be encoded in file URLs.