nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.58k stars 29.06k forks source link

url.pathToFileURL should not dedup '/' in path #35507

Open bmeck opened 3 years ago

bmeck commented 3 years ago

Found while checking: https://github.com/nodejs/node/issues/35429

Currently it is not possible to have a path represent the file URL new URL('file:///a//b') when going through url.pathToFileURL. url.fileURLToPath does property preserve the // in the path already. This causes round trips to be lossy.

What steps will reproduce the bug?

console.log(url.pathToFileURL('/a//b').href);

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior?

file:///a//b

What do you see instead?

file:///a/b

Additional information

jasnell commented 3 years ago

@bmeck ... is this a spec issue or an issue with our implementation?

bmeck commented 3 years ago

@jasnell our helper function converting between the 2 (URL and path) seems broken

guybedford commented 3 years ago

I'd argue this is by design. A path has deduplicated // separators. If anything I would argue that the file: URL spec is incorrect in not supporting // deduplication since there exists no pathing schemes on file systems that actually distinguish this.

bmeck commented 3 years ago

@guybedford I think it could be argued in either direction for different reasons, but to me the lossy behavior being only on 1 end is both odd and a bit hard to explain. The real oddity to me here is that if a filesystem were to exist that did support empty filenames (S3 for example supports this) it would be odd to normalize in the runtime rather than leaving it to the OS. Is there a reason to impose a normalization step that might not be universally respected? I agree that this can cause distinct differences since that would cause issues like the one mentioned above with the new WHATWG change, but I think consistency is paramount personally. If we do want to normalize we should probably change url.fileURLToPath so that it remains consistent. I would heavily prefer to not normalize these as it is reaching into file system path interpretation which isn't something Node typically does except for windows UNC path limitation workarounds.

guybedford commented 3 years ago

To be honest I think we've hit a real gap here between the spec and the reality, and siding with the spec only continues to ignore reality.

guybedford commented 3 years ago

Specifically, I think we should add a new URL normalization function (fileURLNormalize or something) to any operation which deals with file URL resolution, that will dedupe repeated / characters, and possibly do percent encoding deduping in future as well.

I don't see any issue though with this function being lossy. Lossiness is a feature not a bug.

guybedford commented 3 years ago

One could even define fileURLNormalize = pathToFileURL(fileURLToPath(url)) actually.

bmeck commented 3 years ago

@guybedford given we are now moving to relying on realpath for our default loader, and talking of manual deduping for --preserve-symlinks would it be safe to move forward here? My main goal is that if an OS doesn't remove the // when realpathing that we can preserve that.

guybedford commented 3 years ago

@bmeck there is more than just this deduping at play here - pathToFileURL can also take relative paths and will normalize them to the cwd. I actually don't know of any path operations in Node.js that don't do some path normalization or resolution, so it would seem this would be the rule in general. It might be a hard time changing the established behaviour, just like it would be for any other fs function.

bmeck commented 3 years ago

@guybedford that seems like the internal usage of path.resolve won't support empty dir names (I agree changing that API is likely dangerous) then; using that internally is not technically required for this fn though. I don't think we should make the URL fileURLToPath normalize out the // so I'm more interested in changing pathToFileURL even if it means some extra work like having path.resolve delegate to an internal function that has a flag on if it should drop empty entries.

guybedford commented 3 years ago

@bmeck the pathToFileURL won't normalize out Windows UNC strings though. But it is doing a path.normalize yes. Just about all path functions in Node.js do normalization, so I still don't quite follow the logic around why this one should be different here.

bmeck commented 3 years ago

@guybedford my goal is to have these 2 complimentary functions do the same kind of normalization. I'm not against some normalization, but altering fileURLToPath to remove // is a lossy behavior so I don't want that.

guybedford commented 3 years ago

@bmeck if the goal is consistency, would you consider the alternative of normalization on the fileURLToPath output?

bmeck commented 3 years ago

@guybedford possibly; but, since // in pathnames is significant on flat file systems; we could add a note that these functions don't work for some file systems in the docs and that might be fine as well.

guybedford commented 3 years ago

That sounds sensible to me.

TimothyGu commented 3 years ago

This issue seems to characterize the bug as related to the spec, but this is not true. new URL('file:///a//b').pathname gives /a//b as expected. The real trouble is the fact that we use path.resolve to sanitize inputs to url.pathToFileURL, and path.resolve deletes empty path parts.

We can either stop using path.resolve (the URL constructor resolves .. and . segments anyway), or we can document the limitations of this API.

guybedford commented 3 years ago

@TimothyGu how are you hitting these limitations currently? It would help to relate the issue to a use case.

TimothyGu commented 3 years ago

I'm not hitting these limitations myself. I just noticed this open issue which seems to say that the WHATWG URL spec causes '/' to be collapsed, while it's really just path.resolve.

guybedford commented 3 years ago

Agreed it is separate, I guess I would effectively still argue that the file URL protocol should already handle this (in contrast to non file URL protocols).