Open bmeck opened 3 years ago
@bmeck ... is this a spec issue or an issue with our implementation?
@jasnell our helper function converting between the 2 (URL and path) seems broken
I'd argue this is by design. A path has deduplicated //
separators. If anything I would argue that the file:
URL spec is incorrect in not supporting //
deduplication since there exists no pathing schemes on file systems that actually distinguish this.
@guybedford I think it could be argued in either direction for different reasons, but to me the lossy behavior being only on 1 end is both odd and a bit hard to explain. The real oddity to me here is that if a filesystem were to exist that did support empty filenames (S3 for example supports this) it would be odd to normalize in the runtime rather than leaving it to the OS. Is there a reason to impose a normalization step that might not be universally respected? I agree that this can cause distinct differences since that would cause issues like the one mentioned above with the new WHATWG change, but I think consistency is paramount personally. If we do want to normalize we should probably change url.fileURLToPath
so that it remains consistent. I would heavily prefer to not normalize these as it is reaching into file system path interpretation which isn't something Node typically does except for windows UNC path limitation workarounds.
To be honest I think we've hit a real gap here between the spec and the reality, and siding with the spec only continues to ignore reality.
Specifically, I think we should add a new URL normalization function (fileURLNormalize
or something) to any operation which deals with file URL resolution, that will dedupe repeated /
characters, and possibly do percent encoding deduping in future as well.
I don't see any issue though with this function being lossy. Lossiness is a feature not a bug.
One could even define fileURLNormalize = pathToFileURL(fileURLToPath(url))
actually.
@guybedford given we are now moving to relying on realpath for our default loader, and talking of manual deduping for --preserve-symlinks would it be safe to move forward here? My main goal is that if an OS doesn't remove the //
when realpathing that we can preserve that.
@bmeck there is more than just this deduping at play here - pathToFileURL
can also take relative paths and will normalize them to the cwd. I actually don't know of any path operations in Node.js that don't do some path normalization or resolution, so it would seem this would be the rule in general. It might be a hard time changing the established behaviour, just like it would be for any other fs function.
@guybedford that seems like the internal usage of path.resolve
won't support empty dir names (I agree changing that API is likely dangerous) then; using that internally is not technically required for this fn though. I don't think we should make the URL fileURLToPath normalize out the //
so I'm more interested in changing pathToFileURL
even if it means some extra work like having path.resolve delegate to an internal function that has a flag on if it should drop empty entries.
@bmeck the pathToFileURL
won't normalize out Windows UNC strings though. But it is doing a path.normalize
yes. Just about all path
functions in Node.js do normalization, so I still don't quite follow the logic around why this one should be different here.
@guybedford my goal is to have these 2 complimentary functions do the same kind of normalization. I'm not against some normalization, but altering fileURLToPath to remove //
is a lossy behavior so I don't want that.
@bmeck if the goal is consistency, would you consider the alternative of normalization on the fileURLToPath
output?
@guybedford possibly; but, since //
in pathnames is significant on flat file systems; we could add a note that these functions don't work for some file systems in the docs and that might be fine as well.
That sounds sensible to me.
This issue seems to characterize the bug as related to the spec, but this is not true. new URL('file:///a//b').pathname
gives /a//b
as expected. The real trouble is the fact that we use path.resolve
to sanitize inputs to url.pathToFileURL, and path.resolve deletes empty path parts.
We can either stop using path.resolve
(the URL constructor resolves .. and . segments anyway), or we can document the limitations of this API.
@TimothyGu how are you hitting these limitations currently? It would help to relate the issue to a use case.
I'm not hitting these limitations myself. I just noticed this open issue which seems to say that the WHATWG URL spec causes '/' to be collapsed, while it's really just path.resolve.
Agreed it is separate, I guess I would effectively still argue that the file URL protocol should already handle this (in contrast to non file URL protocols).
Found while checking: https://github.com/nodejs/node/issues/35429
Currently it is not possible to have a path represent the file URL
new URL('file:///a//b')
when going throughurl.pathToFileURL
.url.fileURLToPath
does property preserve the//
in the path already. This causes round trips to be lossy.What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior?
What do you see instead?
Additional information