whatwg / url

URL Standard
https://url.spec.whatwg.org/

A '@' character in the host part of file URLs #805

Open hayatoito opened 10 months ago

hayatoito commented 10 months ago

(Reported in https://crbug.com/1502849)

It appears that Windows uses file URLs with '@' (U+0040) characters in their host parts, such as file://webdavserver.net@ssl/a.pdf.

However, according to my understanding, file://webdavserver.net@ssl/a.pdf is an invalid URL in the URL Standard because '@' is considered a forbidden host code point.
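A minimal sketch of that check, for illustration only. The set of forbidden host code points below is transcribed by hand from the URL Standard and should be verified against the current spec text; the helper name `find_forbidden_host_code_points` is hypothetical and is not part of any library.

```python
# Forbidden host code points, hand-transcribed from the URL Standard.
# Assumption: this list matches the current spec; check before relying on it.
FORBIDDEN_HOST_CODE_POINTS = frozenset("\x00\t\n\r #/:<>?@[\\]^|")


def find_forbidden_host_code_points(host):
    """Return the forbidden host code points found in `host`, in order."""
    return [ch for ch in host if ch in FORBIDDEN_HOST_CODE_POINTS]


# The host part of file://webdavserver.net@ssl/a.pdf
print(find_forbidden_host_code_points("webdavserver.net@ssl"))
```

This only models the code point check, not the full host parser (which also handles percent-decoding, IDNA, and IP addresses).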

To ensure compatibility with Windows file URLs, should we consider allowing the '@' character in the host part of file URLs?

I'd appreciate hearing opinions of the URL Standard folks on this matter.

annevk commented 10 months ago

It seems reasonable to allow, but I wonder if it would be possible for Chromium to determine the complete set of changes needed for it to not have platform-divergent behavior. At least I suspect that making them all at once would allow for an easier rollout.

karwa commented 10 months ago

A single @ in the authority section (username, password, hostname, port) generally delimits the credentials from the hostname.

Let's take any other URL scheme, e.g. HTTP: http://webdavserver.net@ssl/

Here the parser treats webdavserver.net as the username and ssl as the hostname, which is clearly not what the reporter wants to happen.
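As a rough illustration of the credentials/hostname split, here is what Python's standard `urllib.parse.urlsplit` reports. It follows RFC-style generic syntax rather than the URL Standard, so it shows how one widely used parser reads the authority, not what a WHATWG-conformant parser does. The helper name `split_authority` is made up for this example.

```python
from urllib.parse import urlsplit


def split_authority(url):
    """Return (username, hostname) as seen by urllib.parse.urlsplit."""
    parts = urlsplit(url)
    return parts.username, parts.hostname


for url in ("http://webdavserver.net@ssl/", "file://webdavserver.net@ssl/a.pdf"):
    username, hostname = split_authority(url)
    print(f"{url} -> username={username!r}, hostname={hostname!r}")
```

In both cases the text before the @ is reported as the username and only `ssl` is reported as the hostname.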

What's more, this has been the accepted interpretation for at least the last 30 years (going back to RFC-1738). I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign, so the output of the URL parser must keep the @ escaped in order to properly encode its understanding of the URL components. file://webdavserver.net%40ssl/ is semantically correct.
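A small sketch of the escaped form, again using Python's `urllib.parse` as a stand-in for a generic RFC-style parser. It shows that once the @ is percent-encoded it no longer acts as the credentials delimiter; it says nothing about what the URL Standard's host parser accepts. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote, urlsplit

raw_host = "webdavserver.net@ssl"

# Percent-encode every reserved character in the host, including '@'.
encoded_host = quote(raw_host, safe="")
url = f"file://{encoded_host}/a.pdf"

parts = urlsplit(url)
print(url)
print("username:", parts.username)          # no credentials are detected
print("hostname:", parts.hostname)          # the '@' stays escaped as %40
print("decoded: ", unquote(parts.hostname)) # round-trips to the raw host
```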

I think the actual problem is that hostnames in file URLs are not able to contain percent-encoding. I looked into this in depth a while back, and found that:

See https://github.com/whatwg/url/issues/599

catmanjan commented 2 weeks ago

Please remove @ from the forbidden host code point list!

Given RFC 1738, it was probably a mistake for it to be there in the first place.

Just on this:

> I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign

Firefox, Safari, Windows Explorer, and Linux terminals all handle the URL fine. In fact, it's only Chromium-based browsers that have the issue, because they want to use the URL Standard as their only authority rather than use multiple path standards...

valenting commented 2 weeks ago

> > I doubt many URL parsers are going to interpret file://webdavserver.net@ssl/ as having a hostname containing an @ sign
>
> Firefox, Safari, windows explorer, linux terminals all handle the URL fine

Safari also rejects this URL, and the only reason it works in Firefox is that we currently ignore everything in the hostname part of a file URL (tracked in 1507354 - URL parser discards host for file URLs).

Allowing @ only in the authority section of file URLs seems like a weird exception to make. I'm in favor of keeping hostname parsing as close to the HTTP URL parser as possible - and here the @ sign should probably be percent-encoded.

catmanjan commented 2 weeks ago

@valenting yes, I think the problem is calling them file URLs. They are URL-like, but ultimately the OP (file://webdavserver.net@ssl/a.pdf) is a UNC file path, and currently it's just a coincidence that Chromium works for most of them...
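To make the UNC reading concrete, here is a deliberately naive sketch of the conventional file-URL-to-UNC mapping (host becomes the server name, forward slashes become backslashes). The function name `file_url_to_unc` is hypothetical, and the code is an assumption about how such a mapping might look, not how Windows or any browser implements it.

```python
from urllib.parse import urlsplit


def file_url_to_unc(url):
    """Naively map file://host/path to \\\\host\\path.

    The whole authority (netloc) is kept as the server name, so an '@'
    in it is preserved rather than treated as a credentials delimiter.
    Percent-decoding and drive letters are not handled here.
    """
    parts = urlsplit(url)
    if parts.scheme != "file" or not parts.netloc:
        raise ValueError("expected a file URL with a host")
    return "\\\\" + parts.netloc + parts.path.replace("/", "\\")


print(file_url_to_unc("file://webdavserver.net@ssl/a.pdf"))
```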