whatwg / url

URL Standard
https://url.spec.whatwg.org/

Differentiate from zero-sized fragment and no fragment in url #779

Open lu-zero opened 1 year ago

lu-zero commented 1 year ago
scheme://host:port/#

and

scheme://host:port/

When fed to the URL constructor, the two are indistinguishable: .hash returns '' for both.

To make it even stranger, setting .hash = '#' produces scheme://host:port/#, yet reading .hash still returns ''.

It would be nicer if .hash returned undefined/null when the fragment is unset, and "#" when only the trailing hash is present.

annevk commented 1 year ago

We cannot change the existing API, but I'm somewhat supportive of adding API surface for this as it is indeed hidden information. For search too. (hasSearch & hasHash seem more palatable.)
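As a rough illustration of what such flags could report, the sketch below derives them from the serialized href. The names hasSearch and hasHash are taken from the comment above, but they are not part of any shipped URL API; the standalone functions here are hypothetical helpers. They assume a spec-conforming serializer, which percent-encodes any "#" outside the fragment and any "?" outside the query and fragment, so the first "#" and the first "?" before it can be read as delimiters.

```javascript
// Hypothetical helpers; not part of the URL API.

// True when the serialized URL carries a "#" delimiter, even with an empty fragment.
function hasHash(url) {
  return url.href.includes("#");
}

// True when the serialized URL carries a "?" delimiter before any fragment.
function hasSearch(url) {
  const hashIndex = url.href.indexOf("#");
  const beforeHash = hashIndex === -1 ? url.href : url.href.slice(0, hashIndex);
  return beforeHash.includes("?");
}

const bare = new URL("http://example.com/");
const empty = new URL("http://example.com/?#");

// The existing getters cannot tell the two apart.
console.log(bare.search === empty.search, bare.hash === empty.hash); // true true

// The helpers can.
console.log(hasSearch(bare), hasHash(bare));   // false false
console.log(hasSearch(empty), hasHash(empty)); // true true
```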

Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?

lu-zero commented 1 year ago

I found the problem while looking at how the URL fragment is supported across languages while working on another standard, so I cannot tell you how widespread the need is within JS. I guess we'll have to make a note and flag the pitfall.

What surprises me even more is that you do not get back what you set.

let url = new URL("scheme://host/path/");
console.log(url.hash); // -> ''
url.hash = "#";
console.log(url.toString()); // -> scheme://host/path/#
console.log(url.hash); // -> ''
url.hash = "#a";
console.log(url.toString()); // -> scheme://host/path/#a
console.log(url.hash); // -> '#a'

karwa commented 1 year ago

I agree that this part of the JS URL API is awkward. To give another data point: in my library WebURL, which implements the WHATWG standard in Swift, I made this change ("not present" is communicated as nil, not as an empty string) and some other tweaks.

WebURL uses nil to signal that a value is not present, rather than an empty string. This is a more accurate description of components which keep their delimiter even when empty. For example, consider the following URLs:

http://example.com/
http://example.com/?

According to the URL Standard, these URLs are different; however, JavaScript’s search property returns an empty string for both. In fact, these URLs return identical values for every component in JS, and yet still the overall URLs compare as not equal to each other. This has some subtle secondary effects, such as url.search = url.search potentially changing the URL.

WebURL avoids this by saying that the first URL has a nil query (to mean “not present”), and the latter has an empty query. This has the nice property that every unique URL has a unique combination of URL components.

I appreciate that the JS API cannot be changed at this point, though.
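The secondary effect mentioned in the quoted passage can be reproduced with a short snippet. This is only a sketch and assumes a spec-conforming URL implementation, such as the one in current browsers or Node.js; the variable names are illustrative.

```javascript
const withDelimiter = new URL("http://example.com/?");
const without = new URL("http://example.com/");

// The search getter agrees for both URLs, yet the serializations differ.
console.log(withDelimiter.search === without.search); // true
console.log(withDelimiter.href === without.href);     // false

// Assigning the getter's own value back drops the trailing "?",
// because the setter treats '' as "remove the query".
const before = withDelimiter.href;
withDelimiter.search = withDelimiter.search;
const after = withDelimiter.href;

console.log(before); // http://example.com/?
console.log(after);  // http://example.com/
```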

alwinb commented 8 months ago

Host has this problem too.

There is this classic post according to which query and fragment have been in use fairly consistently to refer to the search without the ? sigil and the hash without the # sigil.

So one option is to fix search and hash and make them available as query and fragment instead. The search and hash getters / setters can then be marked as legacy or deprecated (but not removed).
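A userland approximation of such query and fragment accessors might look like the sketch below. The function queryAndFragment is a hypothetical helper, not a proposed or existing API. It reads the serialized href and assumes, as the URL Standard's serializer implies, that the first "#" starts the fragment and the first "?" before it starts the query. It returns null for an absent component and an empty string for one that is present but empty, both without the sigil.

```javascript
// Hypothetical helper: splits the serialized URL on its delimiters.
// Returns null for an absent component and "" for an empty one.
function queryAndFragment(url) {
  const href = url.href;
  const hashIndex = href.indexOf("#");
  const fragment = hashIndex === -1 ? null : href.slice(hashIndex + 1);
  const beforeHash = hashIndex === -1 ? href : href.slice(0, hashIndex);
  const queryIndex = beforeHash.indexOf("?");
  const query = queryIndex === -1 ? null : beforeHash.slice(queryIndex + 1);
  return { query, fragment };
}

// query: null, fragment: null
console.log(queryAndFragment(new URL("http://example.com/")));
// query: "", fragment: ""
console.log(queryAndFragment(new URL("http://example.com/?#")));
// query: "a=1", fragment: "b"
console.log(queryAndFragment(new URL("http://example.com/?a=1#b")));
```

With this shape, every distinct serialization maps to a distinct pair of values, which is the property the search and hash getters lack.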

pimterry commented 2 weeks ago

> Having said that, is there evidence on Stack Overflow or in popular JS libraries that this is a shortcoming people have to work around?

I've run into this problem myself, in multiple projects and libraries, in both Node & browsers.

Right now I'm building developer tools, where URLs are taken as string input, parsed, and manipulated by component, and preserving the raw formatting where possible is useful. Not being able to differentiate between /? and / at the end of a URL is quite inconvenient! I'm still using Node's url.parse in some places, in part because it does not have this behaviour and that's important.

Of course this state does exist within the URL parser (the URL's internal query and fragment states in the spec do store empty & null differently) but it's just not currently exposed the same way in search & hash (in both cases, both null and empty are exposed as '').

Totally understand that changing the existing API is impractical. Either of the options proposed here so far would work well in scenarios like mine: