whatwg / url

URL Standard
https://url.spec.whatwg.org/
Other
531 stars 139 forks source link

Good defaults on URL() #71

Closed LeaVerou closed 8 years ago

LeaVerou commented 9 years ago

Let’s not make the addEventListener() mistake again with the pointless 3rd argument. Web APIs need reasonable defaults. Most uses of relative URLs in the URL() constructor would be against the document URL [citation needed :P] We should define the 2nd argument to be location by default in contexts where that is available. This can be changed without breaking backwards compat.

annevk commented 9 years ago

If you have [citation needed] that would be most welcome. The rationale for doing it this way is that magical defaults are bad.

LeaVerou commented 9 years ago

The rationale for doing it this way is that magical defaults are bad.[citation needed]

There, I fixed it for you.

Good defaults is one of the basic principles of good usability. This is the problem when people with zero HCI training make specs: We get abominations with a bunch of pointless arguments that are almost always the same because someone thought that “magical defaults are bad.”

domenic commented 9 years ago

I much prefer the current system: single-arg version is for absolute URLs, and the two-arg version is for relative URLs. That way I can do things like new URL(userInput) and it will give me an appropriate error if the user entered something like adsfslkdfhfd instead of https://example.com/.

domenic commented 9 years ago

(user input, or input from other parts of the program, or from other libraries, or whatever.)

LeaVerou commented 9 years ago

Actually, I recently wanted to check if a URL was relative or not (not user input) and had to do the following fragile abomination, because I didn’t want to disallow relative URLs so I was providing a location 2nd argument anyway.

return this.url.protocol !== location.protocol ||
       this.url.hostname !== location.hostname ||
       this.url.port     !== location.port;

It would be much more valuable to include a property/method for that (ideally with some degree of granularity, i.e. relative to what?) than hindering usability for most use cases because of that one edge case @domenic thought.

domenic commented 9 years ago

I find it distasteful how you paint everyone who disagrees with you as "zero HCI training", "abominations", "pointless", and "edge case".

annevk commented 9 years ago

That abomination doesn't make sense. You cannot have a constructed URL that's "relative". And I've had HCI training actually. I'd caution against making sweeping statements. It's not very nice.

LeaVerou commented 9 years ago

Feel free to tone police me instead of actually trying to understand what I’m saying, it’s very productive.

annevk commented 9 years ago

I thought I did. I asked you why you thought your case was more common and I pointed out that you cannot have a URL object that is relative (which it seems you are assuming you can have).

I also happen to disagree that for a low-level primitive we should be pulling in state from the global object. It seems much better for this to be a pure function.

KittyGiraudel commented 9 years ago

If you have [citation needed] that would be most welcome.

You want a citation for this? Run a survey. Ask the community. Ask the developers. Ask the vendors. After all, the specifications and the APIs are made for them. You will see that Lea's assumption is correct.

domenic commented 9 years ago

http://c2.com/cgi/wiki?GlobalVariablesAreBad is especially relevant. Non-locality, No Access Control or Constraint Checking, Implicit coupling, and Testing and Confinement being the relevant points.

KittyGiraudel commented 9 years ago

This actually is irrelevant. We are not arguing about whether global variables are good or evil; I think we all know they are usually to avoid.

Still, there is no good reason why URL() is not checking for window.location directly except that magical defaults are bad.

Again, if you want to design an API for the community, ask the community.

domenic commented 9 years ago

It's perfectly relevant. Please read the linked bullet points. It's specifically the coupling of URL to a global state (i.e. the global variable location) that's being proposed here.

LeaVerou commented 9 years ago

I asked you why you thought your case was more common

I thought it was obvious that the [citation needed] I added to my own comment was self-deprecating humor. To get actual data for this, one would need to wait until URL is widely used enough and authors are already complaining, just like what happened with the 3rd argument in addEventListener. I think what I’m saying is pretty much obvious to most authors, but feel free to ask them if you have doubts.

you cannot have a URL object that is relative (which it seems you are assuming you can have).

You can have a URL object that could have been produced from a relative URL (whether it actually was or not), and that is useful to detect because detecting today is rather messy.

I also happen to disagree that for a low-level primitive we should be pulling in state from the global object. It seems much better for this to be a pure function.

The W3C priority of constituencies puts theoretical purity at the very bottom. Feel free to disagree with W3C all you want, but I think that particular bit about priorities is very well thought out and I wish it was followed more (authors above implementors, yeah right :/).

annevk commented 9 years ago

You can have a URL object that could have been produced from a relative URL

Only if part of the input was a base URL (which is complete). I don't see how you think you can detect where the URL object came from, such state is not kept around.

LeaVerou commented 9 years ago

Only if part of the input was a base URL (which is complete). I don't see how you think you can detect where the URL object came from, such state is not kept around.

I’m referring to the kind of thing the snippet I posted above tries to do, albeit messily. Feel free to label it with whatever kind of terminology you want.

slightlyoff commented 9 years ago

Unsurprisingly, Lea's point is strong; I've likewise had a forehead-slapping moment over this. Code on the web runs at origins. The current origin can always be known and should be the default base if none is provided.

Domenic: if you want a separate behaviour for known relative/absolute, create less error-prone statics on URL that only deal in one or the other.

annevk commented 9 years ago

I don't understand that snippet since it lacks context. The only thing you can conclude from that is that the URLs are not the same.

wanderview commented 9 years ago

FWIW, I asked Lea to file this.

I tend to agree that all scripts in the browser have an origin, so we could infer a base URL. Providing this as a default seems reasonable to me.

Is there a default origin concept in node.js if code there wanted to use URL objects? Other javascript environments without a default origin concept seem the biggest concern to me. (But maybe thats not relevant for whatwg?)

julienw commented 9 years ago

Maybe a sane argument is that using a "a" element actually uses the current location:

var link = document.createElement('a');
link.href = 'something';
link.href; // https://github.com/whatwg/url/issues/something"

That said, I strongly prefer the current behavior of URL. If we want another behavior, we could come up with a new RelativeURL API. That API would not be resolved until we ask for it:

var url = new RelativeURL('something');
url.resolve() // resolve to the current location, like <a> above
url.resolve(location); // idem
url.resolve('http://google.fr'); // resolve to another absolute URL.
url.resolve(absoluteUrl); // resolve to an existing URL object
WebReflection commented 9 years ago

if this works

function toAbsolute(url){var a=document.createElement('a');a.href=url;return a.href}

I think Lea expectations are the least surprise about the default.

[edit] yeah, had similar thoughts @julienw had, beside I usually prefer smarter defaults and this seems like a good example

LeaVerou commented 9 years ago

@WebReflection Yup, before URL that’s exactly what I and many other authors did to resolve and parse URLs…

domenic commented 9 years ago

Yeah if you want an API that deals with location IMO it should be on location. I am thinking location.relativeURL(...). You could even put it on URLUtils so any URL can do this.

Again, decrease coupling between disparate parts of the system. There's the low-level part, URL, and the higher-level user of it, location. The lower-level part is reusable, can work in any environment, and can be adapted to any situation.

There's also the point that it's entirely unclear what the default should be here. Origin, as some in this thread are advocating? Document's current URL, as the OP advocates? document base URL, as <a> and <area> do? Or maybe even document's creation URL?

sleevi commented 9 years ago

Chiming in from the sidelines of helping maintain Chrome's URL parsing implementation, and on the native side, I regularly catch bugs during code review where someone is handling an invalid URL but gets resolved to a relative URL.

While the URL spec will hopefully work to clarify many of the edge cases and ambiguities that exist, I think it's a valid concern that 'invalid URL' get surfaced to the user, rather than 'invalid URL interpreted as relative'. I don't think the URL spec (or our implementation) are at a good point to universally meet that latter concern, so the shape of the API necessarily reflects the shape of the spec.

tabatkins commented 9 years ago

Note that URL() is also meant to be usable in contexts like Node, where there is no associated origin.

I agree with a few others that handling relative urls explicitly with a separate interface is probably best, if we do want to handle relative URLs. The URL interface is not well-suited to this task, because it absolutizes immediately. (And I think this is valuable - relative urls are confusing to work with if you pass them around between contexts; a relative url produced in a document and then written into a CSS stylesheet can have an unexpected result!)

WebReflection commented 9 years ago

if we'd like to compare this on client VS server, I think everyone would expect a relative path to be resolved from the execution directory (as it is already in node), hence the fact here we are talking about least surprise.

I personally don't have strong feelings about this API choice but I think there are more pros than cons in having a smarter relative default here. If not too late, maybe worth considering.

tabatkins commented 9 years ago

@WebReflection OMG, that is absolutely not what I woudl expect. Resolving a url against the file location is crazypants and almost certainly totally wrong.

mgol commented 9 years ago

I agree with @tabatkins, @domenic & @annevk here; the fact that new URL() is pure isn't just a theoretical advantage, it makes it possible to use it outside of the browser environment; also, making URL rely on a separate interface would be surprising, if anything.

Methods relying on location state should be defined under location as @domenic suggested. This would be a useful interface but a separate one at that.

mgol commented 9 years ago

Also, the addEventListener third argument (even if we avoid the problem that it's a Boolean trap) is a constant, it doesn't refer to anything outside of the core language. Referring to location would be different.

WebReflection commented 9 years ago

That is NOT what I meant indeed, and I wouldn't suggest that. I was rather talking about path module resolution. On 9 Sep 2015 5:53 pm, "Tab Atkins Jr." notifications@github.com wrote:

@WebReflection https://github.com/WebReflection OMG, that is absolutely not what I woudl expect. Resolving a url against the file location is crazypants and almost certainly totally wrong.

— Reply to this email directly or view it on GitHub https://github.com/whatwg/url/issues/71#issuecomment-138973236.

tabatkins commented 9 years ago

@WebReflection Ah, kk. That's a rather different thing then - that's resolving a file path (where the default is relative paths in my experience) rather than a URL.

tabatkins commented 9 years ago

And file paths don't have the "what context?" problem that URLs do. You're rarely passing file paths between computers, because it's obviously wrong; if you do, it's only when there's a well-known standard file structure you're operating on.

As @domenic said, tho, even within the document context, there are multiple "default" origins you could select - the page origin, the document location, the document base url. Then throw in CSS stylesheets, which resolve relative urls against the stylesheet location. There simply isn't a single correct answer, and it'll be easy to accidentally take a relative url meant for one context and use it in another. Then, as @sleevi said, it's also pretty easy to accidentally have an invalid url, and having that fail early rather than being interpreted as a nonsensical relative URL is good for authors.

There's lots of reasons why requiring the author to be explicit about (a) wanting to handle relative URLs and (b) which base url to resolve it against, is a good thing. The only reason for defaulting it is to save 9 characters (,location) on the assumption that the location object is the correct default.

LeaVerou commented 9 years ago

Note that URL() is also meant to be usable in contexts like Node, where there is no associated origin.

I agree with @tabatkins, @domenic & @annevk here; the fact that new URL() is pure isn't just a theoretical advantage, it makes it possible to use it outside of the browser environment; also, making URL rely on a separate interface would be surprising, if anything.

In the original post, I wrote “in contexts where that is available”. It wouldn’t be the first thing that is available in a browser context but not in Node.

LeaVerou commented 9 years ago

@tabatkins Using the location as the default is consistent with what document.createElement('a') does, as mentioned above, hence it’s a more predictable default.

sleevi commented 9 years ago

It wouldn’t be the first thing that is available in a browser context but not in Node.

But wouldn't it create an API that behaved differently, and in subtle ways that can cause real bugs? In the principle of least surprise, you'd have otherwise perfectly valid code that lead to inconsistent or unexpected results depending on context. That's not just theoretical purity - that's a real usability problem.

mikeal commented 9 years ago

Since when is using an existing global value when one is not passed a "magic default?" By this logic virtually any default is magic.

tabatkins commented 9 years ago

@LeaVerou That's what happens when you create an <a> not in the document, yeah. And when it's in the document, it uses the document's base url. And neither is the origin, which is what @slightlyoff was talking about defaulting with.

You can't auto-resolve a relative url against location and expect it to be "what it would resolve to if this was a link in the page", because <base>. You can't expect it to be "what it would resolve to if this was in a CSS url() function", because that's the stylesheet's url. You can't input what you assume is an absolute url and get what you expect, if the string is user input and might be nonsense (it'll get interpreted as a nonsense relative url). Auto-assuming the base is a footgun, and it'll bite people. You need to specify what your base is, because there are multiple valid choices, and it's a valid choice to say "none" because you're expecting an absolute url.

(For a "must be absolute" use-case, assume you're setting your avatar, and you can upload from your computer or provide a url. Relative urls are nonsense here; even if you did want, for some reason, to use a picture on the domain, it's easiest to right-click=>copy link address, which gives you an absolute url. If someone does insert something that's not an absolute url, it's very likely to be an error, and throwing is good behavior here.)

tabatkins commented 9 years ago

Actually, this is even worse. http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3631, check the log at the bottom.

An <a> always pays attention to the <base>, whether it's in-document or not, but that has no relation to location at all. So defaulting to location would be even worse. But in script, there's no reason to assume <base> is significant!

gsnedders commented 9 years ago

I'd expect a default of the document base URL, FWIW. location would surprise me given its inconsistent.

tabatkins commented 9 years ago

An easy future path: put relative URL resolvers on anything with a base url concept. document.resolveURL(), location.resolveURL(), document.styleSheets[0].resolveURL(), etc. (Plus the explicit "use this base url" on the URL interface itself.) Just make a mixin interface and use it on everything with a base url concept.

This makes relative URLs 100% obvious, while providing all the easy defaults so authors don't have to figure out how to special-case construct the right base URL in each circumstance.

annevk commented 9 years ago

@mikeal I recommend not focusing on my first comment, it wasn't well thought out.

@tabatkins what exactly would these methods do/return? Return new URL objects is the input was parsed against the URL represented by the object the method was invoked on?

tabatkins commented 9 years ago

@annevk Right. The point would be auto-providing the "correct" base url from each context. The base urls are provided with seemingly random API: window.location, document.baseURI, stylesheet.href || stylesheet.ownerNode.baseURI (it varies whether the stylesheet is in a <link> or <style>).

Providing resolveURL() seems slightly better than providing "yet another way to expose the base url" for all of the contexts. (And .resolveURL("") provides the current base url anyway.)

hax commented 9 years ago

If you want some community feedback, I can represent my team and we agree that magic default is bad, use global state as magic default is worse! My team is moving to ES6 modules, and do not allow any global state in the code of modules, even document or window should be sent explicitly to the functions of modules. -- we also abandon jQuery, there are many considerations but one of it is $ API use a magic default.

BTW, about addEventListener, I think the main problem is its tedious name. Most library (and node.js api) use on(type, listener) now. The second problem is "boolean params are wrong" (unfortunately, old DOM APIs are terrible, especially initEvent), a better solution is provide a separate capture(type, listener) API.

tabatkins commented 9 years ago

I don't know why I didn't think of this before, but for the several people (including me) saying "magic default doesn't work for Node", it also doesn't work for several other things actually in the web platform: namely, Shared Workers and Service Workers. Both of those have good reasons to do things with URLs, and thus want to use the URL interface, but neither has a document or a window associated with itself, and so there's no way to magically default them.

Large behavior differences in slightly different contexts are footguns. (This is why purity is not just a "theoretical purity" concern - it's a real author-usability concern, too.)

WebReflection commented 9 years ago

it's clear this won't ever happen but I feel like somebody is stressing a bit too much this decision.

First of all, replace all "magic default" mention with "least surprise", secondly, back to the path mention I've quickly explained before: if on any Operating System I write ls or dir I don't have to specify . and the list of current files and folders are shown.

This AFAIK has been working forever and nobody complained. You can pass the . to explicitly specify the current folder, or you can pass any ther path or you can pass nothing, which defaults to the current folder.

How difficult is this basic fallback to understand or agree on ... I don't know, and since I've already said I don't personally have any strong opinion about this matter, I don't even care, but surely we can stop pretending everyone expecting a default with the most common case, which is on a document/browser/window and with an available location, doesn't know what his/her is talking about.

We can stop talking purity, etc etc ... there are reasons not to have this defaulting nicely to avoid surprises? Yes? Cool, we're good. No? Then let's think about it, specially why there is such a thread about it in the first place.

Regards

tabatkins commented 9 years ago

I already talked about file paths, and how they're not directly comparable to urls, particularly the concept of relative urls within a page, in https://github.com/whatwg/url/issues/71#issuecomment-138975720 and https://github.com/whatwg/url/issues/71#issuecomment-138977373.

WebReflection commented 9 years ago

urls are paths for the web. I've already talked about expectations from developers and least surprise. Let's agree to disagree and move on? Like I've said, I don't care ... like ... really!

tabatkins commented 9 years ago

Again, the two are not directly comparable, as the context surrounding them is very different, and it's the context around relative urls that is the problem here, as I said in the comments linked above.

If you don't care, stop responding to the issue. ^_^

WebReflection commented 9 years ago

I cared until I've realised it's pointless here ... so I personally think you should close this bug as won't fix (instead of commenting what purity is and stuff) so we can all move on. Regards

melvyn-sopacua commented 8 years ago

Hmm, I still care. One of the objections seems to be that in a contextless environment this cannot be resolved. In the bug that I posted, it was said that JavaScript doesn't have a MissingArgument exception and yet this thread is filled with purity objections (as far as I know, JavaScript exceptions are extendable). The exception thrown (also with a parser error) should be the equivalent of python's ValueError. Type is only relevant when passing non-string objects. So if Node.js has no way of resolving relative URLs, then this should probably be in the lines of NotImplemented / Unsupported / WhatHaveYou. But as it stands, the URL() object is not capable of handling input a user expects it to handle. The creation of RelativeURL for this, should then change URL to AbsoluteURL and URL should probably be an interface defining all methods but the constructor, so we can even create FilteredURL() for a soho proxy in node or privacy extension in the browser, a SecureProtocolURL(), which always returns the secure version of provided scheme, etc etc. But note that splitting this into two objects, requires me to do partial parsing, which is exactly what I was looking to avoid and thought URL() was supposed to do. Or I catch the exception and start guessing what exactly the cause of the exception is (as there are three entirely different ones under the same umbrella and no way to determine programmatically which). As for the sane default: The calling script's script element's ownerDocument location (so run up the tree, not reference a global, solved that too). Authors will understand that this is the only sensible default and will work for things like validating document resources in a user script (which is what triggered my issue) or grabbing all resource URLs for statistical purposes. It is also very easy to understand from this perspective what the exceptions are going to be.

So my summary: