Closed jasnell closed 7 years ago
@nodejs/collaborators
Perhaps we could raise interest in this by providing examples of failing tests.
@seishun ... https://github.com/nodejs/node/blob/master/test/known_issues/test-url-parse-conformance.js :-)
We currently fail somewhere around 140+ of the test cases in the WhatWG set.
+1
Considering that browsers are exposing the global, I think it makes a whole bunch of sense. I'd be interested in helping out with this. Have you broken ground on implementation @jasnell?
Can we borrow from implementations at all?
@TheAlphaNerd ... yeah, I've got it mostly implemented already. The next step is to start running it through its paces with tests and benchmarks and to find ways of optimizing the implementation. It's currently quite a bit slower than the existing require('url') module parsing.
Regarding borrowing from other impls, it's entirely possible that we could borrow from Chrome's implementation. I'm not sure yet if theirs is a pure JS impl or not. I'll look into that.
@jasnell How about the following 2 static methods, which are defined in the standard's IDL:
domainToASCII(domain)
domainToUnicode(domain)
I didn't see those in the proposal :-(
Still considering those. They are easy enough to implement given the punycode module but I'm not sure how extensively they're used.
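For a sense of what the two conversions do, here is a minimal sketch. It assumes a Node.js release whose url module exports domainToASCII() and domainToUnicode() as plain functions (these were added after this discussion, and they are not the static methods on URL that the IDL described at the time). The sample domain is purely illustrative.

```javascript
// Assumption: a Node.js version where the url module exports these helpers.
const { domainToASCII, domainToUnicode } = require('url');

// Convert an internationalized domain name to its ASCII (Punycode) form.
const ascii = domainToASCII('español.com');

// Convert the ASCII form back to its Unicode form.
const unicode = domainToUnicode(ascii);

console.log(ascii);
console.log(unicode);
```

Whether these belong on the URL class or on the module is exactly the kind of API question raised in this thread; the sketch only illustrates the round trip.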
There will also be other differences. For instance, I'm not sure if we need the host parsing component.
I'm not sure yet if theirs is a pure JS impl or not.
There is https://github.com/jsdom/whatwg-url
cc @domenic
Yeah, I'm familiar with (and use) whatwg-url. @domenic, how would you feel about the possibility of pulling the whatwg-url implementation into core? I'm not yet that familiar with how it is implemented internally but having a URL Standard compliant URL parser built into core would be very good.
Yep, that's @Sebmaster's most excellent work. It's not super optimized, but would be a good starting point.
This thread is very exciting and I'm glad there is an appetite for the idea!! The idea of a global, the same as in browsers, is great.
I think there are two separable problems here:
[...] require("url") in a semver-major bump, as mostly it will change edge cases.
[...] url.pathname = "\uDC01", then that will initiate a parse in the pathname override state, and then if you trigger the getter, you'll find url.pathname === "/%EF%BF%BD". Note how this both appends the leading slash, and also performs the USVString surrogate pair censoring (decodeURIComponent("%EF%BF%BD").charCodeAt(0) === 0xFFFD, not 0xDC01). This is a fairly important property IMO.
There are tests of the URL Standard at https://github.com/w3c/web-platform-tests/tree/master/url, and whatwg-url has a runner. The coverage is pretty reasonable; see https://github.com/w3c/web-platform-tests/issues/3018
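The pathname setter behavior described above can be checked with a short sketch. It assumes a Node.js version that exposes a WHATWG-style URL class via require('url'), which is what this proposal asks for; the example.com base URL is an arbitrary placeholder.

```javascript
// Assumption: require('url').URL is a WHATWG-compliant URL class.
const { URL } = require('url');

const u = new URL('http://example.com/');

// Assign a lone low surrogate. The setter re-parses the value in the
// "pathname override" state after USVString conversion.
u.pathname = '\uDC01';

// The getter returns the percent-encoded, slash-prefixed path.
console.log(u.pathname);

// Decode the path (minus the leading slash) to inspect the code unit.
const decoded = decodeURIComponent(u.pathname.slice(1));
console.log(decoded.charCodeAt(0).toString(16));
```

The point of the sketch is that the stored value is the result of a parse, not the raw string that was assigned.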
@domenic, how would you feel about the possibility of pulling the whatwg-url implementation into core?
Oh, that'd be very cool! I guess my only concern is that we weren't concerned with speed when writing it, so there is probably lots of low-hanging fruit for performance improvements.
You'd also need to do a bit of work to decouple it from webidl-conversions and webidl2js. If you npm install it you'll find that it follows the same impl/wrapper strategy as browsers do, where there's a "wrapper" that takes care of USVString conversion and so on, delegating to the "impl" where parsing occurs, which in turn delegates to the URL state machine code. At least one of these layers could be disintermediated, although perhaps benchmarking should be done to see if that's actually the area where most improvement could be made.
Ok, I'll dig in and explore the whatwg-url internals and see what can be done reasonably. Before getting too deep into this I'd definitely like to get more +1's from collaborators tho. I'd really like to see this happen tho.
I think there are two separable problems here
Definitely agree that it's worth separating these. Not sure about modifying the existing require('url') too much tho -- ideally once the global URL is there for a while we could simply deprecate require('url') (and possibly even require('querystring')) entirely with an eye towards reducing the Node.js-specific API surface area.
Well, you can have a 👍 from me! More browser/server unity would be great. 😺
BTW, I like the idea of the global, since that's what browsers do, but I'd recommend not exposing it in require('url') because then people might get used to it being there rather than available globally, making future removal of the module more difficult.
The only concern I would have with that, @Qard, is that if there is existing code that does URL = whatever, there would be no way of recovering the original global. That should be a limited edge case, however.
This is not an argument for or against, but a request for more background. You've described what work you would like to do, and how you would do it — but I don't see much in the way of "why" we'd want to do this. What is not working about the status quo that we will rectify by adding a new global object? What value do Node users gain, concretely, from the URL object?
Maybe store it on process.URL for safe keeping?
@chrisdickinson personally I see quite a bit of benefit in minimizing the delta between node + the browser as far as utility APIs like this are concerned.
@chrisdickinson ... the why is straightforward: Currently Node.js' URL parsing has a number of issues in terms of not following the standardized behavior implemented by browsers. Examples of those failures can be seen in the test case I referenced here. There are also differences in the Node.js provided API that are largely unnecessary. This work would give us an opportunity to not only provide more robust URL parsing, but to provide a unified, non-Node.js specific API.
@Qard ... that's certainly a possibility
FWIW, looks like I mis-remembered the number of failures ;-) ... here's the exact count:
bash-3.2$ ./node test/known_issues/test-url-parse-conformance.js
Unknown globals: URL
assert.js:90
throw new assert.AssertionError({
^
AssertionError: 160 failed tests (out of 352)
at Object.<anonymous> (/Users/james/Node/main/node/test/known_issues/test-url-parse-conformance.js:57:8)
at Module._compile (module.js:541:32)
at Object.Module._extensions..js (module.js:550:10)
at Module.load (module.js:458:32)
at tryModuleLoad (module.js:417:12)
at Function.Module._load (module.js:409:3)
at Function.Module.runMain (module.js:575:10)
at startup (node.js:152:18)
at node.js:449:3
bash-3.2$
My general dislike of constructors aside, I have to second @chrisdickinson's concern here. It's not clear to me why we'd want this as a global. Sure, that's how browsers implement it, but Node is not a browser, and browser APIs have historically been designed with a fundamentally different model in mind (namely, the lack of CommonJS).
This proposal, in its current form, is starting to look like it'll drag Node in the direction of PHP - endlessly adding APIs and not really trying to enforce or encourage consistency. As it is, this will just further confuse users as to whether they should be require()ing a module or expecting a global to be there - not to mention it being unclear whether they should use the url module or the URL global. Bundling tools already take care of adding in 'fake' modules that point at browser APIs, so what exactly are we gaining here by trying to make it "more like browser APIs"?
As a separate question, how - if at all - does this cover the "insert object of URL components, receive stringified URL" usecase?
EDIT: Further question: Are there any major concerns about just fixing the existing url module in a next major (breaking) release? This would seem preferable to me, but I don't know if there are any major roadblocks that I might not be aware of, or whether this has been discussed somewhere before.
Let's separate the concerns just a bit. We could implement this but not make it a global, and just have it accessible via the existing url module (e.g. const URL = require('url').URL). We could go this route and still eventually deprecate the Node.js specific API. I would be ok with going that route. This proposes it as a global to be consistent with browsers, but that's not critical if we cannot get consensus on it.
Second, on the point:
endlessly adding APIs and not really trying to enforce or encourage consistency
My goal here is to eventually be able to fully deprecate the existing url and querystring modules with the hope of reducing the Node.js specific API surface area. Obviously that's not something that would happen quickly, tho, so the concern is definitely noted.
does this cover the "insert object of URL components, receive stringified URL" usecase?
It would insofar as the URL object as defined by the WHATWG spec includes toString() and the href property to provide serialization of the object. If what you're referring to is the ability to create a non-URL object and serialize it, that's not something that's currently supported by the WHATWG spec but it's something that can be easily maintained.
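As a rough illustration of the serialization path mentioned here, the sketch below builds a URL object, changes two components, and serializes it both ways. It assumes require('url').URL behaves as the WHATWG spec describes; the example.org address and the chosen port are made up for the example.

```javascript
// Assumption: require('url').URL is a WHATWG-compliant URL class.
const { URL } = require('url');

const u = new URL('https://example.org/docs?page=1');

// Components are changed through setters, each of which re-parses its input.
u.hash = 'intro';
u.port = '8443';

// toString() and href both return the serialized URL.
const viaToString = u.toString();
const viaHref = u.href;

console.log(viaToString);
console.log(viaHref === viaToString);
```

This covers serializing an existing URL object only. Building a URL string from a plain object of components is the separate use case that url.format() currently serves.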
We could implement this but not make it a global, and just have it accessible via the existing url module (e.g. const URL = require('url').URL). We could go this route and still eventually deprecate the Node.js specific API. I would be ok with going that route. This proposes it as a global to be consistent with browsers, but that's not critical if we cannot get consensus on it.
That seems like a more workable solution to me. There would still likely be short-term user confusion while both APIs exist, however, so this would be something that'd require very clear documentation. I realize that you might've missed the question I added to my previous post later on - could you have a look at that suggestion as well?
My goal here is to eventually be able to fully deprecate the existing url and querystring modules with the hope of reducing the Node.js specific API surface area. Obviously that's not something that would happen quickly, tho, so the concern is definitely noted.
I understand that, but I still don't really see the value in trying to bring the Node.js API in line with browser APIs, as opposed to just keeping it internally consistent (given that Node.js was designed with CommonJS in mind, and switching to globals for everything would be a step back). In the end, Node.js is its own environment, and I can't see how trying to reduce the API to the lowest common denominator would benefit usability.
If what you're referring to is the ability to create a non-URL object and serialize it, that's not something that's currently supported by the WHATWG spec but it's something that can be easily maintained.
That's probably what I mean, yeah - I'm thinking of something along the lines of the url.format method, that can just directly accept a POJO.
I believe it is valuable to implement the WhatWG URL Standard, but I don't understand why you would implement it as a global instead of simply fixing the url module.
IMO @joepie91's concern is valid; I think we should stick to the existing accepted behavior. There's a reason Node.js is so easy to write, and that's because (in the vast majority of cases I have experienced), it makes sense, and it makes sense because it's consistent.
@jasnell
This is the complete list of current properties of the global object that were not defined in the ECMA specs:
global,
process,
GLOBAL, root (hard-deprecated aliases for global),
Buffer,
clearImmediate, clearInterval, clearTimeout, setImmediate, setInterval, setTimeout,
console.
Almost all of those are there for a very good reason, and adding more should be considered very thoughtfully. I don't think that URL justifies that. We could add fetch next. And XMLHttpRequest. And so on; the list would be of indefinite length, once we start.
Upd: for comparison, Chrome has more than 700 of those, not counting the ones defined by ECMA specs.
That said, I'm +1 on the idea of bringing URL to Node.js and perhaps eventually replacing the Node.js specific API with it. And I think it's a great idea.
Not sure about exposing URL on require('url'), though, if the long-term plan is to deprecate url. Perhaps a different module name?
Ok, I think the building trend is to avoid the introduction of a new global, and that is fine. I'll pull that from the proposal. It can be revisited later on if necessary.
instead of simply fixing the url module
My primary concern here is backwards compatibility. I don't simply want to modify the existing url module because doing so would likely break existing code. Also, fixing the existing code to be compliant with the WHATWG spec would be roughly the same amount of work as simply doing a clean new implementation. By implementing the URL object as a parallel API, we can provide a clear transition path off the Node.js specific API and onto the standard API, then fully deprecate and eventually remove the existing API.
Done properly, the impact to existing code should be as minimal as possible, and if we are not introducing this as a global, then we would need to, at the very least, keep the require('url') module but eventually deprecate the Node.js specific APIs exposed by it. e.g.:
// deprecated APIs
const url = require('url');
url.parse();
url.format();
url.resolve();
// new API
const URL = require('url').URL;
new URL(url, base)
URL.format(urlLikeObject)
@jasnell Alright, I'm more than comfortable with that.
In fact, thinking about it further, we may be able to simplify the transition even more by simply having the url module directly export the new URL object and hanging the deprecated parse() and resolve() methods as statics off of it. Existing code should be unaffected.
const URL = require('url');
new URL(url, base);
URL.format(urlLikeObject);
// deprecated API
URL.parse(); // the existing parse method
URL.resolve(); // the existing resolve method
(Anyone currently doing const url = require('url') would see no real difference)
I'd caution against extending the standard URL with nonstandard methods, if the goal is to be able to help write cross-environment code. That includes .format.
@domenic ... noted ... then perhaps simply
const url = require('url');
// deprecated APIs
url.parse();
url.resolve();
// retained existing API
url.format(urlLikeObject);
// new API
const URL = require('url').URL;
new URL(url, base);
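To make the transition path a little more concrete, here is a hedged sketch that resolves the same relative reference with the legacy url.resolve() and with the two-argument URL constructor. It assumes both APIs are available side by side on require('url'), as proposed above; the base and relative values are invented for the example.

```javascript
// Assumption: the url module exports both the legacy functions and URL.
const url = require('url');
const { URL } = url;

const base = 'http://example.com/a/b/';
const relative = '../c?x=1';

// Legacy API: note the argument order is (from, to).
const legacy = url.resolve(base, relative);

// Standard API: the argument order is (input, base).
const standard = new URL(relative, base).href;

console.log(legacy);
console.log(standard);
```

For a simple case like this the two agree; the differences discussed in this thread show up in edge cases, so migrating code would still want tests around its own inputs.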
Updated based on the feedback
@jasnell:
... the why is straightforward: Currently Node.js' URL parsing has a number of issues in terms of not following the standardized behavior implemented by browsers. Examples of those failures can be seen in the test case I referenced here. There are also differences in the Node.js provided API that are largely unnecessary. This work would give us an opportunity to not only provide more robust URL parsing, but to provide a unified, non-Node.js specific API.
Have the differences between WHATWG's URL spec and Node's caused errors for users? Issue links would help give the proposal concrete grounding.
My goal here is to eventually be able to fully deprecate the existing url and querystring modules with the hope of reducing the Node.js specific API surface area. Obviously that's not something that would happen quickly, tho, so the concern is definitely noted.
This is notably something that we've tried before and have not been successful at; both in the larger sense of deprecation and API surface reduction, and in the specific sense that we've tried to replace the URL subsystem with a more spec-compliant version. What is different about this approach that we should expect a different outcome than before?
@chrisdickinson ... quick survey of URL related issues/prs ...
The difference with this approach is that the existing url parse would not be replaced outright. To support existing users, it would be maintained and soft-deprecated at first, then hard deprecated in the next major beyond that, giving a clear transition path. Also, given that this moves towards using an API that many js developers are already familiar with, there is increased incentive to change.
Thanks for the links.
Assuming it makes sense for us to address the problems folks are having with the url module, who maintains the new URL parser? Is this something that we will vendor from chromium? If so, how quickly does it change, & how easy is it to pull new versions? In the worst case, are we stuck maintaining two URL parsers? In the best case, do we only have the deprecated url parser, or no url parser?
The difference with this approach is that the existing url parse would not be replaced outright. To support existing users, it would be maintained and soft-deprecated at first, then hard deprecated in the next major beyond that, giving a clear transition path.
If Node v9 came out with a hard deprecation (printed warning) on all url module usage, do you imagine we'd see less url use, or less adoption of Node v9?
I agree with @TheAlphaNerd that there's definitely value in reducing the differences between browser APIs and Node APIs, but I want us to be clear about what we're trying to solve, why we're trying to solve it, and how much we think it'll cost us.
Spec adherence for spec adherence's sake has some value, but not necessarily enough value to justify moving the ecosystem's cheese. Spec adherence as a side-effect of solving concrete problems our users are facing has much more value. Does it have enough value for us to justify potentially maintaining two URL parsers into the future?
For my part, I lean towards "yes", but I'd prefer that information to be in the proposal so others can make that evaluation & refine our collective predictions about maintenance cost.
@chrisdickinson ... the implementation will most likely be a mix of green-field and stuff adopted/borrowed from the whatwg-url module. Now that I've had a chance to dig into the details of that module's code, there is quite a bit that would need to be optimized to get the necessary performance. What I would likely do is create a fork of that module and begin working on various optimizations. From there it can be adapted into core. Done properly, the changes made to optimize performance could go back into the whatwg-url module also, allowing us to easily pull any modifications that are made there back in.
-1 for global, and I don't like the idea of a constructor. For quite a while I've wanted to simply change node's existing parser to match the WhatWG spec, and release it on a semver-major. Not because I give a crap about complying with the browser, but because this feature has enough edge cases where it could be confused to be the same based on only a sample of URLs. Path of least surprise and all. If that's a no go then I can live with require('url').URL.
The path of least surprise would be to use the constructor since that's exactly how things work on the browser side and is exactly what the WHATWG spec defines.
Given enough time we could replace the require('url').parse() method to follow the more compliant parsing, but part of the whole point of doing things the way I've proposed is to avoid the semver-major breaking change in the near term in favor of a more incremental approach (that is, add URL in v6; soft deprecate parse() in v7; with hard deprecation in v8).
Yeah. I retract my point about not wanting a constructor. Since parse() returns an object, fundamentally things wouldn't change. I'm cool with that. Still no global though. And I do think that instead of deprecating .parse() it would be better to just move to the same parser as URL().
@jasnell I am not sure if hard deprecation in v8 would be achievable.
Possibly (likely) not. As I said, it's a goal :). Realistically it would likely take longer unless we decided to simply flip the switch as @trevnorris suggests.
questions.
[...] require('encodings')? or internal only?
The querystring module would remain unchanged. URLSearchParams would be exposed only via the URL class. Its implementation could easily be backed by the querystring module, in fact (I've already done this actually).
No part of the encodings spec would be exported or visibly implemented. Essentially, in terms of new API surface, this would only export the URL and URLSearchParams objects.
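A small sketch of what reaching URLSearchParams through the URL class looks like in practice. It assumes require('url').URL exposes a spec-style searchParams property; the query string is an invented example.

```javascript
// Assumption: require('url').URL exposes searchParams as the spec describes.
const { URL } = require('url');

const u = new URL('http://example.com/search?q=node&lang=en');

// Read a parameter through the URLSearchParams object attached to the URL.
const q = u.searchParams.get('q');

// Mutating searchParams updates the URL's own query string.
u.searchParams.append('page', '2');

console.log(q);
console.log(u.search);
```

Because the params object is live, there is no separate stringify step as there is with querystring.stringify().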
I'm okay with non-global. It's likely lots of people would use a polyfill module anyway, which does the native-or-polyfill return type thing, for wider support.
I am a bit concerned about schemes. According to the spec (https://url.spec.whatwg.org/#special-scheme), WHATWG URL treats the following schemes as special:
ftp
http
https
file
gopher
ws
wss
But Node.js needs to handle other schemes like git...
I tried but I got the following results.
in chrome
> new URL('git://github.com/foo/bar');
host:""
hostname:""
href:"git://github.com/bar/buz"
origin:"git://"
pathname:"//github.com/bar/buz"
port:""
protocol:"git:"
in node
> url.parse('git://github.com/foo/bar');
protocol: 'git:',
host: 'github.com',
port: null,
hostname: 'github.com',
hash: null,
search: null,
query: null,
pathname: '/bar/buz',
path: '/bar/buz',
href: 'git://github.com/bar/buz'
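The comparison above can be reproduced with a sketch like the following, which prints the same fields from both parsers side by side. It assumes the url module exports both url.parse() and a WHATWG-style URL class. No expected output is shown on purpose: how a non-special scheme such as git: is parsed depends on the runtime and its version, and the Chrome result quoted above reflects the browser as tested at the time.

```javascript
// Assumption: the url module exports both the legacy parser and URL.
const url = require('url');
const { URL } = url;

const input = 'git://github.com/foo/bar';

const legacy = url.parse(input); // Node.js-specific parser
const standard = new URL(input); // WHATWG-style parser

// Print each field from both results for a side-by-side comparison.
for (const key of ['protocol', 'host', 'hostname', 'pathname', 'href']) {
  console.log(key, JSON.stringify(legacy[key]), JSON.stringify(standard[key]));
}
```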
FYI, I worked on the same topic years ago (in TypeScript, to follow the WebIDL static interfaces). The goal was to implement a perfect WHATWG fetch, including a perfect WHATWG URL.
If you implement URL, you will need these.
The most difficult part is domain to ASCII and domain to Unicode (see http://www.unicode.org/reports/tr46/#ToASCII). I gave up at this point :(
Yeah, it will be necessary to expand the list of "special" schemes supported by the parser. This shouldn't be that difficult to do.
The WHATWG URL Standard specifies updated syntax, parsing and serialization of URLs as currently implemented by the main Web Browsers. The existing Node.js url module parsing and serialization implementation currently does not support the URL standard and fails to pass about 160 of the standard tests.
This proposal is to implement the WHATWG URL Standard by introducing a new URL class off the url module (e.g. require('url').URL).
The existing url module would remain unchanged and there should be no backwards compatibility concerns.
Example