Using faster url parser

petkaantonov commented 9 years ago

(Original node issue https://github.com/joyent/node/issues/6788)

I have rewritten the url parser module of node core as it was/is a serious bottleneck in some of the techempower benchmarks

Running node's urlparser benchmark using iojs it's still 16x faster when not retrieving properties and 11x faster when retrieving all properties (which are lazy getters in my implementation). In absolute terms the current iojs urlparser throughputs 25k parses per second vs 400k lazy/270k eager parses per second. format and resolve are also affected in similar magnitudes.

rvagg commented 9 years ago

a pull request or some other request for specific action is likely to get more attention here fwiw

petkaantonov commented 9 years ago

Well I request your opinion before I make it PR ready (it's a normal user module right now)

vkurchatkin commented 9 years ago

I like the idea of making url faster, but replacing the whole module frightens me a little

xaka commented 9 years ago

I think there is nothing wrong with using faster parser as it's just an implementation details as long as API stays the same and all existing tests pass

On Wed, Jan 28, 2015 at 1:39 PM, Vladimir Kurchatkin < notifications@github.com> wrote:

I like the idea of making url faster, but replacing the whole module frightens me a little

— Reply to this email directly or view it on GitHub https://github.com/iojs/io.js/issues/643#issuecomment-71921851.

runeli commented 9 years ago

I would love to see this on io

vkurchatkin commented 9 years ago

@xaka It's about subtle things, like data properties vs. accessor properties. It's hard to foresee everything once whole module changes. This requires major version bump at least.

xaka commented 9 years ago

That's a good point. Switching from properties to accessors means breaking API as things like console.log(...) would work differently and people might be using the stdout for instance for their own reasons, etc.

On Wed, Jan 28, 2015 at 2:36 PM, Vladimir Kurchatkin < notifications@github.com> wrote:

@xaka https://github.com/xaka It's about subtle things, like data properties vs. accessor properties. It's hard to foresee everything once whole module changes. This requires major version bump at least.

— Reply to this email directly or view it on GitHub https://github.com/iojs/io.js/issues/643#issuecomment-71931629.

jonathanong commented 9 years ago

this is one of the reasons why i'm trying to get rid of the util._extend() usage. otherwise, i fear too much would break.

mikeal commented 9 years ago

Can't we lazily patch the stringify and json transforms to avoid API change/break?

petkaantonov commented 9 years ago

The getters can be made enumerable (but they're still on the prototype) and the Url class can implement toJSON method. So that leaves util._extend(), which should be internal method?

It can also be considered to make the properties just eager data properties, it would still be 11x faster. But often many of the properties are not needed (especially .href?) so it could be a bummer.

rvagg commented 9 years ago

fixed at #1561

petkaantonov commented 9 years ago

Reopening since it was reverted :smile:

silverwind commented 9 years ago

Let's use this issue for further discussion on the new url implementation.

Previous discussions at #933, #1561, #1591 and #1602.

Fishrock123 commented 9 years ago

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

silverwind commented 9 years ago

My question would be how much of a perf difference would switching to plain properties over accessors have. If we can keep the API and get most of the perf, that'd be a win-win.

petkaantonov commented 9 years ago

perf is not related to accessors, but data property api is terrible and not like browser api at all :)

silverwind commented 9 years ago

I'd question how browserlike we want to go. Do we really want something like this?

> document.location.hash = null
> document.location.hash
"#null"

YurySolovyov commented 9 years ago

I assume the intention to follow browsers comes form wish of writing "isomorphic" apps and following standards in general (even though it gives you nonsense like #null) ?

Fishrock123 commented 9 years ago

I'd prefer to not have to run everything though toString(), which is probably what the browser does.

(I guess that's maybe not a bad thing?)

othiym23 commented 9 years ago

(copypasta'd from #1591)

This will probably find its way back to the TC at some point, but as I explain in npm/npm#8163, @isaacs and I don't think the right thing to do is to change npm to match the new url interface. I'm happy to discuss our reasoning, but ultimately @isaacs made the call, and I agreed, that this set of changes is sufficiently problematic to require a lot more discussion and careful planning to land.

petkaantonov commented 9 years ago

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

I will simply just improve the performance, if it wasn't for lazy .href it could even be done in a patch. But only making .href accessor should have no or minimal compatibility issues

petkaantonov commented 9 years ago

I'd question how browserlike we want to go. Do we really want something like this?
document.location.hash = null
document.location.hash
"#null"

The point is not this but accessors would have allowed all components of the url be automatically up to date when one is changed.

domenic commented 9 years ago

I agree it would be much better to match browsers (as @isaacs has always suggested; surprised to see the about-face) and to maintain a consistent state between the different properties.

If we can't manage to do that with require("url") then maybe we'll instead want to work on a standards-compliant URL global that does match browsers (including consistency guarantees). That would allow the more drastic changes necessary for compliance, e.g. the introduction of searchParams and so on. This might fall out naturally out of the work I'm doing in V8/Blink at my day job anyway (e.g. it might become just turning on a build flag in V8).

rlidwka commented 9 years ago

My suggestion would be:

In io.js 2.x use the same parser as before, but emit a warning on deletion. A few ideas:

set configurable: false so if people try to delete it, this operation will fail (strict mode only)
explicitly check if any properties are deleted in url.format (performance hit?)

In io.js 3.x drop support for that.

If npm won't change url interface until then, we should consider applying floating patch.

domenic commented 9 years ago

In case anyone didn't notice, 2.0.0 was released so anything backward-incompatible (or probably even annoying, like a warning) would have to be in 3.0.0.

rlidwka commented 9 years ago

Well there is nothing wrong with introducing a warning in a minor release.

isaacs commented 9 years ago

The intention is not to follow browsers for the good of writing isomorphic apps.

The intention is to follow url resolution logic like browsers, so that crawlers and other Node.js programs behave reasonably. The url object parsing is based on the browser location global and <a> html tag fields, but it also has to support being passed to http.request and friends. So, it's not completely identical, and needn't be.

A 100% isomorphic browser-style accessor-using auto-updating url object is a great idea. It belongs in npm, not in core.

spion commented 9 years ago

Will this make it into 4.0.0?

silverwind commented 9 years ago

That depends on @petkaantonov right now. See https://github.com/nodejs/io.js/pull/1650.

thefourtheye commented 9 years ago

The #1650 is continued at #2303 now.

jasnell commented 8 years ago

Refs: https://github.com/nodejs/node/pull/7448 ... a WHATWG URL Parser implementation. Bit slower than url.parse() currently but more correct to the standard.

jasnell commented 8 years ago

Closing given the lack of continued activity and discussion on this. Can reopen if necessary.

pkoretic commented 7 years ago

@jasnell @petkaantonov testing this using node 7.5.0 on 16.04 (dual core kvm Intel Xeon E3-12xx v2) still shows huge difference

we found about this after cpu profiling our application which showed url.parse takes most of the time and we can't avoid making requests

we have switched to fast-url-parser for now any info on this maybe?

node benchmark/nodecore.js

misc/url.js parse(): 40645.099
misc/url.js format(): 36836.169
misc/url.js resolve("../foo/bar?baz=boom"): 38703.878
misc/url.js resolve("foo/bar"): 39136.199
misc/url.js resolve("http://nodejs.org"): 37845.184
misc/url.js resolve("./foo/bar?baz"): 38815.977

node benchmark/urlparser.js

misc/url.js parse(): 184957.14
misc/url.js format(): 170904.26
misc/url.js resolve("../foo/bar?baz=boom"): 181324.04
misc/url.js resolve("foo/bar"): 169483.69
misc/url.js resolve("http://nodejs.org"): 180839.33
misc/url.js resolve("./foo/bar?baz"): 174857.71

nitrocode commented 6 years ago

I just hit an issue in production where extremely long referer urls (3000 chars, 5500 chars, and 9000 chars) were causing our node instances to spike CPU usage and we tracked it down to the url library. @pkoretic your fast-url-parser is much more performant.

Anyone have an update on when this will be merged? I'll switch to fast-url-parser for now.

Bessonov commented 4 years ago

Latest node.

$ node -v
v12.12.0
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/nodecore.js 
misc/url.js parse(): 68502.876
misc/url.js format(): 60017.421
misc/url.js resolve("../foo/bar?baz=boom"): 59991.764
misc/url.js resolve("foo/bar"): 64876.780
misc/url.js resolve("http://nodejs.org"): 63232.093
misc/url.js resolve("./foo/bar?baz"): 64872.514
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/urlparser.js 
misc/url.js parse(): 170138.65
misc/url.js format(): 191450.64
misc/url.js resolve("../foo/bar?baz=boom"): 208052.98
misc/url.js resolve("foo/bar"): 271166.70
misc/url.js resolve("http://nodejs.org"): 280692.39
misc/url.js resolve("./foo/bar?baz"): 255188.00

Bessonov commented 2 years ago

Still 2.5x:

$ node -v
v16.14.0

$ node benchmark/nodecore.js
misc/url.js parse(): 78603.333
misc/url.js format(): 72242.340
misc/url.js resolve("../foo/bar?baz=boom"): 71972.167
misc/url.js resolve("foo/bar"): 78404.141
misc/url.js resolve("http://nodejs.org"): 78088.377
misc/url.js resolve("./foo/bar?baz"): 78914.819

$ node benchmark/urlparser.js
misc/url.js parse(): 169323.52
misc/url.js format(): 199909.74
misc/url.js resolve("../foo/bar?baz=boom"): 198722.90
misc/url.js resolve("foo/bar"): 226340.86
misc/url.js resolve("http://nodejs.org"): 159880.62
misc/url.js resolve("./foo/bar?baz"): 284006.64

nodejs / node

Using faster url parser #643