lukeed / trouter

:fish: A fast, small-but-mighty, familiar fish...errr, router*

Route priority #15

Closed ElianCordoba closed 4 years ago

ElianCordoba commented 4 years ago

First of all, really cool project, congrats!

I took a look at the internals and I see that you loop over the array of routes when trying to find one, and push to it every time a new route is added. I thought we could pass some kind of priority when adding a route, so that instead of just pushing it to the end of the list, a ranking step inserts the route at the right index. Maybe the priority could be a number between 1 and 100, with 50 as the default if you don't pass any. The impact could be meaningful if many routes were added before the one you are actually trying to hit. For example, I ran this quick benchmark on my PC, just doing as many lookups of importantRoute as possible:

router.get("/importantRoute", () => true);
// 14,651,602 ops/sec

And then:

router.get("/a", () => true);
router.get("/b", () => true);
router.get("/c", () => true);
router.get("/d", () => true);
router.get("/e", () => true);
router.get("/f", () => true);
router.get("/g", () => true);
router.get("/h", () => true);
router.get("/importantRoute", () => true);
// 8,846,612 ops/sec
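
For context, a lookup micro-benchmark along these lines might be set up like so (a rough sketch using the benchmark npm package and Trouter's find(method, url) lookup; the exact harness may have differed):

const Benchmark = require('benchmark');
const Trouter = require('trouter'); // assumes the default export; adjust if your version exports { Trouter }

const router = new Trouter();
for (const path of ['/a', '/b', '/c', '/d', '/e', '/f', '/g', '/h']) {
  router.get(path, () => true);
}
router.get('/importantRoute', () => true);

// Measure how many route lookups per second the router can do
new Benchmark.Suite()
  .add('find /importantRoute', () => router.find('GET', '/importantRoute'))
  .on('cycle', event => console.log(String(event.target)))
  .run();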

Also, the price of this extra work would be paid only when routes are added, which usually happens while the server is starting up, so it does not really matter if that takes a few milliseconds longer :)

If you think it's a good idea I can take a shot at implementing it.

lukeed commented 4 years ago

Hey, thanks~!

I think I might need a little more explanation for what you mean, sorry 🙈

My initial concern reading this is that the order of the routes needs to be respected. That's why a simple Array is used – the indices reflect the .add() order. This preserves Express-like sequencing and composition, which Trouter (and inherently, Polka) aim to maintain.

There was another discussion that touched on this over in #12. If there were to be a ranking algorithm, it'd probably have to be based on common path prefixes – and still account for use() blocks that are interspersed between more specific routes.
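
To make that order-sensitivity concrete, here's a rough sketch (the handlers are hypothetical, and it assumes find() collects matching handlers in .add() order):

const Trouter = require('trouter');
const router = new Trouter();

// hypothetical handlers
const authenticate = (req, res, next) => { /* verify the session */ };
const listUsers = (req, res) => { /* respond with the user list */ };

router.use('/api', authenticate);     // should run for every /api/* request
router.get('/api/users', listUsers);  // more specific route, added afterwards

// Handlers come back in the order the routes were added,
// so authenticate is guaranteed to sit ahead of listUsers.
const { handlers } = router.find('GET', '/api/users');

// Re-sorting routes by a "priority" score could push listUsers
// ahead of authenticate and silently skip the auth check.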

Anyway, after #12 I spent a couple weekends rebuilding Trouter with various Trie/Radix implementations. What I came up with was (always) significantly more code and, more importantly, slower in the majority of cases. It only started to shine when you're dealing with 150+ unique route patterns, which I would argue is very uncommon for a Node.js application.

In any event, the difference between 8M and 14M is actually meaningless when it comes to a Node.js HTTP router. I would still like to reach for better solutions if they're out there, but objectively, it makes little/no difference.

Note: I'm using 8M and 14M just because they're what you listed here

That may sound controversial, but:

1) The Node.js (10.x, 12.x, 14.x) HTTP modules max out at 45-50k req/s with empty response handlers (there's a sketch of what I mean by an empty handler after this list). Since these HTTP routers (trouter included) are typically the first actor within a response handler, you just need the router not to lag behind. So when comparing a router that can handle 8M ops/sec vs an HTTP engine that can handle 50k req/s, your engine is the ceiling, not the router.

2) The difference between 8M and 14M ops/sec is not noticeable, especially in this context. The numbers are bigger and shinier, sure, but 8M hits in a second already means each lookup takes roughly 125 nanoseconds. Nearly halving that is great, but it just produces another sub-microsecond value. Passing your response through JSON.stringify is going to have a larger time cost 😅

3) Related to (2), but any kind of work within your response handler(s) affects your overall performance far more than any attempt at regaining the ~50 nanoseconds of per-lookup latency that separate 8M from 14M ops/sec.
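
For reference, the 45-50k req/s ceiling in (1) comes from pointing a load tool (autocannon, wrk, etc.) at a bare server with an empty response handler, something like this minimal sketch:

const http = require('http');

// Empty response handler: no routing, no body, no real work.
// Whatever req/s this tops out at is the ceiling any router sits under.
http.createServer((req, res) => {
  res.end();
}).listen(3000);

// e.g. npx autocannon http://localhost:3000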

Again, there might be an easy tweak we can do that "unlocks" bigger numbers (which I'd be up for if it were sensible, small, and easy to justify) but after chasing that rabbit for a little too long, the reality of 1-3 persuaded me to just let it go.

Looking forward to better understanding your suggestion, but that's the run-down of my history with this topic :)

ElianCordoba commented 4 years ago

Well, first of all, thanks for the super detailed answer! Let me clarify what I initially meant, basically this:

// Pseudocode of the function that adds a new route to the router,
// keeping this.routes sorted from highest to lowest priority
function add(newRoute) {
  // Index of the first existing route with a lower priority
  const index = this.routes.findIndex(route => newRoute.priority > route.priority);

  if (index === -1) {
    // No lower-priority route found: append at the end
    this.routes.push(newRoute);
  } else {
    // Insert just before the first lower-priority route
    this.routes.splice(index, 0, newRoute);
  }
}

So, given the following:

router.get("/uncommonRoute",   () => true, 25) // <- Priority
router.get("/veryCommonRoute", () => true, 75)
router.get("/uncommonRoute2",  () => true, 10)
router.get("/mostCommonRoute", () => true, 100)

The internal routes array would look like this:

[ mostCommonRoute, veryCommonRoute, uncommonRoute, uncommonRoute2 ]

With the way the router currently works, matching mostCommonRoute would take 4 checks; if we sort them by priority instead, it would be found on the first check. Best case scenario 😜

Now, onto the 3 points that you made: I can't say anything besides that's disappointing 😕 Are we at the endgame of Node.js routing then? Wouldn't the jump from 8M to 14M be more noticeable on less powerful servers? What about clustering/worker threads?

Thanks beforehand!

lukeed commented 4 years ago

Oh, I see. You should just add router.get("/mostCommonRoute") first if it's the most important. More generally, you define the routes in the order that reflects the desired inheritance and lookup. That's why user-specified order is respected.

If, for whatever reason, you want to modify this order after adding, the routes key is available to you and you can splice() to your heart's content!
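
For example, something along these lines would promote a route after the fact (just a sketch: the exact shape of each entry in routes depends on the trouter version, and here the stored pattern is assumed to be a RegExp built from the path):

// Find the hot route in the public routes array
const idx = router.routes.findIndex(r => r.pattern.test('/mostCommonRoute'));

if (idx > 0) {
  const [hot] = router.routes.splice(idx, 1); // pull it out...
  router.routes.unshift(hot);                 // ...and re-insert it at the front
}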


Those Node.js numbers are from a typical Node.js server running on a single CPU core. The version of Node will affect your maximum throughput more than the age of the CPU, since basically any CPU from the last 5-8 years is plenty fast.

If you run a Node.js cluster, you can get better throughput, but it doesn't scale linearly. For example, my machine does 50k req/s on a single core, and ~135k req/s with 4 cores allotted. Node.js is a runtime, and that runtime overhead is where the loss comes from. The same is true of any runtime language, and it's why you'll see projects like uWebSockets.js and japronto (Python) that ship C++ bindings for an alternate HTTP engine.
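
Those multi-core numbers come from a setup roughly like this (a minimal sketch using Node's built-in cluster module; the worker count is whatever you want to allot):

const cluster = require('cluster');
const http = require('http');

const WORKERS = 4; // cores allotted to the test

if (cluster.isMaster) {
  // The master process only forks; the workers share the listening port.
  for (let i = 0; i < WORKERS; i++) cluster.fork();
} else {
  http.createServer((req, res) => {
    res.end(); // still an empty handler, so this measures the runtime itself
  }).listen(3000);
}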

Wouldn't the jump from 8M to 14M be more noticeable on less powerful servers?

I'm sure you know this, but a less powerful server wouldn't see 8M or 14M anyway. Everything scales down, which means the HTTP engine scales down too. So you're probably looking at – let's say – 20k req/s? As you've seen in the Trouter code, find is just a single O(n) scan over the routes array. You'd need a considerable number of unique routes for Trouter.find to start becoming the bottleneck.

Even then, this is all for benchmark purposes. In other words, these numbers are only good for echo/ping-pong servers. The moment your request handler connects to a database, your "50k" drops considerably.

ElianCordoba commented 4 years ago

Yep, adding them in order is definitely a solution, but it forces the developer to follow that rule, and for readability you might want to group the endpoints instead. For example, adding all the user-related endpoints (where only one is frequently used) and after that all the customer-related endpoints (where, again, only one is commonly used). That would be easy to read but not the optimal internal structure. But I do understand your points; I guess I will have to get closer to the metal if I want to squeeze out more performance 😊

Thank you very much for your time!

lukeed commented 4 years ago

This is where sub-routers/sub-applications come into play. They let you have a sort order across groups and then within each group, and there's not much of a performance hit anyway; it often ends up being faster if you have many routes.

You can think of them as creating a multi-dimensional routes array, except that Trouter will only go down a level if the group prefix matches. Effectively, this lets you skip whole chunks of routes.
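
A rough sketch of that grouping, using the Polka-style sub-application mounting that sits on top of Trouter (the handler names here are made up):

const polka = require('polka');

// hypothetical handlers
const getPopularUser = (req, res) => res.end('popular user');
const getUser = (req, res) => res.end(`user ${req.params.id}`);
const getTopCustomer = (req, res) => res.end('top customer');
const getCustomer = (req, res) => res.end(`customer ${req.params.id}`);

// Each group keeps its own routes together and readable...
const users = polka()
  .get('/popular', getPopularUser) // the hot route can still go first within its group
  .get('/:id', getUser);

const customers = polka()
  .get('/top', getTopCustomer)
  .get('/:id', getCustomer);

// ...and the parent app only descends into a group when the prefix matches.
polka()
  .use('/users', users)
  .use('/customers', customers)
  .listen(3000);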

Thanks for the discussion!