lukeed / trouter

:fish: A fast, small-but-mighty, familiar fish...errr, router*
MIT License
634 stars 23 forks source link

It seems not efficient #12

Closed sheerun closed 4 years ago

sheerun commented 4 years ago

I just found this router and it seems quite nice, thank you :)

Just letting you know that while this router is nice and small it isn't very efficient. An efficient implementation would use something like Radix Tree to reduce find time to O(log n) instead of O(n)

sheerun commented 4 years ago

Here's one implementation you could use: https://github.com/charlieduong94/radix-router

lukeed commented 4 years ago

Hey, thanks!

I had a trie implementation while working on the 3.0 While technically true, it's actually not faster when you try to account for the implementation order, which was the point of 3.0 (and upcoming Polka 1.0). Put differently, the new major is meant to behave like Express applications, wherein the order in which you define your routes matters.

The trie/radix is ideal for finding a particular route – yes, but it can't efficiently "collect" handlers along the way. You'd have to assemble a list of matched handlers, and then sort them based on an index value (defined position) once all matches are known. At best, this becomes O(log n) + O(n)

It'd be quicker to find/collect the matched handlers (assuming many possibilities), but then the rectifying/ordering stage becomes super expensive (also assuming many matches).

There are two inflection points here, and I concluded that optimizing for massive applications (which would only save < 20ms anyway) wasn't worth the added complexity. Average/small-sized applications wouldn't (and didn't) see any benefit.

PS, large applications should be broken up into sub-applications & attached via use() anyway, which negates the need to traverse a large Array.

Closing, but that doesn't mean the discussion needs to end 😄

Thanks!

sheerun commented 4 years ago

Actually for ordered routes worst case would be O(n*log n) + O(n) if all routes match given url. On average it would still be O(log n) + O(1).

Maybe consider at least prefix-only filtering on radix trees. This way you could quickly filter all potential routes based on prefix (apps) and only then iterate over them with usual method.

sheerun commented 4 years ago

(in the comment above by "prefix" I mean anything in route pattern before first capture or wildcard.

Any manually added regexps could be checked every time

lukeed commented 4 years ago

Sorting is most definitely not O(1). It's always at least a O(n logn), but with a new n count.

Here's an example application that is (as it turns out) very common and inefficient with the radix structure:

router()
  .use(foo, bar)
  .get('/users', get1, get2)
  .use('/users', use1)
  .post('/users', post1)
  .use('/users/:id', use2, use3)
  .get('/users/:title', get3)
  .get('/', root)
  .use(baz, bat)
// Expected:

"GET /"
//=> [foo, bar, root, baz, bat]

"GET /users"
//=> [foo, bar, get1, get2, use1, baz, bat]

"GET /users/foo"
//=> [foo, bar, use1, use2, use3, get3, baz, bat]

"POST /users"
//=> [foo, bar, use1, post1, baz, bat]

This also doesn't account for mix-n-match of RegExp route patterns. You can't (easily) keep them off to one side, manually test them, and reinsert their handlers where they should have been.

All in all, it's a lot more complexity for very little gain on 90% of applications

sheerun commented 4 years ago

Sorting on O(1) size set is O(1) (O(1) because that many routes match after prefix filtering)

For example you add 3x more routes e.g. with /posts and /comments then average sorting time at the end decreases to 1/3 etc. Also you don't need sorting at runtime if Radix Tree leafs contain all possible matching routes with regexps inlined in proper order (radix tree is just indexing structure, regexps can be duplicated in leafs). The algorithm is pretty simple:

  1. Filter routes based on prefix using Radix Tree or simple prefix-buckets
  2. Use old algorithm to iterate over already sorted list of routes

Express routes is one of the slowest in benchmarks, and radix-tree routers are fastest (koa-tree-router, trek-router, find-my-way. I guess it's not priority for you but please reconsider in the future.

sheerun commented 4 years ago

https://github.com/delvedor/router-benchmark

sheerun commented 4 years ago

Somebody already measured trouter and indeed it seems it's on the slower side of benchmark: https://github.com/delvedor/router-benchmark/pull/14/files

lukeed commented 4 years ago

Yes, comparing apples to apples, trouter is a slower implementation. Of course. But it's the fastest implementation that behaves like Express routing.

I've built trouter as a radix tree while maintaining Express routing. It was 2-3x code size and only faster on apps with more than 50 routes.

Radix trees are perfect for (a) locating a destination asap; and (b) collecting breadcrumbs en route to that destination.

If (b) was synonymous with Express-style routing, I would of course be using a trie. But it isn't.

It's not O(1) because we're not picking/targeting just 1 handler. Again, we have to re-sort the matched (collected) handlerS to maintain the order in which they were defined.

These are NOT the same:

router()
  .use("/users", foo)
  .get("/users", bar);

router()
  .get("/users", bar)
  .use("/users", foo);

For a long time I thought they were - but they aren't. Had they been then the radix route would be perfect.

sheerun commented 4 years ago

We don't need to resort them because radix leafs can contain already sorted entries at index time

lukeed commented 4 years ago

Yes, but they're sorted on/within the same node only. It can't handle either of these:

router()
  .get("/users/:id", ...)
  .use("/users", ...);

router()
  .use("/users", ...)
  .use(...)
  .get("/users", ...)
sheerun commented 4 years ago

Here's an example of routes and already sorted 2-level radix index that shows it can handle these:

router()
  .use("/posts", postsMiddleware)
  .use("/users", usersMiddleware)
  .use(globalMiddleware)
  .get("/users", usersHandler)
  .get("/users/:id", userHandler)
  .get("/posts", postsHandler)

{
  "GET": {
    "/users": [
      ["use", "", usersMiddleware]
      ["use", "", globalMiddleware]
      ["end", "", usersHandler]
      ["end", "/:id", userHandler]
    ],
    "/posts": [
      ["use", "", postsMiddleware]
      ["use", "", globalMiddleware]
      ["end", "", postsHandler]
    ]
  }
}

The leafs are arrays in (example) format [type, prefix, handler] where type can be either use (middleware) or end (route). As you can see routing algorithm can first select leaf array and then just iterate on it. No sorting is necessary.

lukeed commented 4 years ago

Ok, so I had this en route to my radix solution. The problem, as is, is that

1) You dramatically increase the memory consumption because every handler is duplicated many times. This is easily solved by just saving a single handlers array and then using index references instead of the function directly.

2) As is, this isn't really a radix tree – it's really a partially truncated tree of routes. It certainly offloads some of the matching work, but in order to accommodate deeply-nested paths, you need an actual tree, with children.

```js
router()
  .use("/posts", postsMiddleware)
  .use("/users", usersMiddleware)
  .use(globalMiddleware)
  .get("/users", usersHandler)
  .get("/users/:id", userHandler)
  .get("/posts", postsHandler)
  // added
  .post("/users/:id/comments", addComment)
  .use("/users/:id/*", isOwner)
  .get("/users/:username/profile", showProfile)
  .put("/users/:username/profile", editProfile)
  .get("/users/:id/comments", listComments)
  .get("/users/:uid/comments/:cid", getComment)
  .put("/users/:uid/comments/:cid", editComment)

// current arch
// ~> with more routes & no nesting
{
  "GET": {
    "/users": [
      ["use", "", usersMiddleware],
      ["use", "", globalMiddleware],
      ["end", "", usersHandler],
      ["end", "/:id", userHandler],
      // added
      ["use", "/:id/*", isOwner],
      ["end", "/:username/profile", showProfile],
      ["end", "/:id/comments", listComments],
      ["end", "/:uid/comments/:cid", getComment],
    ],
    "/posts": [
      ["use", "", postsMiddleware],
      ["use", "", globalMiddleware],
      ["end", "", postsHandler]
    ]
  },
  // added
  "POST": {
    "/users": [
      ["use", "", usersMiddleware],
      ["use", "", globalMiddleware],
      ["end", "/:id/comments", addComment],
      ["use", "/users/:id/*", isOwner],
  },
  "PUT": {
    "/users": [
      ["use", "", usersMiddleware],
      ["use", "", globalMiddleware],
      ["use", "/users/:id/*", isOwner],
      ["end", "/users/:username/profile", editProfile],
      ["end", "/users/:uid/comments/:cid", editComment]
  }
}
```

For an actual radix solution, you'd want to be adding node tiers per URL segment so that, for example, you see that you'd have `/users/<...(static|param|wild|optional)>/` and run the same traversal recursively. 

3) This still doesn't allow RegExp routes to be interspersed throughout the app.

4) It doesn't account for the fact that RegExp executing is now much, much faster. The matchit benchmarks show that strict string comparison is faster, if that's all you have to do. But once you start splitting input strings and comparing them piece-by-piece, perf falls off quickly. Of course, that gets offset by the radix tree itself – but only if it's done an adequate job of narrowing down potential possibilities on insert. This is also why most trees operate on a character-by-character basis

sheerun commented 4 years ago

For an actual radix solution, you'd want to be adding node

In my example I've split it like this on purpose. If you overdo splitting then cost of matching prefixes is greated than cost of iterating over fast regexps (it cna be both memory and CPU).

The split of leaf should happen only if number of routes in leaf is high enough, e.g. 10. So your example of ~> with more routes & no nesting is actually correct tree because no leaf has more than 10 elements. If you added 2x more routes then correct algorithm would add another path element for more deeply nested radix tree to match.

This still doesn't allow RegExp routes to be interspersed throughout the app.

They can simply be present in each leaf

sheerun commented 4 years ago

If you mean that /:username/profile etc. have capture group at the beginning prefix radix tree cannnot be constructed further without more advanced algorithm then you're right, but I think such set of routes constitutes single app and doesn't need to be optimized further.

I think introducing at least simple form of this would be nice for performance and I believe you have solid understanding what I mean, so I'll just leave this decision for you. By the way thank you for discussion :)

lukeed commented 4 years ago

The discussion has been fun, thanks! 😄

I'll give this another crack – maybe I was just too hung up on certain details the first time(s) around. I'm a bit too busy to do this now, but I'll definitely add it to my plate before marking Polka as 1.0. Will ping you once this happens.


Yes, special patterns can happen anywhere and you'd still want the tree to be able to react predictably/performant-ly (lol) if you're going to go thru the trouble of setting this up at all.

Not really, re: RegExp patterns:

router()
  .get('/users', foo)
  .get(/^[/]users[/](?<id>\d+)[/]?$/, bar)
  .get('/users/:id/comments', baz)

{
  "GET": {
    "/users": [
       ["end", "", foo],
       // ???
       // ["end", /^[/]users[/](?<id>\d+)[/]?$/, needs_test_whole_path]
       ["end", "/:id/comments", baz]
    ],
 }

Quickly becomes unoptimized when you start throwing in a bunch of non-uniform conditions in your loop