Dynamic routes: Can we optimize the route matching rather than pulling up all routes for an app?

iron-io / functions

IronFunctions - the serverless microservices platform by

https://iron.io

Apache License 2.0

3.18k stars 227 forks source link

Dynamic routes: Can we optimize the route matching rather than pulling up all routes for an app? #77

Open treeder opened 8 years ago

treeder commented 8 years ago

This is tricky because of this: https://github.com/iron-io/functions/issues/71

ie: https://github.com/iron-io/functions/pull/74/files#diff-e180f07f5c320b82004c2bb065eedef0R99

The problem is we'd like to query for a single route, but let's say the route is defined as:

/blogs/:blog_id/comments/:comment_id

And a request comes in for:

/blogs/123/comments/456

How do we query the database for the route metadata with only that information? Could query for /blogs/123/comments/456, which is the static default way. But that wouldn't match. We could try all variations, eg /blogs/123/comments/* and /blogs/123/*/456, and /blogs/123/*/*, and so on, but that wouldn't be very efficient.

To summarize, the problem here is:

Given a path and only a path, eg: /blogs/travisblog/comments/123 what should you query for? Not knowing what routes exist before hand.

jmank88 commented 7 years ago

I came up with a scheme for postgres that at least pushes all the work to the server, so all the routes don't come over the wire each time, though I'm still not sure how to index it efficiently.

If the routes are stored with parameter names removed (Example: /blogs/:blog_id/comments/:comment_id -> /blogs/:/comment/:), then fairly simple regex queries can be used by matching on either the whole word or the colon character at each path component. Example: /blogs/123/comments/456 -> ^/(blogs|:)/(123|:)/(comments|:)/(456|:)$

CREATE TABLE routes (
  route TEXT,
  raw_route TEXT
);

-- Register routes
INSERT INTO routes (route, raw_route) VALUES 
('/blogs/:blog_id/comments/:comment_id', '/blogs/:/comments/:'),
('/blogs/:blog_id/comments', '/blogs/:/comments'),
('/blogs/:blog_id','/blogs/:'),
('/blogs','/blogs');

-- Query for '/blogs/123/comments/456'
SELECT route FROM routes WHERE raw_route ~ '^/(blogs|:)/(123|:)/(comments|:)/(456|:)$'

In theory, the same general approach could translate to BoltDB as well - store by nameless route components in nested buckets, and then search for matches one component at a time.

jmank88 commented 7 years ago

I implemented a prefix-scanning solution for BoltDB: https://godoc.org/github.com/jmank88/nuts#hdr-Path_Prefix_Scans It is much more simple and elegant than the nested buckets approach, and the code (just two functions) can be run against the existing schema. I was able to test/bench up to 1,000,000 routes.

jan-g commented 7 years ago

If you're happy with an algorithm linear in length of path (however, this has as many DB round-trips) then for /blogs/123/comments/456 you can look up blogs/ (and failing that, :/). Then look for blogs/123 (or blogs/:). Then blogs/:/comments. Then blogs/:/comments/456 (and blogs/:/comments/:` - which returns the final route).

With careful caching of a next-key-set in the trie path you can probably halve the number of round trips that this requires. The cost is updating the trie on route creation and deletion. Other "optimisations" might include caching complete patterns over a particular part of a sub-trie and returning those when there are fewer than some threshold number - ie, short-circuiting the complete trie traversal.

Note, this approach favours early parts of the path being matched over later ones - however, I don't think that that's too high a price to pay in practice.