routing evolution - Githubissues

Geal commented 5 years ago

The current trie based router is useful and reasonably fast, but is not flexible enough. Let's think about where we can go from here, what new features we want, etc.

current solution

we have two tries, one to match a reversed domain (so that we can build a suffix tree) to a Rc<Certificate> (because a certificate can match multiple domains, wildcards, etc) for SNI, and one to match a reversed domain to an application ID (an application can match multiple domains, wildcards, and we can also match on a prefix of the URI). We have different tries because there is not a 1<->1 match between certificates and applications, and they can be modified separately.

Those tries have been optimized a bit to branch on specific characters (various implementations are available at https://github.com/sozu-proxy/trie_benches ) and incremental updates (add/remove, coalescing nodes, etc).

current issues

it matches on the domain first

In the case of ACME support, we need to add an application matching the /.well-known/acme-challenge path for every domain. With the current sozu routing, we need to add a new frontend for each domain.

only way to match are on domain, domain suffix or path prefix

We might want more flexible matching options, like subdomain element matching a regex, or matching on the presence or value of a Header, source IP, etc.

one trie per `TcpListener`

so if we listen on multiple HTTP ports, or multiple HTTPS ports, we have one trie per port. It's not necessarily a big issue, since often we'll only have port 80 and 443, and they do not always have the same routes.

Additional needs

blacklists/whitelists

The load balancer is in a good position to filter requests, along with routing. It could prevent requests to unknown routes from going to the backend server.

proposals

I am not decided yet on how to proceed, so let's propose anything that could be interesting there.

pre and post list of rules

Having a list of rules that are very flexible, that can be tried before or after the trie is tested.

representing the router as a finite state machine

The fst crate got some interesting results, maybe there's something we could reuse there? https://blog.burntsushi.net/transducers/

nbraud commented 5 years ago

For the record, @Geal and I talked a bit out-of-band, about possible optimizations:

Use cranelift to compile routing rules to (specialized) machine code. This wouldn't supplant the trie vs. FSM discussion, as the code would be implementing essentially a datastructure, and FSMs and tries are suited to different optimizations.
Measure the rate of queries hitting each route in order to optimize the tree's structure for that. This is essentially a profile-based optimization, albeit one that can adapt for dynamic loads (as opposed to traditional, compile-time PBO). Basically, the idea is to minimize the average number of comparisons performed to match an incoming query, given the query distribution (the rate at which each rule is matched)

(2) is pretty exciting, because it means that additional, low-volume hostnames create very little overhead. For instance if a Sozu instance has hosts patate.com, patate.io, patate.sozu.io, patatate.sozu.io, patatatate.sozu.io, patatatatate.sozu.io, ... , with the vast majority of its traffic going to the pa(ta){n}te.sozu.io (patate with ta repeated N times), we would do:

at least 3+N comparisons with the current trie datastructure;
3 comparisons with a matching tree that starts by comparing .io, then sozu, then pa(ta){N}te.

It's a pretty pathological example, but I made it up to showcase how bad slowdown in tries can be. :3

nbraud commented 5 years ago

@Geal I think it would make sense to discuss first what's the expressive power that the configuration language needs to support.

For instance, if you want regexps as a first-class citizen (rather than as “after trie” rules), compiling to FSMs will likely be the happy path. On the other hand, tries support some useful optimizations that might not be so easy with FSMs, like “vectorized” matching (i.e. represent a string prefix as a u64, compare up to 8 characters at a time).

Same for “wildcard domain with a path constraint”: it's doable in both FSM and tries, but there's some hairy logic involved and it might be expensive (imagine we check /.well-known/acme-challenges first thing on every single query). OTOH, we might be able to optimize the overhead away if we know that path is very seldomly taken?

nbraud commented 5 years ago

@Geal OK, now I got nerdsniped into reading a whole bunch of automata theory papers to see if people have done multi-character, sublinear matching. (i.e. can we have all the optimizations that work on tries, on FSMs?)

Geal commented 5 years ago

I'm not fixed on the expressive power we want here, there will be a tradeoff between how flexible it can be, the speed it can run at (I'm not too worried about memory usage, considering how many applications we're currently handling at Clever Cloud), and the cost of modifying the data structure or code: workers are single threaded, so any time spent changing the structure is time spent not handling request, and the routing structure is not something that I would keep in an Arc<Mutex<>> because of synchronization costs.

@nbraud I'm quite interested in those papers as well, if you see something interesting, could you post it here?

nbraud commented 5 years ago

@Geal I would have expected that the routing structure can be kept in an RwLock<RefCell<T>>, and the thread that computes the new structure can keep read-only access (concurrently to all threads handling queries) while computing the new one, and then go routing.write()?.replace(new_routing) or somesuch, so the service interruption is only the time to write a pointer (plus sync overhead)

Geal commented 5 years ago

I spent a bit of time to clean up the trie benchmarks and make it easier to compare them. As a side note to my original message: the routing trie matches on the domain, and returns as value a Vec of HttpFront. We then test path prefixes in this Vec one by one. This is generally fast enough because there are not that many path prefix rules to test. It could make sense for the new design to reuse this idea, ie having an optimized algorithm to reduce the search space, then apply some rules linearly. (the annoying part is when one of those rules is shared by a lot of applications, like ACME challenges)

Geal commented 5 years ago

another parameter on which we could play: I'd be ok with generating something in the master process, then sending it to the workers, if the generation process takes some time and we want to do it on each update. There will be another important detail with replacement solutions like this one: if we want to keep an intermediate step (example: we already evaluated the URI and host but are waiting for other headers), then the struct would need to be kept in a Rc or something like that, to make sure we're not replacing a structure that's still in use by one session.

nbraud commented 5 years ago

@Geal Would if make sense to have the trie (or fst) match on the domain and return a trie/fst that matches the path and returns an HttpFront? That might make things easier if/when adding regexp conditions.

Also, the promised braindump on possible optimizations and related work (caveat lector: it's 7.30am here, and I've been up since 2am):

Use suffix scanning à la Boyer-Moore to skip over parts of the input: Sublinear Matching With Finite Automata Using Reverse Suffix Scanning, Steven Kearns
Match multiple symbols at once (say, a u64 containing 8 packed u8s): A hybrid multiple-character transition finite-automaton for string matching engine
Compute a cache-oblivious layout for the trie/fst (or its representation once compiled to machine code) so as to maximise the cache efficiency when querying it; that's something that would be very much upstreamable into fst (or publishable as an auxiliary crate, I guess).

Not sure the linked papers are the best or most useful, but that's what I found when searching for the ideas I had in mind; feel free to link other things if you fear I will run out of reading material ;) (I had other papers opened, but it was more about bit-parallel evaluation of NFAs and I can't recall offhand why I thought that would be relevant.)

Geal commented 5 years ago

I'm interested in a way to represent a trie a bit like fst is doing. Right now there are allocations in each node, but fst has a compact representation in a byte stream, with offsets indicating jumps i the stream. It's slower to edit (it requires regenerating part or all of the stream) but it's very small and fast to go through.

Geal commented 5 years ago

I added an example with fst in https://github.com/sozu-proxy/trie_benches The result is that generating the state machine takes x4 more time than generating a trie, and we would have to generate a new fst every time we change the routing. I've seen it take 4 to 6 times longer than the trie when looking up existing domains, but it gets slightly faster when the domain is not in the list.

I was talking with Guillaume Logerot, who works in the same coworking space I'm in and knows a bit about text search and FSM. Here's what he suggested:

reencoding the domain name on 6 bits instead of 8 (by reducing to lowercase alphabetical characters, digits, hyphen and dot)
checking out a structure like Lucene segments, which will better support edition
setting up a state machine that we fill up with the following: <reversed domain><SEP><path prefix>. We would probably be able to express more interesting matches like alternatives or globs, and that would put path matching in the same routing structure

Geal commented 5 years ago

I have spent some time testing various solutions, the results are available in the readme of https://github.com/sozu-proxy/trie_benches

To sum up:

the state machine or regex set solutions are too slow, especially for structure edition (since we need to rebuild from scratch every time)
I have not seen performance improvements by adapting the language on 6 bits
I experimented with a very specialized trie implementation that builds on the optimized trie, adding regexps and wildcard matching in the mix. Its performance is not bad, but it is very complex and will be a nightmare to debug

There's one solution that emerged as simple enough to maintain, fast enough and flexible enough, but it's an ugly idea compared to the others 😄 We separate the domain in labels (www.example.com -> [com, example, www]) and build a tree of hashmaps: each node contains a hashmap of the next level's labels. Regex and wildcard matching can be done easily at the same level: if the hashmap does not have the entry, test regexps or wildcard if they're available (no backtracking like the specialized trie solution). URL matching can be done in a separate way:

a default case that matches all URLs
a prefix tree (simple, no regex or wildcard) and testing regex if the prefix tree fails

To accomodate other use cases, I also plan to add "pre" and "post" rules that could be tested one by one, and might have more flexible routing rules than what is allowed in the main routing structure.

This way, we will be able to handle common use cases (like the ACME protocol one), allow for sites with high traffic but few rules (linear rules in "pre"), handle the hosting use case (lots of domains, few URL matching per domain), and the microservices use case (few domains, lots of URL matches). That last one will take a small hit compared to matching the whole domaine at once due to the hashmap tree, but it's manageable.

Next steps

add a way to represent regular expressions in domains. In the tests, I added them this way: /cdn[0-9]+/.example.com. Restricting them to domain labels should be flexible enough
add a way to represent regular expressions in URL matching. Maybe prefixing it with ~ ? We could also modify the Frontend structure to have either a path prefix or a path regexp
add an option to activate SipHash if necessary in the tree of hashmaps
make a specific Router structure that will take hostname and URL as arguments and test "pre" rules, routing tree matching, and "post" rules
add an option to the Frontend structure to reject if matched: we might want to reject some requests directly at the load balancer instead forwarding them to the backend application

I am not decided yet on how the "pre" and "post" rules can apply, especially considering that matching on headers is tricky in sozu: we do not assume we will get the whole request in on go, and do not build an internal representation of the request. Instead, sozu tries to match the route as soon as it has the URL and host, and for the rst of the headers, keep as few information as possible, and just knows if the data should be transmitted directly to the backend or not.

Geal commented 5 years ago

pending work on #585 and #586

Geal commented 5 years ago

the routing rewrite is done, available in the routing and 0.12 branches.

Geal commented 3 years ago

this will go in the next release

sozu-proxy / sozu

routing evolution #571

current solution

current issues

it matches on the domain first

only way to match are on domain, domain suffix or path prefix

one trie per `TcpListener`

Additional needs

blacklists/whitelists

proposals

pre and post list of rules

representing the router as a finite state machine

Next steps

sozu-proxy / sozu

routing evolution #571

current solution

current issues

it matches on the domain first

only way to match are on domain, domain suffix or path prefix

one trie per TcpListener

Additional needs

blacklists/whitelists

proposals

pre and post list of rules

representing the router as a finite state machine

Next steps

one trie per `TcpListener`