BobbieGoede opened 4 months ago
Thanks for the investigation.
It would be interesting to see an actual perf benchmark (without Nuxt). The difference in creation shouldn't be that big. That said, Vue Router 3 had no path sorting like v4 does. A benchmark would also allow generating a flame graph, but I think your guess about `insertMatcher` is right.
Initially, I designed the router to accept a custom matcher implementation (e.g. a trie) that would be more performant but with fewer features than v3. I never got the time to actually work on it, though.
I also thought (but never implemented it) of a way to pre-compile the routes so the matcher can be created instantly. I think this could be an interesting feature, especially for Nuxt where this can be automated. Combined with the ability to pass a custom route matcher, this should work out pretty well!
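As a rough sketch of what such a pluggable, pre-compiled matcher could look like (all names here are made up for illustration; this is not the actual vue-router API):

```typescript
// Hypothetical sketch of a pluggable matcher. The idea: the router accepts a
// matcher implementation, so a pre-compiled or trie-based one can be swapped
// in. None of these names exist in vue-router; they only illustrate the shape.
interface MatchResult {
  path: string
  params: Record<string, string>
}

interface RouteMatcher {
  resolve(path: string): MatchResult | null
}

// A trivial exact-match implementation, pre-compiled from a route list.
function createMapMatcher(paths: string[]): RouteMatcher {
  // Pre-compilation happens once; resolve() is then a single map lookup.
  const table = new Map(paths.map((p) => [p, { path: p, params: {} }]))
  return {
    resolve: (path) => table.get(path) ?? null,
  }
}

const matcher = createMapMatcher(['/', '/about'])
console.log(matcher.resolve('/about')?.path) // → "/about"
console.log(matcher.resolve('/missing')) // → null
```

A real implementation would of course need dynamic segments and ranking, but it shows how the expensive setup can be separated from per-lookup work.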
I was overthinking how to create a perf comparison... I created one here: https://github.com/BobbieGoede/vue-router-perf-bench. It compares both router versions by measuring the time it takes to create the router 100 times with X routes. Maybe it can be changed to make the comparison fairer, but I think it demonstrates the performance difference.
I got these results on my machine (I also checked a flame graph and, for some reason, didn't see `insertMatcher` in it):
| version/routes | 11 | 110 | 550 | 1100 | 2200 |
|---|---|---|---|---|---|
| RouterV3 | 6.23ms | 22.64ms | 100.24ms | 197.53ms | 403.65ms |
| RouterV4 | 9.95ms | 42.49ms | 652.33ms | 2616.43ms | 10295.62ms |
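The core of such a measurement can be sketched like this (a simplified stand-in, not the actual code from the bench repo):

```typescript
// Simplified timing harness: run the same creation work N times and report
// the total wall-clock time. In the real bench the callback would be
// createRouter({ history, routes }) for v3 and v4 respectively.
function benchmark(label: string, createOnce: () => void, iterations = 100): number {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) createOnce()
  const elapsed = performance.now() - start
  console.log(`${label}: ${elapsed.toFixed(2)}ms`)
  return elapsed
}

// Example with a stand-in workload and 1100 generated routes.
const routes = Array.from({ length: 1100 }, (_, i) => ({ path: `/page-${i}` }))
benchmark('create', () => {
  routes.map((r) => ({ ...r }))
})
```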
Custom matchers and/or pre-compilation definitely sound like good features to allow optimizing router/matcher creation. I'm not sure if I have the skills (and time) to add such functionality by myself, but if I can help and contribute, let me know.
I checked the cpuprofile file on https://lahmatiy.github.io/cpupro/ and `insertMatcher` is indeed what takes the most time!
Feel free to give the topic some thought and try to raise a PR. No rush at all
I did some research into what it would take to make a custom matcher for the router, and looked into tries as well. A custom matcher using a prefix (or radix) trie could definitely improve performance; hopefully I'll have time to look into making a matcher that fully matches the functionality/features of the current one. I came across https://github.com/unjs/radix3; if I can't wrap it in a custom matcher, at least I'll be able to use it as a reference.
Since route sorting seems to be the current performance bottleneck (and after seeing unjs/radix3's export method), I tried making a sort cache (#2136). I figured this would be easier to implement to improve performance in the short term.
It definitely works in speeding up router creation, though I'm not entirely sure if this implementation holds up for sorting accuracy.

| version/routes | 11 | 110 | 550 | 1100 | 2200 |
|---|---|---|---|---|---|
| RouterV3 | 5.88ms | 21.90ms | 100.11ms | 200.38ms | 405.75ms |
| RouterV4 | 10.72ms | 40.35ms | 601.14ms | 2386.12ms | 9649.10ms |
| RouterV4Cached | 5.26ms | 32.52ms | 141.46ms | 275.60ms | 553.46ms |
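To illustrate the general idea of a sort cache (my own simplified illustration; `pathScore` is a made-up stand-in, not the actual ranking code from #2136):

```typescript
// Idea: comparisons during route sorting repeatedly rank the same paths, so
// memoize the computed score per path. The scoring below is a fake stand-in;
// the real matcher ranks tokenized path segments with far more nuance.
const scoreCache = new Map<string, number>()

function pathScore(path: string): number {
  let cached = scoreCache.get(path)
  if (cached === undefined) {
    // Pretend static segments rank higher than params, as in the matcher.
    cached = path
      .split('/')
      .reduce((s, seg) => s + (seg.startsWith(':') ? 1 : 10), 0)
    scoreCache.set(path, cached)
  }
  return cached
}

const paths = ['/users/:id', '/users/new', '/:catchAll']
paths.sort((a, b) => pathScore(b) - pathScore(a))
console.log(paths[0]) // → "/users/new" (static-heavy path first)
```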
While it does speed up router creation in general, it looks like matcher creation remains a bottleneck (`tokensToParser` mostly), and after creation it looks like `resolve` takes up some time as well (https://lahmatiy.github.io/cpupro/ is very useful). I will look into expanding my performance comparison bench to also test how `resolve` scales with the amount of routes.
Using radix3 for a custom matcher would be nice. Note that it doesn't cover all the features (it can't, as a trie). It's really great to see that there is no performance issue anymore with the cache. Thanks a lot for taking the time to do it. I haven't been able to check the PR code yet, but hopefully I can give you some feedback soon.
`tokensToParser` should be quite fast already. The difference you see might be because router v3 caches path-to-regexp results while v4 doesn't cache the results of `tokensToParser`. When I tested (a while ago), it was faster than path-to-regexp and caching wasn't necessary, so I wouldn't spend too much time on that part.
> `tokensToParser` should be quite fast already. ... When I tested (a while ago), it was faster than path-to-regexp and caching wasn't necessary, so I wouldn't spend too much time on that part.
You're right, it's fast. Maybe the regex creation could be cached, but it's easy to get carried away when looking for optimizations. As long as we can get the performance similar to that of router v3, I'm happy.
> Using radix3 for a custom matcher would be nice. Note it doesn't cover all features (it can't as a trie).
From the perspective of i18n, a trie sounds appealing since most, if not all, routes are prefixed with a language. I suppose a separate matcher per language could be used in Nuxt I18n to achieve something similar (quick access to prefixed routes) while keeping it functionally the same.
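A per-language matcher lookup could be sketched roughly like this (all names are hypothetical; this is not an actual Nuxt I18n or vue-router API):

```typescript
// Idea: keep one matcher per locale prefix so a lookup only ever scans that
// locale's routes instead of the full multiplied route table.
type Matcher = (path: string) => string | null

function createLocaleMatcher(paths: string[]): Matcher {
  const set = new Set(paths)
  return (path) => (set.has(path) ? path : null)
}

const matchers: Record<string, Matcher> = {
  en: createLocaleMatcher(['/en/about', '/en/contact']),
  nl: createLocaleMatcher(['/nl/about', '/nl/contact']),
}

function resolveLocalized(path: string): string | null {
  // Route the lookup by the locale prefix, e.g. "/nl/about" → "nl".
  const locale = path.split('/')[1]
  return matchers[locale]?.(path) ?? null
}

console.log(resolveLocalized('/nl/about')) // → "/nl/about"
```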
I've attempted a different approach to speeding up `insertMatcher` in #2137.
It uses a binary search to determine the insertion point. It yields a slightly different sort order, but I think it will be functionally equivalent. It's quite possible that I've failed to consider some of the edge cases, but the existing unit tests all pass, so I'm optimistic it can be made to work. If nothing else it may help to identify some missing test cases.
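A minimal sketch of binary-search insertion into a descending-sorted array (an illustration of the technique, not the code from #2137):

```typescript
// Since the matcher array is kept sorted by score (higher scores first),
// the insertion index can be found in O(log n) instead of a linear scan.
function findInsertionIndex(scores: number[], score: number): number {
  let lower = 0
  let upper = scores.length
  while (lower !== upper) {
    const mid = (lower + upper) >> 1
    // Descending order: higher scores come first.
    if (score > scores[mid]) upper = mid
    else lower = mid + 1
  }
  return lower
}

const sorted = [50, 40, 30, 10]
sorted.splice(findInsertionIndex(sorted, 20), 0, 20)
console.log(sorted) // → [50, 40, 30, 20, 10]
```

The real matcher compares multi-part ranking scores rather than single numbers, but the search itself works the same way.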
@skirtles-code Your approach is much better! This essentially makes use of `matchers` already being sorted, right?
In my tests it performs better than my caching approach, I guess this probably means my caching is more expensive than it should/could be. I still think it's interesting to explore caching but it would probably need to cover more of the router matcher creation (not just the sorting) for it to be worth caching after your optimization.
> This essentially makes use of `matchers` already being sorted, right?
Correct.
> In my tests it performs better than my caching approach
Same for me, but my PR isn't currently handling all the edge cases correctly. Fixing that will likely slow it down a bit, which may account for some of the difference we're seeing.
> I still think it's interesting to explore caching
Agreed. Caching should be able to improve various aspects of the performance, not just the insertion.
I've just opened another PR, #2148. This explores some other ideas. I've done a long write up on the PR, but roughly speaking it uses some of the ideas already discussed earlier in this thread (not including caching and binary search) to try to squeeze out extra performance.
In #2166, I've explored the idea of making the matcher pluggable. This allows a custom matcher to be passed when creating the router. More importantly, it allows most of the work of creating a matcher to be done upfront, rather than creating it afresh for each request.
This is essentially a version of the caching/pre-compilation idea that has already been discussed.
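The gist of doing the expensive work upfront can be sketched like this (hypothetical names; not the API proposed in #2166):

```typescript
// Idea: build the expensive matcher once at module scope, then create a
// cheap per-request router that closes over it. This matters for SSR, where
// a fresh router is created on every request.
type Matcher = (path: string) => boolean

function compileMatcher(paths: string[]): Matcher {
  // Expensive work (sorting, regex compilation, ...) happens once here.
  const set = new Set(paths)
  return (path) => set.has(path)
}

// Done once at startup instead of once per request.
const sharedMatcher = compileMatcher(['/', '/about'])

function createRequestRouter() {
  // Per-request state stays cheap; the matcher is shared and immutable.
  return { resolve: (path: string) => sharedMatcher(path) }
}

console.log(createRequestRouter().resolve('/about')) // → true
```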
We also found that vue-router can be a perf bottleneck when the number of routes is large. Will keep an eye on this issue. cc @Mister-Hope
Is this topic still being worked on? I experienced performance issues within a multi-language application with about 1100 routes as well.
FYI I added this to the roadmap. The idea is to go with https://github.com/vuejs/router/pull/2166. That way, the whole creation of the routes could be removed at build time with unplugin-vue-router in the future
@posva I think #2137 (binary search for `insertMatcher`) is also worth considering. I believe it fixes the current performance problems and (in my opinion) it's ready to be merged. It requires no changes to the public API and would also benefit those who can't use the pluggable matcher for whatever reason.
I just checked the impact on the bundle size and it adds about 78 bytes to `vue-router.global.prod.js`, not accounting for compression.
You make a good point. I will try to give it a look next week.
This will be partially fixed by #2137
| (index) | 11 | 110 | 550 | 1100 | 2200 |
|---|---|---|---|---|---|
| RouterV3 | 4.05ms | 17.61ms | 73.11ms | 147.40ms | 317.14ms |
| RouterV4 | 7.51ms | 36.69ms | 524.71ms | 1862.11ms | 7311.56ms |
| RouterVEdge | 5.79ms | 22.57ms | 105.15ms | 218.72ms | 462.07ms |
But the idea is to still allow a pluggable matcher so I will keep this open. Thanks a lot @skirtles-code for the PR and @BobbieGoede for the benchmark!
I published a patch version, so let me know how it goes.
Reproduction
https://stackblitz.com/edit/github-afjnf8?file=src%2Frouter.ts
Steps to reproduce the bug
- Use `ab` or `siege` and compare the results, e.g. `siege -t 10s -c 100 "http://localhost:3000/"` to see the amount of requests within 10s using 100 concurrent users.
- Change the `dirCount` variable to adjust the amount of routes generated/used (default `100`, which generates 1101 routes) and test again.

Expected behavior
Expected the amount of routes to have an impact on performance comparable to router v3. The performance difference is noticeable for SSR as a new router instance is created on each request.
Actual behavior
Router creation doesn't scale as well as it did in v3 (presumably; I have no clean reproduction with the v3 router alone). Please view the result comparison in https://github.com/nuxt/nuxt/issues/25612, which compares Nuxt 2 and Nuxt 3.
If you have a clean/plain SSR project using router v3 I would be happy to include a reproduction with a performance comparison between v3 and v4, I tried making one but had no luck.
Additional information
While debugging performance issues for @nuxtjs/i18n, I narrowed it down to the difference in performance between Nuxt 2 and Nuxt 3, specifically the performance impact of having a high amount of routes. Related issue here: https://github.com/nuxt/nuxt/issues/25612

Due to the way i18n routing works (same as in i18n for Nuxt 2), routes are generated/multiplied by the amount of configured languages. Of course I will look into alternative ways to make routing more efficient/performant when using i18n, but it's worth looking into why the performance of router creation has gotten worse and how we can improve it.
My first guess would be that `insertMatcher` doesn't scale very well; I couldn't find its equivalent in v3.