Open jimafisk opened 4 years ago
@jimafisk I have been benchmarking `Build` and, unsurprisingly, there is one standout bottleneck: `runtime.cgocall` from the v8go calls into `C.RunScript` etc. Some pprof output:
```
      flat  flat%   sum%        cum   cum%
     2.04s 48.92% 48.92%      2.04s 48.92%  [plenorig]
     1.37s 32.85% 81.77%      1.38s 33.09%  runtime.cgocall
     0.23s  5.52% 87.29%      0.27s  6.47%  regexp.(*machine).add
     0.17s  4.08% 91.37%      0.34s  8.15%  regexp.(*machine).step
     0.06s  1.44% 92.81%      0.06s  1.44%  [libc-2.32.so]
     0.06s  1.44% 94.24%      0.51s 12.23%  regexp.(*machine).match
     0.05s  1.20% 95.44%      0.05s  1.20%  runtime.memmove
     0.04s  0.96% 96.40%      0.04s  0.96%  runtime.usleep
     0.03s  0.72% 97.12%      0.03s  0.72%  [libstdc++.so.6.0.28]
     0.03s  0.72% 97.84%      0.03s  0.72%  syscall.Syscall
     0.01s  0.24% 98.08%      0.03s  0.73%  runtime.mallocgc
         0     0%  98.08%     2.35s 57.04%  main.main
```
If you do a `list plenti/cmd/build.compileSvelte` in pprof, the most expensive call is `ctx.RunScript("var { js, css } = svelte.compile(`"+componentStr+"`, {css: false, hydratable: true});", "compile_svelte")`.
There are some small general speed ups: compiling each `regexp.MustCompile` pattern once, as opposed to having them nested in functions where they get recompiled on every function call (`compileSvelte` has a good few examples); and reducing `RunScript` calls, maybe by concatenating `css.code` and `js.code` with a unique delimiter and then splitting in Go. I think by far the best long term approach is incremental builds (#130), i.e. only building exactly what is absolutely necessary.
Also, separating out concerns a bit would maybe speed things up and make local dev a nicer experience.
Currently `Build` behaves pretty much the same across the board, whether in local dev with `plenti serve` or when running `plenti build` to generate production files. If parts can be omitted for local vs prod, then the logic/code for those should maybe be separated out and require opt-in via flags/config.
Lastly, anywhere a cgo call can be avoided, or at least not repeated, it should be!
Yeah compiling Svelte components is definitely the slowest part, thanks for benchmarking to verify. I used to do this concurrently (https://github.com/plentico/plenti/commit/8c98790497724198d0bf5abd5e1f430b8dacaf40) back when I was executing node scripts. I tried it again later (https://github.com/plentico/plenti/commit/c2b7b2a252d954daba04ccc0b589593613ca887f) but had some issues so I removed it (https://github.com/plentico/plenti/commit/8a5693b89aeb45a6e880f496b0a538d6c0c53d0b). I think the performance enhancements you listed are spot on, I'll try to spend some time with that.
Incremental builds would be awesome, do you have a sense for how you would approach that? That would make the local dev experience a lot nicer, and I'm all in for anything that can speed up builds. Getting the full build faster will also be important so the feedback loop when running in CI is as quick as possible, but that's more of a prod issue like you said.
Good point on not repeating cgo calls. I was mainly focusing on just getting it working initially, but as you pointed out there are places where things are being run multiple times that can be optimized. Ultimately I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.
For incremental builds, the simplest approach might be to store the last modified time or a hash of the content and rebuild anything where either (or maybe both) changed. Hashing is reasonably expensive, but it could be done concurrently, and given the high cost of compiling it should still be worth it. Generally in production, if you are generating static files, you won't be changing a large percentage of them each build, so it should be reasonably efficient? The trickiest part is keeping tabs on relationships: what needs to be rebuilt when content changes, etc.
For local dev we get the actual file names in `Watch` on change, so once we have the name we should be able to compile just a very small subset of files. Again, I still need to know what subset of files needs rebuilding.
For anything to work we would need to break the build process out into smaller parts. I have started moving logic into separate functions here. There are also a few optimizations, like compiling regex once and using `sync.Once` here, to speed things up a little for local dev by running some reused setup logic just once.
I am open to any ideas. You have a far better understanding of how everything works.
Hey @jimafisk, you are doing great work and I've been observing plenti for some time. I'm using elderjs right now and recently did some benchmarks for Hugo, plenti, zola, and eleventy. Plenti didn't scale well (understandably) and I think concurrency could help vastly. I just love svelte! So I'm resisting rebuilding everything with zola + Tera.
I don't have any experience in Go or Rust. Can I ask what issues you were facing with the concurrent steps you mentioned here? Thank you.
Thanks @s-kris! Are you able to share your benchmarks? I'd love to see them, even if Plenti falls flat a bit :). Concurrency would definitely speed things up, although there are some challenges to doing that with the way we're compiling things in v8. Basically we're loading everything into one giant context, which is not goroutine safe: https://github.com/rogchap/v8go/issues/120
For your local dev, if you're on v0.4.13+ you should be able to take advantage of in-memory builds and live reload to speed things up a bit when making changes after the initial build: `plenti serve -ML`
> I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.
That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg
Sure :) All are on the latest versions. Benchmarks ran on a MacBook Air (2017) with 8 GB RAM.
Here are the benchmarks: (Note: Plenti threw `error: out of memory` after 420 secs for 10k pages.)
Same graph without 10k pages data point:
Numbers:
> That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg
I was reading about your brainstorm discussion on #130. Thanks for the youtube link :)
Edit:
Some more thoughts here https://twitter.com/_skris/status/1388259901080621056?s=20
This is super interesting, thanks for sharing @s-kris! I was surprised by the elder/hugo crossover for large sites, might be something for us to look into there. Our bottleneck is probably happening when we render out the HTML fallbacks in V8. Any chance you can share the repo for the benchmark tests so I can debug a bit? Thank you!
Thank you @jimafisk. Here's the repo https://github.com/Elderjs/elderjs
Sorry, I wasn't clear: do you still have the plenti repo with the variable numbers of pages @s-kris? Thanks!
Ah, it was nothing custom, haha! I tried building different size sites (e.g. for 1,000 pages: `tee about-{001..1000}.json < about.json >/dev/null`).
| Number of pages | Build time |
|---|---|
| 1,000 | 13.689803222s |
| 2,000 | 52.887924446s |
| 3,000 | 1m52.564471876s |
| 4,000 | 3m15.451195416s |
| 5,000 | Timeout (see error below) |
I've been playing with different build structures. Historically, I was operating under the assumption that v8 ctx was expensive, so we were creating one global SSRCtx with all SSR components loaded in, and then adding props and generating HTML from it for every page.
I've recently tried restructuring so each SSR component gets saved to the virtual filesystem (via afero) during the Client step, then for each content source, creating a new v8 context, loading in those components, then adding props and generating HTML, and once that specific page is complete, closing that specific context: https://github.com/plentico/plenti/commit/d1ba128a638a14e32eb1b35825b6759b94a641a1
As I expected, this seems to slow down the build in general (I don't have the exact numbers but I think generating 1,000 pages went from like 14 seconds to 43 seconds). However, it did allow me to build larger sites without throwing a timeout error. I was able to build over 5,000 pages without timing out, it just takes a very long time to complete:
I figured I could speed this up some by adding goroutines, and for small sites it seemed to work. One project I'm working on that usually takes 8 seconds with the old build structure, took 10 seconds with the new build structure + goroutines. However, large sites once again started having issues once goroutines were introduced: https://github.com/rogchap/v8go/issues/347#issuecomment-1566224004
I probably wouldn't trade slightly slower average size sites (low hundreds of pages) for the ability to complete large site builds (thousands of pages), especially if it takes 10 minutes to do so, which isn't practical for most folks. Maybe there is an option of using v8go for compiling the DOM and SSR components, but using Goja to actually render the template strings to HTML.
Building the client components concurrently (https://github.com/plentico/plenti/commit/f64a07929fd8ac076759ca820a90b44a36a8d24e) seems to have little effect on the overall build:
Given these results, I don't intend to convert the architecture in the short term unless we can figure out ways to make improvements.
I also tried doing a concurrent HTML build using goja, and although I never quite got it working (wasn't actually producing HTML), it increased the build time for a project I'm working on from about 8 seconds to 35 - 50 seconds. Not sure if Goja can be viable speed-wise unfortunately.
Build steps:

- `.js` files to build dir
- `/content`
- `/layout`
- `build.js` script with NodeJS

These should be broken into goroutines so they run concurrently. If one step relies on a return value from another, we can use channels to block that thread until that data is available. This should speed up build times.