Open jimafisk opened 4 years ago
@jimafisk I have been benchmarking `Build` and, unsurprisingly, there is one standout bottleneck: `runtime.cgocall` from the v8go calls into `C.RunScript` etc. Some pprof output:
```
      flat  flat%   sum%        cum   cum%
     2.04s 48.92% 48.92%      2.04s 48.92%  [plenorig]
     1.37s 32.85% 81.77%      1.38s 33.09%  runtime.cgocall
     0.23s  5.52% 87.29%      0.27s  6.47%  regexp.(*machine).add
     0.17s  4.08% 91.37%      0.34s  8.15%  regexp.(*machine).step
     0.06s  1.44% 92.81%      0.06s  1.44%  [libc-2.32.so]
     0.06s  1.44% 94.24%      0.51s 12.23%  regexp.(*machine).match
     0.05s  1.20% 95.44%      0.05s  1.20%  runtime.memmove
     0.04s  0.96% 96.40%      0.04s  0.96%  runtime.usleep
     0.03s  0.72% 97.12%      0.03s  0.72%  [libstdc++.so.6.0.28]
     0.03s  0.72% 97.84%      0.03s  0.72%  syscall.Syscall
     0.01s  0.24% 98.08%      0.03s  0.73%  runtime.mallocgc
         0     0%  98.08%     2.35s 57.04%  main.main
```
If you do a `list plenti/cmd/build.compileSvelte` in pprof, the most expensive call is `ctx.RunScript("var { js, css } = svelte.compile(`"+componentStr+"`, {css: false, hydratable: true});", "compile_svelte")`.
There are some small general speed ups: compiling each `regexp.MustCompile` pattern once, as opposed to having them nested in functions where they get recompiled on every function call (`compileSvelte` has a good few examples); and reducing `RunScript` calls, maybe by concatenating `css.code` and `js.code` with a unique delimiter and then splitting in Go. I think by far the best long term approach is incremental builds (#130), i.e. only building exactly what is absolutely necessary.
Also, separating out concerns a bit would maybe speed things up and make local dev a nicer experience.
Currently `Build` behaves pretty much the same across the board, whether in local dev with `plenti serve` or when running `plenti build` to generate production files. If parts can be omitted for local vs prod, then the logic/code for those should maybe be separated out and require opt-in via flags/config.
Lastly, anywhere a cgo call can be avoided, or at least not repeated, it should be!
Yeah compiling Svelte components is definitely the slowest part, thanks for benchmarking to verify. I used to do this concurrently (https://github.com/plentico/plenti/commit/8c98790497724198d0bf5abd5e1f430b8dacaf40) back when I was executing node scripts. I tried it again later (https://github.com/plentico/plenti/commit/c2b7b2a252d954daba04ccc0b589593613ca887f) but had some issues so I removed it (https://github.com/plentico/plenti/commit/8a5693b89aeb45a6e880f496b0a538d6c0c53d0b). I think the performance enhancements you listed are spot on, I'll try to spend some time with that.
Incremental builds would be awesome, do you have a sense for how you would approach that? That would make the local dev experience a lot nicer, and I'm all in for anything that can speed up builds. Getting the full build faster will also be important so the feedback loop when running in CI is as quick as possible, but that's more of a prod issue like you said.
Good point on not repeating cgo calls. I was mainly focusing on just getting it working initially, but as you pointed out there are places where things are being run multiple times that can be optimized. Ultimately I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.
For incremental builds, the simplest approach might be to store the last modified time or a hash of the content and rebuild anything where either (or maybe both) changed. Hashing is reasonably expensive, but it could be done concurrently, and given the high cost of compiling it should still be worth it. Generally in production, if you are generating static files, you won't be changing a large percentage of them each build, so it should be reasonably efficient? The trickiest part is keeping tabs on relationships: what needs to be rebuilt when content changes, etc.
For local dev we get the actual file names in `Watch` on change, so once we have the name we should be able to compile just a very small subset of files. Again, I still need to know what subset of files needs rebuilding.
For anything to work we would need to break the build process out into smaller parts. I have started moving logic into separate functions here. There are also a few optimizations, like compiling regex once and using `sync.Once` here, to speed things up a little for local dev by running some reused setup logic just once.
I am open to any ideas. You have a far better understanding of how everything works.
Hey @jimafisk, you are doing great work and I've been observing plenti for some time. I'm using elderjs right now and recently did some benchmarks for Hugo, plenti, zola, and eleventy. Plenti didn't scale well (understandably) and I think concurrency could help vastly. I just love svelte! So I'm resisting rebuilding everything with zola + Tera.
I don't have any experience in Go or Rust. Can I ask what issues you were facing with the concurrent steps you mentioned here? Thank you.
Thanks @s-kris! Are you able to share your benchmarks? I'd love to see them, even if Plenti falls flat a bit :). Concurrency would definitely speed things up, although there are some challenges to doing that with the way we're compiling things in v8. Basically we're loading everything into one giant context, which is not goroutine safe: https://github.com/rogchap/v8go/issues/120
For your local dev, if you're on v0.4.13+ you should be able to take advantage of in-memory builds and live reload to speed things up a bit when making changes after the initial build: `plenti serve -ML`
> I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.
That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg
Sure :) All are on the latest versions. Benchmarks ran on a MacBook Air (2017) with 8 GB RAM.
Here are the benchmarks: (Note: Plenti threw `error: out of memory` after 420 secs for 10k pages.)
Same graph without 10k pages data point:
Numbers:
> That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg
I was reading about your brainstorm discussion on #130. Thanks for the youtube link :)
Edit:
Some more thoughts here https://twitter.com/_skris/status/1388259901080621056?s=20
This is super interesting, thanks for sharing @s-kris! I was surprised by the elder/hugo crossover for large sites, might be something for us to look into there. Our bottleneck is probably happening when we render out the HTML fallbacks in V8. Any chance you can share the repo for the benchmark tests so I can debug a bit? Thank you!
Thank you @jimafisk. Here's the repo https://github.com/Elderjs/elderjs
Sorry, I wasn't clear: do you still have the plenti repo with the variable numbers of pages @s-kris? Thanks!
Ah, it was nothing custom, haha! I tried building different size sites (e.g. for 1,000 pages: `tee about-{001..1000}.json < about.json >/dev/null`).
| Number of pages | Build time |
|---|---|
| 1,000 | 13.689803222s |
| 2,000 | 52.887924446s |
| 3,000 | 1m52.564471876s |
| 4,000 | 3m15.451195416s |
| 5,000 | Timeout (see error below) |
I've been playing with different build structures. Historically, I was operating under the assumption that v8 ctx was expensive, so we were creating one global SSRCtx with all SSR components loaded in, and then adding props and generating HTML from it for every page.
I've recently tried restructuring so each SSR component gets saved to the virtual filesystem (via afero) during the Client step, then for each content source, creating a new v8 context, loading in those components, then adding props and generating HTML, and once that specific page is complete, closing that specific context: https://github.com/plentico/plenti/commit/d1ba128a638a14e32eb1b35825b6759b94a641a1
As I expected, this seems to slow down the build in general (I don't have the exact numbers but I think generating 1,000 pages went from like 14 seconds to 43 seconds). However, it did allow me to build larger sites without throwing a timeout error. I was able to build over 5,000 pages without timing out, it just takes a very long time to complete:
I figured I could speed this up some by adding goroutines, and for small sites it seemed to work. One project I'm working on that usually takes 8 seconds with the old build structure, took 10 seconds with the new build structure + goroutines. However, large sites once again started having issues once goroutines were introduced: https://github.com/rogchap/v8go/issues/347#issuecomment-1566224004
I probably wouldn't trade slightly slower average size sites (low hundreds of pages) for the ability to complete large site builds (thousands of pages), especially if it takes 10 minutes to do so, which isn't practical for most folks. Maybe there is an option of using v8go for compiling the DOM and SSR components, but using Goja to actually render the template strings to HTML.
Building the client components concurrently (https://github.com/plentico/plenti/commit/f64a07929fd8ac076759ca820a90b44a36a8d24e) seems to have little effect on the overall build:
Given these results, I don't intend to convert the architecture in the short term unless we can figure out ways to make improvements.
I also tried doing a concurrent HTML build using goja, and although I never quite got it working (wasn't actually producing HTML), it increased the build time for a project I'm working on from about 8 seconds to 35 - 50 seconds. Not sure if Goja can be viable speed-wise unfortunately.
Build steps:

- `.js` files to build dir
- `/content`
- `/layout`
- `build.js` script with NodeJS

These should be broken into goroutines so they run concurrently. If one step relies on a return value from another, we can use channels to block that thread until that data is available. This should speed up build times.