Experiment: render directly to string

matklad commented 1 year ago

This commit removes the "accumulate stuff into a buffer and join" optimization, and instead does the dumbest thing possible: concatenating a tonne of separately allocated strings.
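To make the change concrete, here is a minimal TypeScript sketch of the two strategies on a toy tree. The `AstNode` type, the `htmlTag` table, and the function names are hypothetical and are not djot.js's actual renderer. HTML escaping is omitted for brevity.

```typescript
// Hypothetical toy AST for illustration only; djot.js's real AST is richer.
type AstNode =
  | { tag: "str"; text: string }
  | { tag: "para" | "emph"; children: AstNode[] };

// Hypothetical mapping from toy node tags to HTML element names.
const htmlTag: Record<"para" | "emph", string> = { para: "p", emph: "em" };

// Before: every fragment is pushed into one shared array, joined once at the end.
function renderWithBuffer(node: AstNode): string {
  const buffer: string[] = [];
  const go = (n: AstNode): void => {
    if (n.tag === "str") {
      buffer.push(n.text);
    } else {
      const t = htmlTag[n.tag];
      buffer.push(`<${t}>`);
      n.children.forEach(go);
      buffer.push(`</${t}>`);
    }
  };
  go(node);
  return buffer.join("");
}

// After: each call returns its own string and the caller concatenates with +=.
function renderDirect(node: AstNode): string {
  if (node.tag === "str") return node.text;
  const t = htmlTag[node.tag];
  let out = `<${t}>`;
  for (const child of node.children) out += renderDirect(child);
  return out + `</${t}>`;
}
```

The buffer version threads one mutable array through the whole traversal; the direct version is simpler to write and relies on the engine to make repeated `+=` cheap.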

I wasn't able to run djot's own benchmarks:

λ npm run bench

> djot@0.1.0 bench
> node ./bench/bench.js

node:internal/modules/cjs/loader:998
  throw err;
  ^

Error: Cannot find module '../lib/index.js'
Require stack:
- /home/matklad/p/djot.js/bench/bench.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:995:15)
    at Module._load (node:internal/modules/cjs/loader:841:27)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/home/matklad/p/djot.js/bench/bench.js:3:14)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [ '/home/matklad/p/djot.js/bench/bench.js' ]
}

Node.js v18.12.1

But my unscientific local benchmarking (rendering all posts from my blog) shows that this is actually substantially faster: it finishes in about 0.75 of the time.
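Expressed as a speedup for that blog workload only, finishing in 0.75 of the time means 25% less time, or about 1.33 times the throughput:

```latex
\frac{t_{\text{new}}}{t_{\text{old}}} = 0.75
\quad\Longrightarrow\quad
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{0.75} \approx 1.33
```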

jgm commented 1 year ago

Very cool. Not sure why bench didn't work. Did you do npm run build before npm run bench?

jgm commented 1 year ago

Here's what I get on a rather old machine:

Your branch:

block-list-flat.dj x 3,111 ops/sec ±1.59% (91 runs sampled)
block-bq-flat.dj x 19,091 ops/sec ±2.91% (86 runs sampled)
block-list-nested.dj x 4,287 ops/sec ±1.14% (89 runs sampled)
inline-escape.dj x 11,302 ops/sec ±0.78% (96 runs sampled)
block-bq-nested.dj x 12,578 ops/sec ±0.64% (92 runs sampled)
block-ref-flat.dj x 7,265 ops/sec ±0.64% (97 runs sampled)
block-code.dj x 27,712 ops/sec ±0.65% (90 runs sampled)
block-ref-nested.dj x 5,890 ops/sec ±0.44% (94 runs sampled)
inline-links-flat.dj x 6,761 ops/sec ±0.86% (92 runs sampled)
block-fences.dj x 26,843 ops/sec ±0.73% (93 runs sampled)
inline-autolink.dj x 11,351 ops/sec ±0.64% (95 runs sampled)
inline-links-nested.dj x 5,778 ops/sec ±0.90% (94 runs sampled)
block-heading.dj x 14,757 ops/sec ±0.98% (90 runs sampled)
inline-backticks.dj x 20,441 ops/sec ±1.08% (91 runs sampled)
block-hr.dj x 16,218 ops/sec ±2.70% (89 runs sampled)
inline-em-flat.dj x 10,492 ops/sec ±0.85% (87 runs sampled)
lorem1.dj x 3,029 ops/sec ±0.85% (92 runs sampled)
inline-em-nested.dj x 10,444 ops/sec ±1.29% (89 runs sampled)
inline-em-worst.dj x 11,778 ops/sec ±0.90% (93 runs sampled)
readme.dj x 558 ops/sec ±0.85% (92 runs sampled)

main:

block-list-flat.dj x 3,138 ops/sec ±1.28% (92 runs sampled)
block-bq-flat.dj x 18,178 ops/sec ±3.15% (82 runs sampled)
block-list-nested.dj x 3,903 ops/sec ±1.67% (93 runs sampled)
inline-escape.dj x 10,897 ops/sec ±0.64% (94 runs sampled)
block-bq-nested.dj x 11,579 ops/sec ±0.50% (94 runs sampled)
block-ref-flat.dj x 7,100 ops/sec ±1.05% (91 runs sampled)
block-code.dj x 27,877 ops/sec ±0.51% (89 runs sampled)
block-ref-nested.dj x 5,774 ops/sec ±0.67% (93 runs sampled)
inline-links-flat.dj x 6,714 ops/sec ±0.83% (95 runs sampled)
block-fences.dj x 26,840 ops/sec ±0.42% (97 runs sampled)
inline-autolink.dj x 11,201 ops/sec ±0.60% (96 runs sampled)
inline-links-nested.dj x 5,826 ops/sec ±0.49% (97 runs sampled)
block-heading.dj x 14,780 ops/sec ±0.72% (92 runs sampled)
inline-backticks.dj x 20,729 ops/sec ±0.50% (92 runs sampled)
block-hr.dj x 18,023 ops/sec ±0.60% (94 runs sampled)
inline-em-flat.dj x 10,772 ops/sec ±0.60% (94 runs sampled)
lorem1.dj x 3,105 ops/sec ±1.50% (93 runs sampled)
inline-em-nested.dj x 10,637 ops/sec ±0.58% (93 runs sampled)
inline-em-worst.dj x 10,130 ops/sec ±19.34% (81 runs sampled)
readme.dj x 512 ops/sec ±2.12% (86 runs sampled)
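To read the two runs side by side, the small TypeScript helper below computes the relative change in ops/sec for a few files, using numbers copied from the output above. Many of the differences are within the reported ± margins, so this is only a rough reading, not a significance test; the choice of files is arbitrary.

```typescript
// [file, branch ops/sec, main ops/sec], copied from the two runs above.
const results: Array<[string, number, number]> = [
  ["block-list-nested.dj", 4287, 3903],
  ["block-bq-nested.dj", 12578, 11579],
  ["block-hr.dj", 16218, 18023],
  ["readme.dj", 558, 512],
];

// Relative change of the branch over main, in percent (positive = branch faster).
function percentChange(branch: number, main: number): number {
  return ((branch - main) / main) * 100;
}

for (const [file, branch, main] of results) {
  console.log(`${file}: ${percentChange(branch, main).toFixed(1)}%`);
}
```

For these four files it gives roughly +9.8%, +8.6%, -10.0%, and +9.0%: the branch is faster on the nested and readme inputs but slower on block-hr.dj.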

jgm commented 1 year ago

Anyway, I'm in favor of merging.

jgm commented 1 year ago

We might as well also replace the use of these array string buffers in src/parse.ts (l.32-46), fuzz.spec.ts (23-39), and djot-renderer.ts (98-613). But that can be a separate change.

matklad commented 1 year ago

Should be ready to get merged!

jgm commented 1 year ago

Apparently V8 uses ropes for strings, which makes concatenation fast: https://www.reddit.com/r/javascript/comments/a3qlpr/do_i_need_to_use_a_stringbuffer_in_2018_javascript/

I don't know if we can rely on this for non-V8 engines, though.
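Whether `+=` beats `Array.prototype.join` is engine-specific, which is exactly the open question here. A crude TypeScript micro-benchmark like the one below can be run under different engines to check. The input size and iteration count are arbitrary assumptions, and `Date.now()` timing is coarse. On V8, `+=` may build a rope that is only flattened later (for example by the `===` comparison at the end), so part of its cost can land outside the timed region.

```typescript
// Build the same string two ways: repeated += versus one array joined at the end.
function concatPlus(parts: string[]): string {
  let s = "";
  for (const p of parts) s += p;
  return s;
}

function concatJoin(parts: string[]): string {
  const buf: string[] = [];
  for (const p of parts) buf.push(p);
  return buf.join("");
}

// Run f several times and log the total wall-clock time; a smoke test, not a benchmark.
function time(label: string, f: () => string): string {
  const start = Date.now();
  let result = "";
  for (let i = 0; i < 20; i++) result = f();
  console.log(`${label}: ${Date.now() - start} ms`);
  return result;
}

// Many small fragments, roughly like a renderer emitting tags and text.
const parts: string[] = [];
for (let i = 0; i < 100000; i++) parts.push(`<span>${i}</span>`);

const a = time("+=", () => concatPlus(parts));
const b = time("join", () => concatJoin(parts));
console.log(a === b);
```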

jgm commented 1 year ago

I checked playground rendering in Safari before and after, and it went from 142ms to 133ms with a large input. That's promising.
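For scale, the Safari numbers amount to roughly a 6% reduction in rendering time, or about a 1.07x speedup on that one input:

```latex
\frac{142 - 133}{142} = \frac{9}{142} \approx 0.063,
\qquad
\frac{142}{133} \approx 1.07
```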

jgm commented 1 year ago

By the way, I added a separate benchmark for just the renderer (the other benchmarks now test only the parser). You may need to npm install again to get the benchmark module. Then run npm run bench render to benchmark the renderer.

matklad commented 1 year ago

Still no luck with benchmarking; I haven't looked into why it fails yet, so I'm using my blog as a benchmark for now.

jgm / djot.js

Experiment: render directly to string #14