robsimmons opened 7 months ago
Figured out how to get Rust profiling working, and 99% of the time for that benchmark is spent in the `add_impl` function of `src/util/edit_map.rs`. It seems like this loop creates the same quadratic behavior as the repeated slice operations in the JavaScript implementation. Maybe the edit-map concept can be modified to avoid these quadratic repeated splices, but right now it seems like it merely delays them.
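To make the suspicion above concrete, here's a minimal sketch (not the actual `edit_map.rs` code, just an illustration of the pattern) of why repeated mid-vector splices are quadratic: every insert shifts the whole tail of the vector, so n inserts do on the order of n²/4 element moves.

```rust
// Hypothetical illustration of the repeated-splice pattern (not the real
// markdown-rs edit-map code): each Vec::insert in the middle shifts the
// entire tail, so n inserts cost O(n^2) element moves in total.
fn splice_moves(n: usize) -> u64 {
    let mut events: Vec<usize> = Vec::new();
    let mut moves: u64 = 0;
    for i in 0..n {
        let at = events.len() / 2; // a mid-vector splice point
        moves += (events.len() - at) as u64; // elements shifted right
        events.insert(at, i);
    }
    moves
}

fn main() {
    // ~n^2/4 moves: 10x more input means ~100x more shifting work.
    println!("{}", splice_moves(10_000)); // prints 25000000
}
```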
Definitely possible that there are performance improvements to be made in Rust and edit maps too! Cool that you're investigating! Note that here you are specifically also generating strings that relate to JSX and SWC. It might be that there are things happening there.
That was worth investigating. What's SWC?
If I comment out `` result.push_str("\n<DummyComponent code={`\n"); `` and `` result.push_str("`}/>\n\n"); `` and change `markdown::ParseOptions::mdx()` to `markdown::ParseOptions::default()`, then I'm literally just parsing a file of line-separated UUIDs. The performance is actually worse, though only by a constant factor:
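For clarity, the benchmark input being described might be generated something like this sketch (an assumed shape, not the exact `src/main.rs`; the real benchmark uses random UUIDs where this uses a counter-based stand-in):

```rust
// Sketch of the benchmark input described above (assumed shape, not the
// exact src/main.rs). In MDX mode each line is wrapped in a DummyComponent
// JSX expression; in plain mode the UUID-like lines are left bare.
fn build_input(n: usize, mdx: bool) -> String {
    let mut result = String::new();
    for i in 0..n {
        if mdx {
            result.push_str("\n<DummyComponent code={`\n");
        }
        // Stand-in for a UUID line; the real benchmark uses random UUIDs.
        result.push_str(&format!("{:08x}-0000-0000-0000-000000000000\n", i));
        if mdx {
            result.push_str("`}/>\n\n");
        }
    }
    result
}

fn main() {
    // With mdx = false the parser sees only line-separated UUID-like lines.
    let plain = build_input(3, false);
    assert_eq!(plain.lines().count(), 3);
    println!("{}", build_input(2, true));
}
```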
> What's SWC?
Speedy Web Compiler: https://github.com/swc-project/swc. It handles much of the JS/JSX parsing inside MDX.
Thanks for the Speedy Web Response @ChristianMurphy. Yeah, this and the profiling result suggest that it's the edit maps generating the same repeated-splice behavior the JS implementation was dealing with.
> The performance is actually worse, though only by a constant factor:
That makes sense, because then it has to parse paragraphs that are 10k lines long. There could be many things in there: links, emphasis, escapes. A JSX element itself is much simpler. The expression you pass is somewhat complex too, as it's JavaScript, and thus means two parsers have to work together.
Bad news @wooorm: micromark/micromark#169, or something like it, is an issue here as well. I haven't yet grokked your edit maps, but either they don't solve the quadratic-complexity parsing problems or there's a separate performance bug in markdown-rs.
The constant factors are better, but the asymptotic complexity means that we're back to 60-second parse times on files that are just an order of magnitude bigger than the ones that caused 60-second parses in micromark-js.
For comparison, the "JS" lines on these graphs show micromark's performance using subtokenize 2.0.1, which picks up the fix in micromark/micromark#171.
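A quick way to check a claim like this empirically, without full graphs, is a doubling harness: time the work at n, 2n, and 4n, and see whether runtime grows roughly 2x per doubling (linear) or roughly 4x per doubling (quadratic). This sketch uses a stand-in repeated-splice workload rather than the actual parser, so the timings only illustrate the method:

```rust
use std::time::Instant;

// Time a closure once and return elapsed seconds.
fn time_it<F: FnMut()>(mut f: F) -> f64 {
    let start = Instant::now();
    f();
    start.elapsed().as_secs_f64()
}

// Stand-in workload with the repeated mid-vector splice pattern discussed
// above (not the actual markdown-rs parser).
fn quadratic_workload(n: usize) -> Vec<usize> {
    let mut v: Vec<usize> = Vec::new();
    for i in 0..n {
        v.insert(v.len() / 2, i);
    }
    v
}

fn main() {
    // Doubling n should roughly quadruple the time for quadratic work.
    for n in [10_000, 20_000, 40_000] {
        let secs = time_it(|| {
            quadratic_workload(n);
        });
        println!("n = {:>6}: {:.3}s", n, secs);
    }
}
```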
Data collection sources
Run with `node main.mjs` and `cargo run --release`.
package.json
main.mjs
Cargo.toml
src/main.rs