jgm / commonmark-hs

Pure Haskell commonmark parsing library, designed to be flexible and extensible

Improve performance #19

Open jgm opened 5 years ago

jgm commented 5 years ago

See notes on performance in the README.md.

jgm commented 4 years ago

What I've tried

None of these attempts achieved any speed improvement over the current version using `[Tok]`; indeed, in every case performance was worse.

Profiling reveals that block structure parsing is fast. Most of the time is taken up by `tokenize` and `restOfLine` (31%), and by inline parsing.
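For orientation, here is a simplified, illustrative sketch of what a run-based tokenizer over `Text` does (the `Tok` type and grouping rules below are stand-ins, not the actual `Commonmark.Tokens` code): every line of input is turned into a fresh list of token values, which is consistent with `tokenize` and `restOfLine` dominating allocation.

```haskell
import           Data.Char (isAlphaNum, isSpace)
import           Data.Text (Text)
import qualified Data.Text as T

-- Illustrative token type; the real Tok also carries a token type
-- and a source position.
newtype Tok = Tok { tokContents :: Text }
  deriving (Show)

-- Simplified tokenizer: split a line into runs of spaces, runs of
-- alphanumeric characters, and single symbol characters.
tokenizeLine :: Text -> [Tok]
tokenizeLine t = case T.uncons t of
  Nothing -> []
  Just (c, _)
    | isSpace c    -> run isSpace
    | isAlphaNum c -> run isAlphaNum
    | otherwise    -> Tok (T.singleton c) : tokenizeLine (T.drop 1 t)
  where
    run p = let (grp, rest) = T.span p t
            in  Tok grp : tokenizeLine rest
```

Each run allocates a cons cell, a `Tok`, and a `Text` slice, so a list-of-tokens representation pays this cost for every line of a large document.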

Instructions for profiling

`make prof`

Current results (March 12 2020):

| % time | cost centre                       |
|--------|-----------------------------------|
| 1.8    | parseChunks                       |
| 2.1    | pDelimChunk                       |
| 2.2    | Commonmark.Blocks.runInlineParser |
| 2.5    | blockContinues                    |
| 2.6    | Commonmark.Inlines.processBs      |
| 2.9    | MAIN                              |
| 3.9    | block_starts                      |
| 6.6    | renderHtml                        |
| 9.0    | pSymbol                           |
| 11.9   | defaultInlineParser               |
| 17.5   | Commonmark.Tokens.tokenize        |
| 32.6   | restOfLine                        |
jgm commented 4 years ago

For a 1.4MB file:

[screenshot of profiling output, 2020-03-12]
jgm commented 4 years ago

Benchmarks for different extensions:

| extension                     | mean                            |
|-------------------------------|---------------------------------|
| -xautolinks                   | 310.8 ms (309.3 ms .. 311.3 ms) |
| -xpipe_tables                 | 295.2 ms (293.2 ms .. 296.6 ms) |
| -xstrikethrough               | 267.9 ms (265.6 ms .. 269.1 ms) |
| -xsuperscript                 | 267.8 ms (264.9 ms .. 269.5 ms) |
| -xsubscript                   | 266.8 ms (263.6 ms .. 267.9 ms) |
| -xsmart                       | 293.0 ms (292.0 ms .. 294.3 ms) |
| -xmath                        | 287.4 ms (285.4 ms .. 290.7 ms) |
| -xemoji                       | 281.6 ms (280.3 ms .. 282.8 ms) |
| -xfootnotes                   | 291.3 ms (286.1 ms .. 293.3 ms) |
| -xdefinition_lists            | 272.6 ms (271.0 ms .. 275.4 ms) |
| -xfancy_lists                 | 271.2 ms (269.3 ms .. 273.8 ms) |
| -xattributes                  | 284.2 ms (283.4 ms .. 285.7 ms) |
| -xraw_attribute               | 280.7 ms (279.6 ms .. 281.6 ms) |
| -xbracketed_spans             | 268.5 ms (267.0 ms .. 269.4 ms) |
| -xfenced_divs                 | 269.6 ms (267.5 ms .. 271.6 ms) |
| -xauto_identifiers            | 274.9 ms (273.0 ms .. 277.8 ms) |
| -ximplicit_heading_references | 269.8 ms (268.2 ms .. 272.8 ms) |
| -xall                         | 520.4 ms (515.5 ms .. 523.6 ms) |
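For reference, the mean-and-interval format above is what criterion reports; a minimal harness of this shape looks like the following (the `parseDoc` placeholder, the benchmark name, and the input path are illustrative assumptions, not the repository's actual benchmark suite):

```haskell
import           Criterion.Main (bench, defaultMain, env, nf)
import           Data.Text      (Text)
import qualified Data.Text      as T
import qualified Data.Text.IO   as TIO

-- Hypothetical stand-in for the real parse-and-render pipeline, which
-- would run the commonmark parser with a chosen set of extensions.
parseDoc :: Text -> Int
parseDoc = T.length

main :: IO ()
main = defaultMain
  [ env (TIO.readFile "benchmark/sample.md") $ \doc ->  -- hypothetical input
      bench "-xautolinks" (nf parseDoc doc)
  ]
```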
jgm commented 4 years ago

One idea to explore: use `ShortText` from the text-short package instead of `Text` in `Tok`. The public API could still use `Text`. This should reduce the memory used by the tokens.
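A minimal sketch of that idea, assuming a simplified `Tok` (the real type also carries a token type and a source position) and using `Data.Text.Short` from text-short, converting to ordinary `Text` only at the public API boundary:

```haskell
import           Data.Text       (Text)
import           Data.Text.Short (ShortText)
import qualified Data.Text.Short as TS

-- Simplified stand-in for the library's Tok.
newtype Tok = Tok { tokContents :: ShortText }
  deriving (Show, Eq)

-- ShortText is backed by a compact, unpinned byte array, so each token
-- avoids the offset/length slicing overhead of a Text value.
mkTok :: Text -> Tok
mkTok = Tok . TS.fromText

-- Convert back to Text only where the public API needs it.
tokToText :: Tok -> Text
tokToText = TS.toText . tokContents
```

Callers of the public API would keep seeing `Text`; only the internal token representation changes.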