I'm currently working on a project which contains a large number (4k+) of Agda code blocks in markdown files. When debugging some performance issues we were having with our build process, I noticed that we were spending a large amount of time in Pandoc's HTML writer.
Using GHC's profiler, we can see that most of the time is spent parsing and compiling Skylighting's regexes.
To avoid recompiling regexes on every call to `tokenize`, this patch (lazily) compiles the regex when constructing an `RE` value. This ensures the compiled regex is shared across all usages of the syntax.
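A minimal sketch of the idea, assuming a simplified `RE` with a `String` pattern. `Compiled`, `compileRegex`, `mkRE`, `reCompiled` and `matchRE` are hypothetical names for illustration; `compileRegex` is only a stand-in that does literal substring matching, not Skylighting's real regex compiler. The compiled matcher lives in a lazy field, so it is built at most once per `RE` value and then reused by every match.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical stand-in for a compiled regex: just a matcher function.
newtype Compiled = Compiled (String -> Bool)

-- Hypothetical compiler. A real one would parse the pattern and build a
-- regex; this stand-in only does (optionally case-insensitive) literal
-- substring matching.
compileRegex :: Bool -> String -> Compiled
compileRegex caseSensitive pat = Compiled (\input -> needle `isInfixOf` norm input)
  where
    norm   = if caseSensitive then id else map toLower
    needle = norm pat  -- computed once per compiled value, not per match

data RE = RE
  { reString        :: String
  , reCaseSensitive :: Bool
  , reCompiled      :: Compiled  -- lazy field: a thunk until first forced
  }

-- Smart constructor: compilation is deferred in the lazy field, so it runs
-- at most once per RE value, however many times that value is matched.
mkRE :: String -> Bool -> RE
mkRE pat cs = RE pat cs (compileRegex cs pat)

-- Matching reuses the already evaluated compiled value.
matchRE :: RE -> String -> Bool
matchRE re input = case reCompiled re of
  Compiled f -> f input
```

Because the same `RE` values are reachable from a syntax definition, every `tokenize` call using that syntax forces the same thunk, so only the first call pays for compilation.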
The change has a significant effect on performance, reducing the total build time from ~38s to ~24s. Another profile for comparison (though I'm a little dubious of the relative speedup in profiling builds):
There is some awkwardness here, as we now need to write all the type class instances by hand. This is especially irritating for the `Show`/`Read` instances. I'm not sure there's a good alternative.
I've tried to hide the internals of this: we're using pattern synonyms to keep the interface the same as before (`RE { reString, reCaseSensitive }`).
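A rough sketch of how a record pattern synonym can keep the old `RE { reString, reCaseSensitive }` interface while hiding an extra compiled field. The internal constructor `MkRE`, the `Compiled` type, `compileRegex` and `matchRE` are hypothetical (the compiler is again a literal-substring stand-in), and only `Eq`/`Show` are written out; a hand-written `Read` instance is omitted for brevity.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical stand-in for a compiled regex.
newtype Compiled = Compiled (String -> Bool)

-- Hypothetical compiler: literal substring matching only.
compileRegex :: Bool -> String -> Compiled
compileRegex cs pat = Compiled (\input -> needle `isInfixOf` norm input)
  where
    norm   = if cs then id else map toLower
    needle = norm pat

-- Internal representation: the lazily compiled matcher is a hidden field.
data RE = MkRE String Bool Compiled

-- Record pattern synonym: matching ignores the compiled field, while
-- construction (positional or record syntax) fills it with a lazy thunk.
pattern RE :: String -> Bool -> RE
pattern RE { reString, reCaseSensitive } <- MkRE reString reCaseSensitive _
  where
    RE s cs = MkRE s cs (compileRegex cs s)

{-# COMPLETE RE #-}

matchRE :: RE -> String -> Bool
matchRE (MkRE _ _ (Compiled f)) = f

-- Instances can no longer be derived (Compiled has none), so they are
-- written by hand and ignore the compiled field.
instance Eq RE where
  RE s1 c1 == RE s2 c2 = s1 == s2 && c1 == c2

-- Mirrors the output a derived record Show instance would produce.
instance Show RE where
  showsPrec d (RE s cs) = showParen (d >= 11) $
    showString "RE {reString = " . shows s
      . showString ", reCaseSensitive = " . shows cs
      . showString "}"
```

Callers can keep writing `RE { reString = ..., reCaseSensitive = ... }` and using the `reString`/`reCaseSensitive` selectors; only the module defining `RE` sees the hidden constructor.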