soasme / nim-markdown

A Beautiful Markdown Parser in the Nim World.
https://www.soasme.com/nim-markdown/
MIT License
152 stars 11 forks source link

Bad performance with larger documents #48

Open osmarks opened 3 years ago

osmarks commented 3 years ago

I didn't really want to just post this without also providing a good fix, but it's causing me problems, and I lack the understanding of the large amount of code in this necessary to do much to it.

It seems that this library parses and renders Markdown quite slowly, especially on larger documents. I noticed this while testing with ~10KB documents but have mostly tried testing it on ~100KB ones, for which rendering takes about 4 seconds, while a Python CommonMark implementation seems to manage 400ms and a JavaScript one ~50ms. This is proving problematic for an application I want to use this in, and while one fix might be to switch to cmark bindings, they don't provide options to customize parsing, so I would have to reimplement a lot of logic.

By switching most of the internals away from doubly linked lists, I was able to get a roughly 25% improvement (very rough, as I just ran time over it a few times with a big test file as input) but I think bigger changes would be needed to get a significant difference; I think it needs to avoid allocating as many objects (maybe use an ADT or something instead) and to handle chunks of unformatted text better. This also breaks quite a few of the test cases (mostly only spacing in lists as far as I can tell): https://gist.github.com/osmarks/c7d8db89896047368d6512f6284cd7c2

Here is a test input file (without any formatting) and profiling output from it:

out2.txt profile_results.txt

soasme commented 3 years ago

Performance is indeed a problem. I intend to solve it in 2021. ⌨️

ghost commented 3 years ago

@osmarks can you recheck performance after #50?

osmarks commented 3 years ago

This does seem to improve performance a bit on my test case, but it still takes 3 seconds to work for some reason. I'm just using cmark-gfm for now.

soasme commented 3 years ago

@osmarks Although there are still plenty to optimize, I'm satisfied with the current result (as of v0.8.4):

$ nim c -d:release src/markdown.nim && time ./src/markdown < /tmp/out2.txt
real    0m1.390s
user    0m0.156s
sys 0m0.013s

Nimprof profile result also shows a significant reduce on the number of instruction calls, 3277 from the profile_results.txt you provided v/s 323 on the one run against v0.8.4:

total executions of each stack trace:
Entry: 1/131 Calls: 13/323 = 4.02% [sum: 13; 13/323 = 4.02%]

LMK if the performance improvement works on your side. Meanwhile, I'll keep working on improving the performance as tracked in #51.

Cheers.