yuin / goldmark

:trophy: A markdown parser written in Go. Easy to extend, standard(CommonMark) compliant, well structured.
MIT License

Performance issues when parsing specially crafted Markdown documents. #189

Closed peng-hui closed 2 years ago

peng-hui commented 3 years ago

Summary

I am a Hugo user. I found that when Hugo parses certain crafted Markdown documents, CPU usage becomes excessive. I reported the bug to Hugo (https://github.com/gohugoio/hugo/issues/8187), but the maintainers directed me here.

Please answer the following before submitting your issue:

  1. What version of goldmark are you using? : I am using Hugo. The latest Hugo uses github.com/yuin/goldmark v1.2.1, which is not the latest goldmark version. I have no experience with Go, so I have not tested the latest version yet.

  2. What version of Go are you using? : go version go1.14.2 darwin/amd64

  3. What operating system and processor architecture are you using? : Reproduced on macOS Big Sur 11.1, and on Linux pc 4.9.0-14-amd64 #1 SMP Debian 4.9.246-2 (2020-12-17) x86_64 GNU/Linux

  4. What did you do? : I used Hugo (and therefore goldmark) to convert Markdown documents to HTML.

  5. What did you expect to see? : Parsing should take much less time.

  6. What did you see instead? : For a crafted input, parsing takes much longer than for an ordinary input of the same size.

  7. Did you confirm your output is different from CommonMark online demo or other official renderer correspond with an extension?: Yes. This is a performance problem, not a correctness problem. The output does conform to the CommonMark 0.29 specification.

For details to reproduce, please refer to https://github.com/gohugoio/hugo/issues/8187.

To reproduce, create Markdown files with the contents below and run the parser on them. The contents are written as Python expressions; the repetition count (50000 here) can be changed freely.

"[" * 50000 + "]" * 50000
"[[" * 50000 + "xxxx"  + "]]" * 50000
"a <![CDATA[" * 50000
"a" + "<?" * 50000
"a <!A " * 50000
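
As a convenience, the five expressions above can be turned into files with a short script. This is only a sketch: the function names, the file names, and the `crafted` output directory are made up for illustration and are not part of Hugo or goldmark.

```python
from pathlib import Path


def crafted_inputs(n=50000):
    """Return the five crafted inputs from this report; n is the repetition count."""
    return {
        # Hypothetical file names, chosen only to tell the cases apart.
        "open_close_brackets": "[" * n + "]" * n,
        "double_brackets": "[[" * n + "xxxx" + "]]" * n,
        "cdata": "a <![CDATA[" * n,
        "processing_instruction": "a" + "<?" * n,
        "declaration": "a <!A " * n,
    }


def write_inputs(out_dir, n=50000):
    """Write each crafted input to <out_dir>/<name>.md and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, text in crafted_inputs(n).items():
        path = target / (name + ".md")
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # "crafted" is an arbitrary directory name used for this example.
    for p in write_inputs("crafted"):
        print(p, p.stat().st_size, "bytes")
```

Each generated file can then be placed in a Hugo content directory, or passed to any other program that parses Markdown with goldmark, to compare parse time across different values of `n`.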

Sorry, I haven't tested the latest version yet. I will do that when I have time.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

yuin commented 2 years ago

Fixed in v1.4.8 release