quantizor / markdown-to-jsx

🏭 The most lightweight, customizable React markdown component.
https://markdown-to-jsx.quantizor.dev/
MIT License

Slow Regex HTML_BLOCK_ELEMENT_R because of issue with self closing tags #546

Closed devbrains-com closed 4 months ago

devbrains-com commented 6 months ago

We found that the following regex is very slow: it takes up to 50 ms when the page contains a single self-closing tag.

const HTML_BLOCK_ELEMENT_R = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?([^>]*)\/{0}>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/i

The cause seems to be the non-working check for self-closing tags, \/{0}: it matches zero slashes (the empty string), so it never excludes a trailing slash.
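
To illustrate the point about \/{0}: the quantifier {0} asks for exactly zero slashes, so the token matches the empty string and the preceding [^>]* is free to consume the trailing slash. A minimal sketch, using only the opening-tag portion of the regex above (the constant names are made up for this example and are not part of the library):

```javascript
// Opening-tag portion of the regex above, with and without the \/{0} token.
// The names are illustrative only; they do not come from markdown-to-jsx.
const WITH_ZERO_QUANTIFIER = /^<([a-z][^ >/]*) ?([^>]*)\/{0}>/i;
const WITHOUT_ZERO_QUANTIFIER = /^<([a-z][^ >/]*) ?([^>]*)>/i;

const tag = '<img src="x" />';

const withMatch = WITH_ZERO_QUANTIFIER.exec(tag);
const withoutMatch = WITHOUT_ZERO_QUANTIFIER.exec(tag);

// Both patterns accept the self-closing tag, and the attribute group
// captures the trailing slash in both cases.
console.log(withMatch[2]);    // src="x" /
console.log(withoutMatch[2]); // src="x" /
```

If that reading is right, removing \/{0} changes nothing about what the pattern accepts.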

The final regex would be:

const HTML_BLOCK_ELEMENT_R = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?((?:[^>]*[^/])?)>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/i
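
A small sketch of what the change does, comparing only the opening-tag portions of the two regexes (everything up to the first ">"). The constant names and sample strings are assumptions for illustration. With the proposed attribute group, the last character before ">" may not be a slash, so a self-closing tag with attributes is rejected at the opening tag instead of triggering a scan for a closing tag:

```javascript
// Opening-tag portions of the original and the proposed regex.
// Names and sample inputs are illustrative only.
const OPEN_ORIGINAL = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?([^>]*)\/{0}>/i;
const OPEN_PROPOSED = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?((?:[^>]*[^/])?)>/i;

const selfClosingWithAttrs = '<img src="x" />';
const selfClosingBare = '<br />';
const regularOpen = '<div class="a">';

// The leading negative lookahead already rejects attribute-less self-closing tags.
console.log(OPEN_ORIGINAL.test(selfClosingBare)); // false

// With attributes, the original accepts the tag; the proposed version rejects it.
console.log(OPEN_ORIGINAL.test(selfClosingWithAttrs)); // true
console.log(OPEN_PROPOSED.test(selfClosingWithAttrs)); // false

// Regular opening tags still match, with the attributes captured in group 2.
console.log(OPEN_PROPOSED.exec(regularOpen)[2]); // class="a"
```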

Thank you very much

quantizor commented 5 months ago

Could you check again with the latest code? There was a sorting issue in the rules that might have contributed to this problem.

I did perf test this particular change and was getting inconclusive results https://jsperf.app/joribi/1/preview

devbrains-com commented 5 months ago

Thank you very much for looking into it. It seems like the sorting fixes our main concern.

The regex's performance improvement was only visible in large examples with a lot of text after the self-closing element. I tested again, and I couldn't see any performance difference now.

Goues commented 5 months ago

Hey, here is a repro https://regex101.com/r/ac4mJP/1

Apparently, self-closing tags cause a runaway regex that eventually times out if there is enough content. The fix proposed here does resolve this. Would you care to reopen the issue?
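
For anyone who wants to check this locally rather than on regex101, here is a rough timing harness. It is only a sketch: the two regexes are copied from the original post, but the input is a made-up stand-in (the content of the regex101 repro is not reproduced here), timeExec is a hypothetical helper, and the timings will vary by engine and input, so no numbers are claimed.

```javascript
// The two regexes from the original post.
const ORIGINAL_R = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?([^>]*)\/{0}>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/i;
const PROPOSED_R = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?((?:[^>]*[^/])?)>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1)[\s\S])*?)<\/\1>\n*/i;

// Hypothetical input: one self-closing tag followed by filler text.
// Increase the repeat count to see how each regex scales.
const input = '<img src="x" />\n' + 'lorem ipsum '.repeat(400);

// Hypothetical helper: runs the regex once and reports the elapsed time.
function timeExec(regex, text) {
  const start = performance.now();
  const result = regex.exec(text);
  return { result, ms: performance.now() - start };
}

const original = timeExec(ORIGINAL_R, input);
const proposed = timeExec(PROPOSED_R, input);

// Neither regex matches this input; only the time spent before giving up differs.
console.log('original:', original.result, original.ms.toFixed(3), 'ms');
console.log('proposed:', proposed.result, proposed.ms.toFixed(3), 'ms');
```

A single run is noisy; repeating each call in a loop and comparing medians would give a fairer picture.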

quantizor commented 4 months ago

@Goues, if you run the adjusted regex against the unit tests, it bails too early, but it is a lot faster. I'm working on finding a happy medium.

Worth noting that the OP regex is not current (there's no \/{0} sequence anymore). The current block HTML regex is:

/^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?([^>]*)>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1\b)[\s\S])*?)<\/\1>(?!<\/\1>)\n*/i

quantizor commented 4 months ago

OK, I found a variation that works better:

/^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?((?:[^>]*[^/])?)>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1\b)[\s\S])*?)<\/\1>(?!<\/\1>)\n*/i
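
As a quick sanity check of the variation above, the sketch below runs it against two made-up inputs: a regular block element and a self-closing tag with attributes. The regex is copied verbatim from this comment; the constant name is borrowed from the original post and the sample strings are assumptions.

```javascript
// Regex copied verbatim from the comment above.
const HTML_BLOCK_ELEMENT_R = /^ *(?!<[a-z][^ >/]* ?\/>)<([a-z][^ >/]*) ?((?:[^>]*[^/])?)>\n?(\s*(?:<\1[^>]*?>[\s\S]*?<\/\1>|(?!<\1\b)[\s\S])*?)<\/\1>(?!<\/\1>)\n*/i;

// Illustrative inputs, not taken from the library's test suite.
const block = '<div class="a">hello</div>\n\nnext paragraph';
const selfClosing = '<img src="x" />\n\nnext paragraph';

// A regular block element matches: tag name, attributes, and inner content
// land in capture groups 1, 2, and 3.
const blockMatch = HTML_BLOCK_ELEMENT_R.exec(block);
console.log(blockMatch[1]); // div
console.log(blockMatch[2]); // class="a"
console.log(blockMatch[3]); // hello

// A self-closing tag with attributes fails at the opening tag, so the
// regex never starts scanning the rest of the text for </img>.
console.log(HTML_BLOCK_ELEMENT_R.exec(selfClosing)); // null
```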

Thanks all, will get this into v7