syntax-tree / mdast-util-to-hast

utility to transform mdast to hast
https://unifiedjs.com
MIT License
101 stars 42 forks source link

Add improved performance on tons of whitespace #62

Closed 42shadow42 closed 2 years ago

42shadow42 commented 2 years ago

Initial checklist

Description of changes

Altered method of clearing out extraneous whitespace to handle non-matches more effectively. Approaches the problem using a regex with no backtracking to remove trailing whitespace, reverses the string and repeats to remove leading (now trailing) whitespace again. reverses the string to restore the original orientation.

I know it seems a little unorthodox to reverse the string, but because of how regex engines operate a leading whitespace regex like "[ \t]*$" runs inefficiently with lots of non-matching whitespace.

42shadow42 commented 2 years ago

Fixes https://github.com/syntax-tree/mdast-util-to-hast/issues/61

wooorm commented 2 years ago

Just moving some of the discussion from https://github.com/wooorm/trim-lines/commit/51434f7864f909e0469883dd23add10162f9d5f2#commitcomment-76990903 and https://github.com/wooorm/trim-lines/commit/321dc64468940f6fb59bacc775f5e7e2a940a51e#commitcomment-76991050 back here.

I remembered that I actually had a tiny utility that does what is also done here, that had the same performance problem, so I decided that it was probably a good idea to fix that utility, and reuse it here.

I indeed thought that it would probably be really slow to revere strings in some cases. And it looked a bit off. That’s why I came up with a regex, splitting on line endings, and then trimming each line.

I tested, with your test, that this new regex-way works exactly as fast as your reverse-string-way. You’re right that it would be good to also test some whitespace around line endings!

42shadow42 commented 2 years ago

@wooorm This one is ready for rereview. It now just delegates the inefficiency to your trim-lines package.

github-actions[bot] commented 2 years ago

Hi! This was closed. Team: If this was merged, please describe when this is likely to be released. Otherwise, please add one of the no/* labels.

wooorm commented 2 years ago

Thanks, released in 12.1.2!