syntax-tree / mdast-util-gfm-autolink-literal

mdast extension to parse and serialize GFM autolink literals
https://unifiedjs.com
MIT License

fix: limit match length of email regular expression #9

Closed: llimllib closed this 1 month ago

llimllib commented 1 month ago

Description of changes

This is a quick fix for the problem described in #8, where a long line causes the findEmail regular expression to exhibit pathological behavior.

The solution presented here is to replace every + quantifier in the regular expression with {1,255}. Matching is still super-linear on long lines, but each repetition is now bounded tightly enough that the function completes in an acceptable amount of time.

The maximum length of a valid email address is 255 according to the IETF here, but I've left room in the regex for 64 before and 255 after the @.
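
As a rough illustration of the bounded-quantifier idea, here is a minimal sketch. Both patterns are hypothetical, simplified email-like expressions written for this example, not the library's actual findEmail expression, and the helper name findFirstEmail is likewise an assumption. The limits follow the description above: 64 characters before the @ and 255 after.

```javascript
// Hypothetical, simplified patterns for illustration only;
// they are NOT the library's actual findEmail expression.

// Unbounded form: every repetition uses `+`, so a long run of word
// characters can be rescanned from every starting offset.
const unboundedEmail = /([-.\w+]+)@([-\w]+(?:\.[-\w]+)+)/

// Bounded form: each `+` quantifier becomes an explicit `{1,N}` range,
// with 64 characters allowed before the `@` and 255 after it.
const boundedEmail = /([-.\w+]{1,64})@([-\w]{1,255}(?:\.[-\w]{1,255}){1,255})/

// Hypothetical helper: return the first email-like match in a line, or null.
function findFirstEmail(line) {
  const match = boundedEmail.exec(line)
  return match ? {email: match[0], local: match[1], domain: match[2]} : null
}

console.log(findFirstEmail('contact alice@example.com now'))
console.log(findFirstEmail('a'.repeat(5000)))
```

The trade-off in this sketch is that a local part longer than 64 characters is no longer matched in full; the bounded pattern can still match its last 64 characters.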

This only improves matters; it doesn't fix the problem entirely.

Before this PR, line lengths of up to about 50,000 cause a recursion failure:

image

With this PR, I can run the regex on strings of length up to 10 megabytes, which seems like a very comfortable line length, certainly a big improvement:

image

but it still fails with a recursion-limit error above that size
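
A measurement like the one described could be reproduced with a small harness along these lines. This is only a sketch under stated assumptions: the pattern is a hypothetical bounded expression rather than the library's own, the probe helper is invented for this example, and the test input (a single run of the letter a with no @) is a guess at a worst case, not the input from #8.

```javascript
// Hypothetical bounded pattern for illustration; not the library's actual expression.
const boundedEmail = /([-.\w+]{1,64})@([-\w]{1,255}(?:\.[-\w]{1,255}){1,255})/

// Hypothetical helper: run `pattern` against a line of `length` word
// characters containing no `@`, and report whether the engine finished
// and roughly how long it took.
function probe(pattern, length) {
  const line = 'a'.repeat(length)
  const start = Date.now()
  try {
    pattern.test(line)
    return {length, ok: true, ms: Date.now() - start}
  } catch (error) {
    // A regex engine that exhausts its stack surfaces that as a thrown error.
    return {length, ok: false, ms: Date.now() - start, error: String(error)}
  }
}

// Grow the input by a factor of ten each step and print the outcome.
for (const length of [1000, 10000, 100000]) {
  console.log(probe(boundedEmail, length))
}
```

Timings depend on the machine and the JavaScript engine, so the numbers are only useful for comparing one pattern against another on the same setup.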

closes #8

github-actions[bot] commented 1 month ago

Hi! This was closed. Team: If this was merged, please describe when this is likely to be released. Otherwise, please add one of the no/* labels.