Open herndlm opened 7 months ago
In general, excessive backtracking is solved by fixing the regular expressions to avoid doing unnecessary things. Backtracking is prone to result in combinatory explosions where there's no reasonable limit that would work… at best you'll just hang for a few minutes/hours/days depending on the input.
But in this case I believe the problem is that it backtracks a fixed number of time per character but since you have about 720K characters it reaches the limit quite fast, despite being constant-time. So maybe increasing the limit would work fine in this case.
Ideally we'd eliminate this unnecessary backtracking. I played with the regex you put in the regex101 link and came up with this reasonable replacement:
^((?>.+))\n(=+|-+)[ ]*\n+
or this equivalent one:
^([^\n]+)\n(=+|-+)[ ]*\n+
Neither will cause backtracking. Either one can be used as a replacement. More work is needed to improve on the one in the MarkdownExtra parser which is a bit more complex.
It's likely others regular expressions could cause excessive backtracking like this. I don't test with 720K character lines so I don't know.
you're right, I could make it work by increasing pcre.backtrack_limit
and it seems to scale linear with the line length basically. your regexes here seem to work fine too, feel free to decide what to do. Adapting that regex sounds like a good idea, but you're right MarkdownExtra
should be adapted/improved too most likely. if possible.
Hi,
with both
Markdown
andMarkdownExtra
there are issues with very long content like inline images. https://github.com/michelf/php-markdown/blob/51613168d71787b0fe8472166ccbfa8d285c02cd/Michelf/MarkdownExtra.php#L1065-L1071 and also the simpler https://github.com/michelf/php-markdown/blob/51613168d71787b0fe8472166ccbfa8d285c02cd/Michelf/Markdown.php#L903-L904 lead to "Backtrack limit exhausted" errors and empty content with the defaultpcre.backtrack_limit
of1000000
. I was about to create a re-producer, but I basically see the same issue on regex101.com and it might be simpler to play around with there: https://regex101.com/r/BcWE82/1Do you think this is something that can be improved or is it suggested to increase
pcre.backtrack_limit
to deal with this? This could be the same as #149