Performance Issue with `RegularParser`

rhukster commented 8 years ago

As outlined in the original PR comment: https://github.com/thunderer/Shortcode/pull/26#issuecomment-173325927, I was testing the Shortcode against some documentation I have for Grav CMS where I have created a Tab shortcode. I was running into a couple of issues, but one of them is related to very slow parsing of this example page with the RegularParser.

After a few back and discussions with @thunderer, he believes it is related to the various non-shortcode [] references contained in the document.

I have put together a simple test scenario (https://github.com/rhukster/shortcode-test) that shows this issue. There are two documents, a small one that is considerably slower than either the RegexParser or the WordpressParser. The RegularParser is not able to even process the full document as it just continues to spin until the PHP process is terminated.

FYI Both Wordpress and Regex parsers are able to parse this document. (note: corruption issues are handled in a separate issue https://github.com/thunderer/Shortcode/issues/25#issuecomment-173738623)

thunderer commented 8 years ago

@rhukster Just wanted to clarify: this is not what I believe to be related, I know that it's caused by many unclosed shortcodes, because such input causes RegularParser to open many recursive calls to parser rules, that's how it works. It is not your or anyone else's fault (first of all it's my fault as an author :)), just a statement of problem to solve that I'm already trying to implement.

rhukster commented 8 years ago

You might want to look at my "Ideas" suggestions that could help with performance here. Cheers!

thunderer commented 8 years ago

@rhukster I merged the performance improvements PR to master, but I found some issues with parsing correctness when dealing with complex nested cases so please use RegexParser for now, I'll give you an update once they will be resolved.

thunderer commented 8 years ago

@rhukster I think I fixed the issue in https://github.com/thunderer/Shortcode/commit/4cda52fd5b2e1d60d33ed3fc345a1b478dfe6380 , could you please test if current dev-master matches your expectations?

rhukster commented 8 years ago

I just tested your latest dev-master. The RegularParser is now much improved! As you say it renders in about 500ms on the 'test page'. No more errors and timeouts! I'll probably continue to use Regex or Wordpress parser as they are an order of magnitude faster still (Regex ~40ms, and Wordpress ~20ms).

What would the advantage be to using the RegularParser over the Regex or Wordpress one?

thunderer commented 8 years ago

Thanks, could you also help @giansi resolve problems in #29? If you say that this issue is resolved, I only need to know if his issue is something to fix before tagging new release.

As for the advantages of RegularParser - the most important one is parsing correctness impossible to achieve with regular expressions. Say that you have text like [x][x][x][/x][/x][/x] (shortcodes nested multiple levels with the same names). Regex is unable to process it correctly as it'll always see it like [x][x][x][/x] (first closing tag always closes first matching opening tag), even recursive capture groups (?R) is not going to help (regex recursion is more like inner repetition, not true recursion). I could possibly match only opening / closing tag fragments and try to analyze them (I just came up with this idea, maybe worth trying?) but then it's be basically RegexParser with RegularParser's internals.

thunderer / Shortcode

Performance Issue with `RegularParser` #27