rehamaltamimi / gwtwiki

Automatically exported from code.google.com/p/gwtwiki

TemplateParser stuck in infinite loop #32

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Parsing the following Wikipedia Article: 
http://en.wikipedia.org/wiki/1001

What is the expected output? What do you see instead?
It looks like the parser is stuck in recursive parsing of templates.

What version of the product are you using? On what operating system?
trunk svn 583, jdk 1.6.0_16-b01, ubuntu 9.04

Please provide any additional information below.

Original issue reported on code.google.com by dfisla@gmail.com on 14 Dec 2009 at 3:59

Attachments:

GoogleCodeExporter commented 8 years ago
I attached a log file, not sure if it helps - been stuck on this one for a while.

Original comment by dfisla@gmail.com on 14 Dec 2009 at 4:03

GoogleCodeExporter commented 8 years ago
Did you try to increase the recursion limit:

in info.bliki.wiki.model.Configuration.java set:
  PARSER_RECURSION_LIMIT = 30;

Original comment by axelclk@gmail.com on 14 Dec 2009 at 9:22
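
A recursion limit of this kind is, in essence, a depth counter checked at the start of every nested expansion. A minimal sketch of the idea (the class, method names, and expansion logic here are illustrative, not bliki's actual implementation):

```java
import java.util.Map;

// Sketch of what a PARSER_RECURSION_LIMIT protects against: each nested
// template expansion increments a depth counter, and once the limit is
// reached the template is left unexpanded instead of recursing further.
public class RecursionGuard {
    static final int PARSER_RECURSION_LIMIT = 30;
    private int depth = 0;

    // Returns the expanded text, or the raw source once the depth
    // limit is reached.
    String expand(String source, Map<String, String> templates) {
        if (depth >= PARSER_RECURSION_LIMIT) {
            return source; // hard stop: leave the template unexpanded
        }
        String name = extractTemplateName(source);
        if (name == null || !templates.containsKey(name)) {
            return source; // plain text, nothing to expand
        }
        depth++;
        try {
            return expand(templates.get(name), templates);
        } finally {
            depth--; // restore depth as the stack unwinds
        }
    }

    // Toy syntax: a "template call" is the whole string wrapped in {{...}}.
    private String extractTemplateName(String source) {
        if (source.startsWith("{{") && source.endsWith("}}")) {
            return source.substring(2, source.length() - 2).trim();
        }
        return null;
    }
}
```

With two templates that reference each other, the guard turns an otherwise infinite loop into a bounded recursion that bottoms out at the limit.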

GoogleCodeExporter commented 8 years ago
Yes, I changed it to 32. I noticed that it was now parsing all of the expressions, and the recursion level stayed well below 32, but it is still stuck in an infinite loop.

Original comment by dfisla@gmail.com on 14 Dec 2009 at 9:35

GoogleCodeExporter commented 8 years ago
This looks like circular parsing: one template triggers parsing of another template, which in turn triggers parsing of the first. So I don't think this has anything to do with the expressions; the problem is in the template parser instantiation mechanism.

Some kind of protection/limit may be needed so that the TemplateParser eventually stops creating instances of itself for further parsing.

Original comment by dfisla@gmail.com on 14 Dec 2009 at 9:41

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Found a solution: a simple limit on recursive template calls, keeping the state/count in the WikiModel. Patch/diff against trunk 583 attached. Please ignore the log4j stuff.

Just an FYI: I am running your parser on 5M+ Wikipedia topics, and naturally there are many topics with malformed syntax and other extensions/markup that could potentially send the parser into infinite loops. I think it makes sense to have hard stop limits/protections built into the parser.

Original comment by dfisla@gmail.com on 15 Dec 2009 at 5:23

Attachments:
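
The fix described above (a call budget kept in the shared model, rather than a pure depth limit) can be sketched roughly as follows. This is illustrative, not the attached patch; all names are hypothetical:

```java
// Illustrative sketch: the model object carries a running count of
// template expansions, so even mutual recursion that never gets deep
// is cut off once the total budget is spent.
class TemplateCallBudget {
    static final int MAX_TEMPLATE_CALLS = 1000;
    private int calls = 0;

    // Returns false once the budget is exhausted; the caller should
    // then emit the raw template text instead of expanding further.
    boolean tryAcquire() {
        if (calls >= MAX_TEMPLATE_CALLS) {
            return false;
        }
        calls++;
        return true;
    }

    int used() {
        return calls;
    }
}
```

Because the counter lives in the model shared by all parser instances, sibling expansions draw on the same budget, unlike a depth counter that resets as the stack unwinds.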

GoogleCodeExporter commented 8 years ago
I think something more sophisticated like this must be implemented:
http://en.wikipedia.org/wiki/Wikipedia:Template_limits

Original comment by axelclk@gmail.com on 15 Dec 2009 at 5:11
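
That page describes several MediaWiki limits; one of them, the post-expand include size, amounts to a byte budget that every expansion draws down. A rough sketch of that idea (the names and the API are illustrative, not MediaWiki's or bliki's code):

```java
// Rough sketch of a MediaWiki-style "post-expand include size" limit:
// each template expansion charges its output length against a fixed
// budget, and expansion stops once the budget is spent.
class IncludeSizeBudget {
    private final long maxBytes;
    private long used = 0;

    IncludeSizeBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns true if this expansion still fits within the budget.
    boolean charge(String expandedText) {
        used += expandedText.length();
        return used <= maxBytes;
    }
}
```

A size budget complements call and depth counters: it also catches templates that expand a modest number of times but into enormous output.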

GoogleCodeExporter commented 8 years ago

Original comment by axelclk@gmail.com on 15 Dec 2009 at 5:11

GoogleCodeExporter commented 8 years ago
Axel, attaching the patch file for my changes against the trunk rev. 931.

Original comment by dfisla@gmail.com on 28 Jan 2010 at 7:02

Attachments:

GoogleCodeExporter commented 8 years ago
Attached the files; changes are tagged as EXPERIMENTAL.

Original comment by dfisla@gmail.com on 31 Jan 2010 at 4:49

Attachments:

GoogleCodeExporter commented 8 years ago
For those interested, some context:

Trying to put 5M+ topics into jamwiki has been a huge learning experience, to say the least. Given the vast variation of topic markup/syntax in Wikipedia content, parsing limits are critical to maintaining operational performance.

As you know, in Java there is no clean way to abort a thread, so when parsing a topic, a runaway parser can destroy Tomcat/GlassFish: it causes the request-processing threads to get stuck in infinite loops, and it only takes a few bad topic requests to take the whole server down. I implemented a caching architecture similar to the one used by wikipedia.org; however, parser performance and limits still matter when building the cached data.

I did manage to get this problem under control through my modifications to the WikiScanner, TemplateParser, and AbstractParser classes of the bliki parser, where I put limits on the number of recursive calls, the recursion depth, and the size of certain buffers; finally, I measure the total parsing time and break out if possible. If you like, I can send you an updated diff file with my changes against your current trunk.

FYI, I am using the bliki parser, and a sandbox version of the jamwiki performance branch can be found at http://www.uniblogger.com

Original comment by dfisla@gmail.com on 31 Jan 2010 at 4:52
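
The total-parsing-time cutoff mentioned above is typically a deadline recorded when the parse starts and checked at safe points (e.g. before each template expansion), since a Java thread cannot be aborted cleanly from outside. A sketch with illustrative names, not the actual modifications:

```java
// Sketch of a wall-clock cutoff: record a deadline when parsing starts
// and poll it at safe points, returning partial output instead of
// letting a runaway parse pin a request-processing thread.
class ParseDeadline {
    private final long deadlineNanos;

    ParseDeadline(long budgetMillis) {
        this.deadlineNanos = System.nanoTime() + budgetMillis * 1_000_000L;
    }

    boolean expired() {
        // Subtract-and-compare is safe against nanoTime overflow.
        return System.nanoTime() - deadlineNanos >= 0;
    }
}
```

Polling a deadline this way is cooperative: it only helps if the parser's inner loops actually reach a checkpoint, which is why it is combined here with the call, depth, and buffer limits rather than used alone.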

GoogleCodeExporter commented 8 years ago
Implemented the changes in revision:
http://code.google.com/p/gwtwiki/source/detail?r=935

Original comment by axelclk@gmail.com on 31 Jan 2010 at 6:34

GoogleCodeExporter commented 8 years ago

Original comment by axelclk@gmail.com on 13 May 2010 at 3:29