Closed. GoogleCodeExporter closed this issue 8 years ago.
I attached a log file; not sure if it helps. I've been stuck on this one for a
while.
Original comment by dfisla@gmail.com
on 14 Dec 2009 at 4:03
Did you try increasing the recursion limit? In
info.bliki.wiki.model.Configuration.java, set:
PARSER_RECURSION_LIMIT = 30;
Original comment by axelclk@gmail.com
on 14 Dec 2009 at 9:22
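(For reference, a minimal sketch of bumping the limit before parsing. The
WikiModel setup is illustrative, and whether PARSER_RECURSION_LIMIT is
assignable at runtime, rather than edited in Configuration.java directly, is
an assumption about the bliki revision in use:

import info.bliki.wiki.model.Configuration;
import info.bliki.wiki.model.WikiModel;

public class RaiseRecursionLimit {
    public static void main(String[] args) {
        // Assumed to be a mutable public static field; in some revisions you
        // may have to edit Configuration.java directly instead.
        Configuration.PARSER_RECURSION_LIMIT = 30;

        WikiModel model = new WikiModel("/image/${image}", "/link/${title}");
        System.out.println(model.render("{{SomeTemplate|param=value}}"));
    }
}
)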
Yes, I changed it to 32. I noticed that it was now parsing all of the
expressions and the recursion level stayed well below 32, but it is still
stuck in an infinite loop.
Original comment by dfisla@gmail.com
on 14 Dec 2009 at 9:35
This looks like circular parsing: one template triggers parsing of another
template, which in turn triggers parsing of the first. So I don't think this
has anything to do with the expressions; it is the template parser
instantiation mechanism.
Some kind of protection/limit may be needed to eventually stop the
TemplateParser from creating instances of itself for further parsing.
Original comment by dfisla@gmail.com
on 14 Dec 2009 at 9:41
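(A minimal sketch of the kind of guard described above, with hypothetical
names; this is not the actual bliki code. The idea is to track which templates
are currently being expanded and refuse to re-enter one:

import java.util.HashSet;
import java.util.Set;

class TemplateCycleGuard {
    // Names of templates currently on the expansion stack.
    private final Set<String> active = new HashSet<String>();

    // Returns false if expanding this template would close a cycle.
    boolean enter(String templateName) {
        return active.add(templateName);
    }

    void exit(String templateName) {
        active.remove(templateName);
    }
}

The parser would call enter() before expanding a transclusion and exit()
afterwards; a false return means the template is already being expanded, so
the parser emits the raw wikitext instead of recursing.)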
Found a solution: a simple limit on recursive template calls, keeping the
state/count in the WikiModel. A patch/diff against trunk r583 is attached.
Please ignore the log4j stuff.
Just an FYI: I am using your parser on 5M+ Wikipedia topics, and naturally
there are lots of topics with malformed syntax and other extensions/markup
that could potentially send the parser into infinite loops. I think it makes
sense to have hard stop limits/protections built into the parser.
Original comment by dfisla@gmail.com
on 15 Dec 2009 at 5:23
Attachments:
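(The attached patch is authoritative; as a rough illustration of the idea
only, with hypothetical names and an assumed limit value, not the attached
diff:

class CountingWikiModel /* would extend WikiModel */ {
    private static final int MAX_TEMPLATE_CALLS = 1000; // assumed per-page budget
    private int templateCalls;

    String parseTemplate(String wikiText) {
        if (++templateCalls > MAX_TEMPLATE_CALLS) {
            return wikiText; // hard stop: return the unexpanded text
        }
        // ... delegate to the real TemplateParser here ...
        return wikiText;
    }
}
)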
I think something more sophisticated like this must be implemented:
http://en.wikipedia.org/wiki/Wikipedia:Template_limits
Original comment by axelclk@gmail.com
on 15 Dec 2009 at 5:11
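(MediaWiki's template limits combine several budgets, such as post-expand
include size and expansion depth, rather than a single recursion counter. A
rough sketch of that shape, with illustrative values only; see the linked
page for MediaWiki's actual defaults:

class TemplateLimits {
    // Values are illustrative assumptions, not MediaWiki's configured defaults.
    static final long MAX_POST_EXPAND_BYTES = 2L * 1024 * 1024;
    static final int MAX_EXPANSION_DEPTH = 40;

    private long postExpandBytes;
    private int depth;

    boolean enter() {
        return ++depth <= MAX_EXPANSION_DEPTH;
    }

    void exit() {
        depth--;
    }

    // Called after each expansion with the size of the produced text.
    boolean addExpandedBytes(int length) {
        postExpandBytes += length;
        return postExpandBytes <= MAX_POST_EXPAND_BYTES;
    }
}
)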
Axel, I'm attaching the patch file for my changes against trunk rev. 931.
Original comment by dfisla@gmail.com
on 28 Jan 2010 at 7:02
Attachments:
Attached the files; changes are tagged as EXPERIMENTAL.
Original comment by dfisla@gmail.com
on 31 Jan 2010 at 4:49
Attachments:
For those interested, some context:
Trying to put 5M+ topics into jamwiki has been a huge learning experience, to
say the least. Given the vast variation of topic markup/syntax in Wikipedia
content, having parsing limits is critical to maintaining operational
performance.
As you know, in Java there is no clean way to abort a thread, so a runaway
parser can destroy Tomcat/GlassFish: the request-processing threads get stuck
in infinite loops, and it only takes a few bad topic requests to take the
whole server down. I implemented a caching architecture similar to the one
used by wikipedia.org; however, when building the cached data, parser
performance and limits are still important.
I did manage to get this problem under control through my modifications to
the WikiScanner, TemplateParser, and AbstractParser classes of the bliki
parser, where I put limits on the number of recursive calls, the recursion
depth, and the size of certain buffers; finally, I try to measure the total
parsing time and break out if possible. If you like, I can send you an
updated diff file with my changes against your current trunk.
FYI, I am using the bliki parser, and a sandbox version of the jamwiki
performance branch can be found at http://www.uniblogger.com
Original comment by dfisla@gmail.com
on 31 Jan 2010 at 4:52
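(The time-budget idea from the last paragraph can be sketched roughly as
follows, with hypothetical names; the actual changes live in WikiScanner,
TemplateParser, and AbstractParser. Since a Java thread cannot be killed
cleanly, the parser itself checks a wall-clock deadline at safe points and
unwinds:

class ParseDeadline {
    private final long deadlineNanos;

    ParseDeadline(long budgetMillis) {
        this.deadlineNanos = System.nanoTime() + budgetMillis * 1000000L;
    }

    // Called from the scanner/parser loops; throwing unwinds the whole parse.
    void check() {
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("parse time budget exceeded");
        }
    }
}

The servlet layer would catch the exception and serve an error page or the
raw wikitext, instead of leaving a request thread spinning forever.)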
Implemented the changes in revision:
http://code.google.com/p/gwtwiki/source/detail?r=935
Original comment by axelclk@gmail.com
on 31 Jan 2010 at 6:34
Original comment by axelclk@gmail.com
on 13 May 2010 at 3:29
Original issue reported on code.google.com by
dfisla@gmail.com
on 14 Dec 2009 at 3:59
Attachments: