Parsing of template redirects

GoogleCodeExporter commented 8 years ago

Consider:

Template:Test1
------- contents -------
#REDIRECT[[Template:Test2]]

Template:Test2
-------- contents ------
<p>Hello World!</p>

Any article that references {{Test1}} should then output:
<p>Hello World!</p>

For a specific Wikipedia example, see:
Template:InfosectionEnd, and
Template:InfoboxEnd

I got this to work by making the following changes to TemplateParser class:

private static final Pattern REDIRECT_PATTERN = Pattern.compile(
    ".*#REDIRECT[ ]*\\[\\[Template:(.*)\\]\\].*",
    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

in method:

protected static void parseRecursive(String rawWikitext, IWikiModel wikiModel,
    Appendable writer, boolean parseOnlySignature, boolean renderTemplate,
    HashMap<String, String> templateParameterMap) throws IOException {

...

// Experimental to handle #REDIRECT [[Template:Foo]]
Matcher m = REDIRECT_PATTERN.matcher(rawWikitext);
if (m.matches()) {
   String redirectTemplate = m.group(1);
   rawWikitext = m.replaceAll("{{" + redirectTemplate + "}}");
}

There has to be a better way to implement this. I don't really like mixing a 
regex matcher into the rest of the template parser, given that the template 
parser operates at the character level.

Also, this currently works because the parser replaces the redirect 
reference with the actual template markup and recurses on it, so the 
redirect is followed and its content is evaluated and parsed. The template 
parser does not stop at the redirect; it continues parsing the rest of the 
main template. This is the opposite of how page redirects are handled, where 
the page parser stops processing the rest of the content once it runs into a 
redirect construct.

I also allow whitespace between the #REDIRECT keyword and the opening "[[" 
brackets, since many Wikipedia articles do this instead of using the proper 
#REDIRECT[[...]] syntax.
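
To try the rewrite described above in isolation, here is a minimal standalone 
sketch. It is not the code from TemplateParser or from the eventual fix. The 
class name TemplateRedirectSketch and the method rewriteRedirect are made up 
for illustration, and the pattern differs slightly from the one above: it uses 
\s* instead of [ ]* and a reluctant group so that only the first link target 
is captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of bliki: rewrites a template redirect
// into a plain template call so that a template parser can recurse on it.
final class TemplateRedirectSketch {

    // Assumption: optional whitespace before and after the keyword, and a
    // reluctant group that stops at the first closing "]]".
    private static final Pattern REDIRECT_PATTERN = Pattern.compile(
        "\\s*#REDIRECT\\s*\\[\\[Template:(.*?)\\]\\].*",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns "{{Target}}" if the text is a template redirect,
    // otherwise returns the text unchanged.
    static String rewriteRedirect(String rawWikitext) {
        Matcher m = REDIRECT_PATTERN.matcher(rawWikitext);
        if (m.matches()) {
            return "{{" + m.group(1).trim() + "}}";
        }
        return rawWikitext;
    }

    public static void main(String[] args) {
        System.out.println(rewriteRedirect("#REDIRECT[[Template:Test2]]"));    // {{Test2}}
        System.out.println(rewriteRedirect("#redirect   [[Template:Test2]]")); // {{Test2}}
        System.out.println(rewriteRedirect("<p>Hello World!</p>"));            // unchanged
    }
}
```

Building the replacement string directly, rather than passing it to 
Matcher#replaceAll, avoids having to escape "$" or "\" in a template name.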

In any case, the above change works for now - this was a big deal for me 
when parsing Wikipedia pages & templates.

Original issue reported on code.google.com by dfisla@gmail.com on 28 Apr 2010 at 3:06

GoogleCodeExporter commented 8 years ago

The parsing of redirects, at least the way implemented above, requires that it 
is the first recursive call - see attached file for details.

Original comment by dfisla@gmail.com on 28 Apr 2010 at 3:09

Attachments:

TemplateParser.java

GoogleCodeExporter commented 8 years ago

Original comment by axelclk@gmail.com on 28 Apr 2010 at 7:04

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Could you please check whether these changes solve the problem:
http://code.google.com/p/gwtwiki/source/detail?r=1107

Original comment by axelclk@gmail.com on 28 Apr 2010 at 7:05

GoogleCodeExporter commented 8 years ago

Works for me! Thanks.

Original comment by dfisla@gmail.com on 29 Apr 2010 at 2:11

GoogleCodeExporter commented 8 years ago

Original comment by axelclk@gmail.com on 13 May 2010 at 3:28

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Hi axelclk,
I'm new to Bliki, and I have the same problem dfisla is talking about with 
redirects. I saw that you made some changes to the bliki.core library.
Can you please explain how to update my files with your changes in order to 
handle the redirect issue?
I'm using Maven (I couldn't get my project to work without making it a Maven 
project).

Original comment by ido1...@gmail.com on 5 Jun 2012 at 6:46

GoogleCodeExporter commented 8 years ago

This was fixed some time ago, in r1107 noted above. Which version of the bliki 
API are you using?

Original comment by dfi...@itmatter.com on 5 Jun 2012 at 1:17

GoogleCodeExporter commented 8 years ago

Yes, I checked out the bliki code and saw that these changes have already been 
added to the library.
I followed this tutorial exactly:
http://www.integratingstuff.com/2012/04/06/hook-into-wikipedia-using-java-and-the-mediawiki-api/
Still, when I search for "Machester United football club" I don't get 
anything, but in the debugger I see "#REDIRECT[[Machester United F.C.]]", 
which means that it doesn't make the recursive call for redirects.
Can you explain what I have to do to follow all of these redirects and just 
get the content of what I searched for? (Please refer to the tutorial in your 
answer, as I am not an expert in bliki.)

Original comment by ido1...@gmail.com on 5 Jun 2012 at 4:44

GoogleCodeExporter commented 8 years ago

You have a typo in your article text:
"Machester United football club"

versus

"Manchester United football club"

Original comment by axelclk@gmail.com on 5 Jun 2012 at 5:18

GoogleCodeExporter commented 8 years ago

It's just a typo in the comment.
When I search for "Manchester United Football Club" and use the debugger,
I notice that after executing the following line:
List<Page> listOfPages = user.queryContent(listOfTitleStrings);
I get the following value for listOfPages:
[PageID: 2983919; NS: 0; Title: Manchester United Football Club; 
Image url: 
Content:
#REDIRECT [[Manchester United F.C.]]]

Original comment by ido1...@gmail.com on 5 Jun 2012 at 5:46

GoogleCodeExporter commented 8 years ago

OK.
There's an IWikiModel#getRedirectLink() method, which returns the parsed 
redirect link after the first wiki text has been parsed.
With this redirect title string you can call the Wikipedia API again.

See the modified HTMLCreatorExample I've committed in r5527.

The Example creates two files.

The first file  
  C:\temp\Manchester_United_Football_Club.html
contains no visible HTML.

The second file
   C:\temp\Manchester_United_F.C..html
contains the rendered redirect title.
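
The two-step lookup described above (fetch the page, read its redirect target, 
fetch again) can be sketched roughly as follows. This is not the 
HTMLCreatorExample from r5527 and it does not use the bliki API: the class 
PageRedirectSketch, the fetchContent function and the regex are all 
assumptions made for illustration. In a real program, fetchContent would wrap 
something like the queryContent call shown earlier in this thread, and the 
redirect target could come from IWikiModel#getRedirectLink() instead of a 
regex.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of bliki: follows "#REDIRECT [[Title]]"
// pages until it reaches a page that is not a redirect.
final class PageRedirectSketch {

    // Assumption: the redirect must start the page; a "#section" or
    // "|label" suffix inside the link is ignored.
    private static final Pattern PAGE_REDIRECT = Pattern.compile(
        "^\\s*#REDIRECT\\s*\\[\\[([^\\]|#]+)[^\\]]*\\]\\]",
        Pattern.CASE_INSENSITIVE);

    // Returns the redirect target title, or null if the content is
    // missing or is not a redirect.
    static String redirectTarget(String content) {
        if (content == null) {
            return null;
        }
        Matcher m = PAGE_REDIRECT.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    // Follows redirects starting at 'title'. Stops when a non-redirect
    // page is reached, a title repeats (cycle), or maxHops is exhausted,
    // and returns the last title reached.
    static String resolveTitle(String title,
                               Function<String, String> fetchContent,
                               int maxHops) {
        Set<String> seen = new HashSet<>();
        String current = title;
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            String target = redirectTarget(fetchContent.apply(current));
            if (target == null) {
                return current;
            }
            current = target;
        }
        return current;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the Wikipedia API.
        Map<String, String> pages = new HashMap<>();
        pages.put("Manchester United Football Club",
                  "#REDIRECT [[Manchester United F.C.]]");
        pages.put("Manchester United F.C.", "Article text goes here.");

        System.out.println(resolveTitle(
            "Manchester United Football Club", pages::get, 5));
    }
}
```

The hop limit and the set of visited titles are there so that a redirect loop 
cannot keep the lookup running forever.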

Original comment by axelclk@gmail.com on 5 Jun 2012 at 6:58

GoogleCodeExporter commented 8 years ago

Thank you very much!
Now I can handle redirects, thanks :)

Original comment by ido1...@gmail.com on 6 Jun 2012 at 12:22

rehamaltamimi / gwtwiki

Parsing of template redirects #38