quetzai / wikimodel

Automatically exported from code.google.com/p/wikimodel

MediaWikiParser - Complex macro fails to be parsed as a macro. Due to a nested macro as parameter? #205

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
The following macro from 
http://en.wikipedia.org/wiki/Louis,_Duke_of_Brittany_(1707–1712) fails to be 
parsed as a macro. Could this be because some of its parameters are themselves 
macros (nested macros)?

{{Infobox royalty
| name        = Louis
| title       = Dauphin of France; Duke of Brittany
| image       = Louis de bourbon (1707-1712).jpg 
| imgw        = 200px
| caption     = Louis, "duc de Bretagne"
| full name   = Louis de France
| house       = [[House of Bourbon]]
| father      = [[Louis, Duke of Burgundy (1682–1712)|Louis de France, Duke of Burgundy]]
| mother      = [[Princess Marie-Adélaïde of Savoy|Princess Marie Adélaïde of Savoy]]
| birth_date  = {{Birth date|1707|1|8|df=y}}
| birth_place = Palace of Versailles, France
| death_date  = {{Death date and age|1712|3|8|1707|1|8|df=y}}
| death_place = Palace of Versailles, France
| burial_date =
| burial_place= [[Basilica of St Denis]]
| religion    = [[Catholic Church|Roman Catholicism]]
}}

Original issue reported on code.google.com by jerome.v...@gmail.com on 12 Jul 2011 at 2:47

GoogleCodeExporter commented 8 years ago

Original comment by thomas.m...@gmail.com on 13 Jul 2011 at 7:18

GoogleCodeExporter commented 8 years ago
I have a patch for this which I've attached.

The patch doesn't attempt to parse the nested templates; it only changes the 
parser to allow them within a template block. This turned out to be a bit 
fiddly: the parser has to ensure that the nested braces are balanced, so the 
job can no longer be done by the lexer alone.
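
To illustrate why balancing the braces needs more than a lexer pattern, here is 
a minimal sketch in plain Java. It is not the attached patch and not WikiModel 
code: the class TemplateScanner and the method findTemplateEnd are hypothetical 
names made up for this example. It keeps a depth counter while scanning, which 
is the kind of state a purely regular token definition cannot carry. It 
deliberately ignores other MediaWiki constructs such as triple-brace parameters 
({{{...}}}) and nowiki sections.

```java
/** Hypothetical helper, not part of WikiModel: finds the end of a balanced {{...}} block. */
final class TemplateScanner {

    private TemplateScanner() {
    }

    /**
     * Returns the index just past the "}}" that closes the template opening at
     * {@code start}, or -1 if no template opens there or the braces never balance.
     */
    static int findTemplateEnd(String text, int start) {
        if (!text.startsWith("{{", start)) {
            return -1;
        }
        int depth = 0; // number of currently open "{{" blocks
        int i = start;
        while (i < text.length() - 1) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (text.startsWith("}}", i)) {
                depth--;
                i += 2;
                if (depth == 0) {
                    return i; // the outermost template is closed
                }
            } else {
                i++;
            }
        }
        return -1; // ran out of input with templates still open
    }
}
```

With this sketch, calling TemplateScanner.findTemplateEnd on a string that 
starts with an infobox like the one in this issue would return the position 
just after the final }}, so the nested {{Birth date|...}} stays inside the 
outer template block rather than ending it early.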

NB:

* This is my first time working with JavaCC, so the patch definitely needs 
review (e.g. do I need LOOKAHEAD directives somewhere? Is there a quicker way 
to achieve this?)
* You'll need to run ant -f RebuildScanners.xml after applying the patch (I 
didn't include diffs to the auto-generated code)

If anyone has time to review this, I'd appreciate it. I may be able to 
contribute fixes for other issues once I'm comfortable that I can change the 
grammar in a way that people are happy with.

Original comment by matt...@swiftkey.com on 26 Jun 2013 at 5:39

GoogleCodeExporter commented 8 years ago
As a bit of motivation for making this a priority: a significant proportion of 
high-profile Wikipedia pages *do* have nested templates, usually within an 
infobox template. Without a fix for this issue, parsing these pages results in 
a lot of infobox template markup being displayed as broken text, which hurts 
any application that renders or extracts text from these pages.

Original comment by matt...@swiftkey.com on 26 Jun 2013 at 5:47