sweble / sweble-wikitext

The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaWiki.
http://sweble.org/sites/swc-devel/develop-latest/tooling/sweble/sweble-wikitext

Table cells not parsed if used with translate extension #41

Closed intracer closed 8 years ago

intracer commented 8 years ago

Example of the page https://meta.wikimedia.org/wiki/Grants:APG/Proposals/2015-2016_round1/Wikimedia_Ukraine/Proposal_form/Detailed_budget

It contains table cells wrapped in

<translate><!--T:x-->
cell value</translate>

Sweble cannot parse it unless you remove the newline after <!--T:x-->. So I think it does not understand that <translate> \n </translate> is one entity inside a cell.

hannesd commented 8 years ago

Thanks for reporting this! I'll be out of office for the next 2-3 weeks and won't be able to work on this. In the meantime I have a few questions:

intracer commented 8 years ago

It treats all cells as one cell

https://github.com/intracer/sweble-bug41

Output is

*** bug, one cell *** 

** start wiki text **
{|
|-
|<translate><!--T:1-->
cell1</translate> ||cell2
|}
** end wiki text **
Cells: 1
WtText("\ncell1")

 *** expected, two cells *** 

** start wiki text **
{|
|-
|<translate><!--T:1-->cell1</translate> ||cell2
|}
** end wiki text **
Cells: 2
WtText("cell1")
WtText("cell2")

Parser is called this way:

WikiConfig config = DefaultConfigEnWp.generate();
new WtEngineImpl(config).postprocess(new PageId(PageTitle.make(config, title), -1), text, null);

intracer commented 8 years ago

I think there should be a TagExtensionGroup for translate tag

hannesd commented 8 years ago

Thanks for the great issue description! With the full wikitext example I think I now understand the problem. Judging from your last comment you have already found the actual cause of the problem:

The example uses the inline cell separator ||. For that operator to work it has to be on the same line as the opening | operator. That's not the case in your example. It is nevertheless expected to work because the <translate> tag extension hides the newline after the comment from the parser.

I don't know the <translate> tag extension and whether it's a core feature of MediaWiki or provided by a plugin. Whatever the case may be, Sweble unfortunately only supports very few parser functions and tag extensions since implementing the full set is simply too much effort and no one has the time.

The <translate> tag extension is not known to Sweble and the wikitext is therefore treated as a random opening tag and a closing tag. The content in between is not hidden from the parser. As you've already pointed out yourself, telling Sweble about the translate tag extension would at least partly solve the problem. The content would then be hidden from the parser and the cell recognition should work.
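To make the effect concrete, here is a toy model in plain Java. It is not Sweble's parser and uses none of its API: the class TranslateCellDemo, the methods countCells and maskTranslate, and the placeholder character are all made up for illustration. It only assumes the rule described above (a || separator counts only on a line that starts with |) and shows how replacing the whole <translate> element with an opaque placeholder brings the || back onto the cell's opening line.

```java
import java.util.regex.Pattern;

final class TranslateCellDemo {
    // Hypothetical stand-in for "tag extension known to the parser":
    // matches a whole <translate>...</translate> element, newlines included.
    private static final Pattern TRANSLATE =
            Pattern.compile("<translate>(.*?)</translate>", Pattern.DOTALL);

    /** Replaces each translate element with one opaque placeholder character. */
    static String maskTranslate(String wikitext) {
        return TRANSLATE.matcher(wikitext).replaceAll("\uFFFC");
    }

    /**
     * Toy cell counter: a line starting with '|' opens a cell, and every "||"
     * later on that same line opens another. Lines starting with "{|", "|}"
     * or "|-" are table syntax; any other line continues the previous cell.
     */
    static int countCells(String wikitext) {
        int cells = 0;
        for (String line : wikitext.split("\n")) {
            if (line.startsWith("{|") || line.startsWith("|}") || line.startsWith("|-")) {
                continue;
            }
            if (!line.startsWith("|")) {
                continue; // continuation line: "||" here is not a separator
            }
            cells++;
            int idx = 1;
            while ((idx = line.indexOf("||", idx)) >= 0) {
                cells++;
                idx += 2;
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String text = "{|\n|-\n|<translate><!--T:1-->\ncell1</translate> ||cell2\n|}";
        System.out.println("Cells (tag unknown): " + countCells(text));               // 1
        System.out.println("Cells (tag masked):  " + countCells(maskTranslate(text))); // 2
    }
}
```

In this model the unmasked text yields one cell because the || sits on a continuation line, while the masked text yields two, matching the "bug" and "expected" outputs reported above.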

To fake the translate tag extension you can have a look at the inner class org.sweble.wikitext.engine.ext.builtin.BuiltInTagExtensions.TagExtensionPre. You can basically copy the complete behavior of this class and only replace "pre" with "translate". Put the new tag extension class in your own tag extension group class and register the class with the config as can be seen in org.sweble.wikitext.engine.utils.DefaultConfig.addTagExtensions(...).

Of course, this would treat the contents of the <translate> element as pure text and not recognize any markup. If you want the parser to parse the contents of the tag extension, things get tricky. Let me know if that's the behavior you want and I can give you some pointers on how to get there.

intracer commented 8 years ago

For the table that I gave in the first message, copying Pre will work, but in general the text inside <translate> can be wiki markup that should be parsed.

hannesd commented 8 years ago

Sorry for taking so long. I've sent you a pull request for https://github.com/intracer/sweble-bug41. It shows how to write a simple tag extension that would parse and expand the translate tag extension contents.