peter17 / mediawiki-parser

An experimental Python parser for MediaWiki syntax with a focus on extensibility and comprehensibility
GNU General Public License v3.0

Parses wikitable cells as inline paragraphs if text contains a front slash (/) #4

Closed benrichards86 closed 7 years ago

benrichards86 commented 7 years ago

If a wikitable data cell contains a forward slash (/), the parser appears to take all the text from the slash to the end of the line and treat that string as an inline paragraph. If multiple data cells are specified on the same line, the rest of the line becomes the paragraph text, including the data cell separator token (||). The issue appears to be in mediawikiParser.py itself: the output of both the text and HTML postprocessors shows the text being interpreted this way.

This bug seems to appear only when parsing tables. I haven't seen it occur with normal paragraphs or lists.

Here's a simplified testcase that exposes the issue:

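# Import both factories under distinct names so the second import
# does not shadow the first.
from mediawiki_parser.preprocessor import make_parser as make_preprocessor
from mediawiki_parser.html import make_parser as make_html_parser

preprocessor = make_preprocessor({})
parser = make_html_parser([], [], [], {}, {})

# Two-row table; the first cell of the second row contains a forward slash.
source = """
{|
|AB||CD
|-
|E/F||GH
|}
"""

preprocessed_text = preprocessor.parse(source)
output = parser.parse(preprocessed_text.leaves())

# Parenthesized form works under both Python 2 and Python 3.
print(output.value)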

Below is the HTML code that results:

<body>
<table>
<tr>
    <td>AB</td>
    <td>CD</td>
</tr>
<tr>
    <td>E<p>/F||GH</p>
</td>
</tr>
</table>
</body>
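
To make the symptom easier to check mechanically, here is a minimal sketch that pulls the cell text out of the HTML above using only the Python 3 standard library. The names `CellCollector` and `table_cells` are hypothetical helpers written for this illustration; they are not part of mediawiki-parser, and the sketch assumes every `<td>` sits inside a `<tr>`.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            # Assumes a <tr> was opened before this <td>.
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <p> is still counted as cell text.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def table_cells(html):
    """Return a list of rows, each a list of cell strings."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


buggy_html = """<body>
<table>
<tr>
    <td>AB</td>
    <td>CD</td>
</tr>
<tr>
    <td>E<p>/F||GH</p>
</td>
</tr>
</table>
</body>"""

rows = table_cells(buggy_html)
print(rows)
```

With the output reported here, the second row comes back as a single cell whose text still contains the `||` separator, which is the failure described in this report.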

The expected output renders the second row as two separate cells:

<tr>
    <td>E/F</td>
    <td>GH</td>
</tr>
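
For reference, the expected splitting of an inline row can be sketched in a few lines of plain Python. This is only an illustration of the intended result, not how mediawiki-parser works internally (its grammar is built on pijnu); `split_inline_cells` is a hypothetical name, and the sketch deliberately ignores cell attributes such as `|style="..."|text`.

```python
def split_inline_cells(line):
    """Split one wikitable data line such as '|E/F||GH' into its cell texts.

    Simplified assumption: the line starts with a single '|' and cells on
    the same line are separated by '||'. Cell attributes are not handled.
    """
    if not line.startswith("|") or line.startswith(("|-", "|}", "|+")):
        raise ValueError("not a table data line: %r" % line)
    # Drop the leading '|' and split on the inline cell separator.
    return [cell.strip() for cell in line[1:].split("||")]


cells = split_inline_cells("|E/F||GH")
print(cells)
```

The forward slash has no special meaning here, so it stays part of the first cell's text and the separator still yields two cells.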

The test fails with the latest version of mediawiki-parser (d7fa6ffeb7daf1e5368feefcdd6fc4d37c18ab98) and pijnu (peter17/pijnu@25518c3d1ca10955af514b36132777f15027d42d).

peter17 commented 7 years ago

Hi @benrichards86 Thanks for reporting. This should be fixed now! Regards.

benrichards86 commented 7 years ago

Thank you!
