sweble / sweble-wikitext

The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaWiki.
http://sweble.org/sites/swc-devel/develop-latest/tooling/sweble/sweble-wikitext

Mismatched boldface, InternalError on attempt to fix #44

Closed kno10 closed 7 years ago

kno10 commented 8 years ago

In John Elway:

Caused by: java.lang.InternalError
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderModeBase.addRtDataOfImEndTag(TreeBuilderModeBase.java:151)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderModeBase.addRtDataOfEndTag(TreeBuilderModeBase.java:113)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderInBody.endTagR30(TreeBuilderInBody.java:1342)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderInBody.visit(TreeBuilderInBody.java:100)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:306)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    ... 104 more

This page uses the rather crazy syntax <b style="color:red;">foo''' (see the "regular season" table). Maybe this is causing the problem? The error seems to go away after "patching" this article.

The same error also occurs in Wikipedia:Naming policy poll:

Caused by: java.lang.InternalError
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderModeBase.addRtDataOfImEndTag(TreeBuilderModeBase.java:151)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderModeBase.addRtDataOfEndTag(TreeBuilderModeBase.java:113)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderInBody.endTagR30(TreeBuilderInBody.java:1342)
    at org.sweble.wikitext.parser.postprocessor.TreeBuilderInBody.visit(TreeBuilderInBody.java:100)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:306)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    ... 104 more

Here, it is probably caused by this fragment: [[User:RickK|Rick]]'''[[User talk:RickK|K''']]. This appears to be a case of "propagatable inline formatting" as discussed in your paper, but it does not seem to be applied correctly (maybe because there is no italic text in front of the nested element?). Alternatively, the cause is the unclosed bold in <span>... [[User:KuwarOnline|<b>KuwarOnline</b>]]'''</span> on that page. Removing both fixed it; removing only the first was not enough.

kno10 commented 8 years ago

Here is a unit test fragment:

These are some difficult bad syntax examples from Wikipedia:

== First ==

[[User:User|User]]'''[[User talk:User|User''']]

== Second ==

<span style="font-face: bold">... [[User:User|<b>User</b>]]'''</span>

== Third ==

<b style="color:red;">¹''' - yes, this is real, people write such markup.

Note that in the last one, Wikipedia appears to allow <b>bold'''' non-bold, but Sweble would interpret this as <b>bold'''double bold'''</b>.
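To find other pages with this kind of markup before they hit the parser, a rough line-based check might help. The sketch below is a heuristic only and is not part of Sweble: the class name MixedBoldScanner and the method hasMixedBold are made up for illustration. It flags a line when a wikitext ''' appears while an HTML <b> tag opened earlier on the same line is still unclosed, which covers the third example but not the first two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sweble: flags lines that mix an
// unclosed HTML <b> tag with wikitext ''' bold markers.
final class MixedBoldScanner {

    // Group 1: opening <b> or <b attr="...">; group 2: closing </b>;
    // group 3: three apostrophes (wikitext bold toggle).
    private static final Pattern TOKEN = Pattern.compile(
            "(?i)(<b(?:\\s[^>]*)?>)|(</b\\s*>)|('{3})");

    // Returns true if ''' occurs while at least one HTML <b> opened
    // earlier on the same line has not been closed yet.
    static boolean hasMixedBold(String line) {
        int openHtmlBold = 0;
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                openHtmlBold++;
            } else if (m.group(2) != null) {
                if (openHtmlBold > 0) {
                    openHtmlBold--;
                }
            } else if (openHtmlBold > 0) {
                // Wikitext bold marker inside an open HTML <b> element.
                return true;
            }
        }
        return false;
    }
}
```

Running this over each line of an article's wikitext gives candidate lines to inspect by hand. It works per line and ignores nowiki sections, comments and templates, so both false positives and misses are expected.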

hannesd commented 8 years ago

A beautiful collection of horrible markup :)

I'll see what I can do about it.

The unit test fragment above does not cause an internal error when I test it. Are these two separate issues reported in one?

kno10 commented 8 years ago

Yes, apparently the InternalError is caused by something else. I expect my commit e8b3562b8159cee731efa07951f8f8b6899a75ca to solve the exception (but I can't tell yet whether it helps; it did involve <b> XML nodes where non-XML bold was expected). The run has been going for 50 minutes without an error so far. The markup above causes some interesting errors (in particular a stray </#int-link>), so I shared it as-is, even though it apparently is not enough to trigger the bug. To reproduce the bug, you can try the full Wikipedia article https://en.wikipedia.org/w/index.php?title=John_Elway&action=edit. Maybe the markup needs to be inside a table to trigger the original bug.

hannesd commented 8 years ago

I stumbled over that strange </#int-link> as well. Curious what I did there...

hannesd commented 7 years ago

Fixed in version 2.2.0