sweble / sweble-wikitext

The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine that tries to emulate the behavior of MediaWiki.
http://sweble.org/sites/swc-devel/develop-latest/tooling/sweble/sweble-wikitext

"Entities should never occur in pre-processing!" #51

Closed: kno10 closed this issue 7 years ago.

kno10 commented 7 years ago

I get the following error when parsing https://de.wikipedia.org/w/index.php?title=Zee.One&oldid=162312843 with the latest 3.1.5-SNAPSHOT version compiled from GitHub:

Caused by: java.lang.AssertionError: Entities should never occur in pre-processing!
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.getEntity(RatsWikitextPreprocessor.java:8439) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pParserEntity$1(RatsWikitextPreprocessor.java:7716) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pParserEntity(RatsWikitextPreprocessor.java:7689) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pValueDqChoice(RatsWikitextPreprocessor.java:6431) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pValueDqStar(RatsWikitextPreprocessor.java:6358) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pValidXmlAttribute$1(RatsWikitextPreprocessor.java:5877) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pValidXmlAttribute(RatsWikitextPreprocessor.java:5761) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pXmlHeadAttributeChoice(RatsWikitextPreprocessor.java:5441) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.ptXmlAttributePlus(RatsWikitextPreprocessor.java:5383) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pTagHeader(RatsWikitextPreprocessor.java:3094) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pTagExtension(RatsWikitextPreprocessor.java:2997) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pXmlElement$$Choice1(RatsWikitextPreprocessor.java:2291) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pXmlElement$1(RatsWikitextPreprocessor.java:2180) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pXmlElement(RatsWikitextPreprocessor.java:2159) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pContent$ContentChoice(RatsWikitextPreprocessor.java:891) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pContent$ContentAtom(RatsWikitextPreprocessor.java:684) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pContent$ContentStar(RatsWikitextPreprocessor.java:632) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pContent(RatsWikitextPreprocessor.java:196) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.preprocessor.RatsWikitextPreprocessor.pArticle(RatsWikitextPreprocessor.java:136) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.parser.WikitextPreprocessor.parseArticle(WikitextPreprocessor.java:81) ~[swc-parser-lazy-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.engine.WtEngineImpl.preprocess(WtEngineImpl.java:694) ~[swc-engine-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    at org.sweble.wikitext.engine.WtEngineImpl.postprocess(WtEngineImpl.java:415) ~[swc-engine-3.1.5-SNAPSHOT.jar:3.1.5-SNAPSHOT]
    ... 2 common frames omitted

This happens with template expansion enabled; apparently it is the only document in dewiki that failed to parse.

sweble commented 7 years ago

I've added a new option to ParserConfig: convertIllegalCodePoints. If an illegal code point appears in a problematic location in the source, it is converted to U+FFFD. The option defaults to false and must be set to true explicitly.

hannesd commented 7 years ago

I've thought it over and changed the behavior slightly. The preprocessor can now replace entities (I don't remember why I originally chose to let it fail instead, and I could not come up with a good reason why it should not be possible). Also, the convertIllegalCodePoints option now affects the encoding validation stage: if it is turned on, all illegal characters are replaced by the Unicode replacement character (U+FFFD).
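The thread does not show the implementation, so the following is only a minimal Java sketch of what the final behavior described above amounts to: a validation pass that, when convertIllegalCodePoints is enabled, swaps every illegal code point for U+FFFD and otherwise leaves the text untouched. The class IllegalCodePointSanitizer and its methods are hypothetical, not Sweble's API, and the definition of "illegal" used here (anything outside the XML 1.0 Char production) is an assumption; Sweble's own rules may differ.

```java
/**
 * Hypothetical sketch of an encoding-validation pass; not Sweble's actual API.
 * When the flag is on, every illegal code point is replaced by U+FFFD.
 */
public final class IllegalCodePointSanitizer {

    /** Unicode REPLACEMENT CHARACTER. */
    public static final int REPLACEMENT = 0xFFFD;

    private final boolean convertIllegalCodePoints;

    public IllegalCodePointSanitizer(boolean convertIllegalCodePoints) {
        this.convertIllegalCodePoints = convertIllegalCodePoints;
    }

    /** Mirrors the option as described in the thread: off unless enabled explicitly. */
    public IllegalCodePointSanitizer() {
        this(false);
    }

    /**
     * Assumption: "illegal" means not allowed by the XML 1.0 Char production,
     * i.e. C0 controls other than TAB/LF/CR, unpaired surrogates, U+FFFE and U+FFFF.
     */
    static boolean isIllegal(int cp) {
        if (cp < 0x20) {
            return cp != 0x09 && cp != 0x0A && cp != 0x0D;
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            // codePointAt only yields a surrogate value for an unpaired surrogate
            return true;
        }
        return cp == 0xFFFE || cp == 0xFFFF;
    }

    /** Returns the source unchanged if the flag is off, otherwise a sanitized copy. */
    public String validate(String source) {
        if (!convertIllegalCodePoints) {
            return source;
        }
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            out.appendCodePoint(isIllegal(cp) ? REPLACEMENT : cp);
            // advance by 2 for a valid surrogate pair, by 1 otherwise
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a\u0001b\uD800c";
        String cleaned = new IllegalCodePointSanitizer(true).validate(raw);
        System.out.println(cleaned.equals("a\uFFFDb\uFFFDc"));
    }
}
```

Iterating by code point rather than by `char` keeps valid surrogate pairs (such as emoji) intact while still catching unpaired surrogates, which would otherwise slip through a naive per-`char` range check.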