Closed xtexChooser closed 2 years ago
I am pretty sure it is able to parse templates, there are classes in the AST for templates (WtTemplate), template arguments (WtTemplateArguments), argument names (WtName) and argument values (WtValue). Positional arguments are also supported. What else do you need?
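To make the shape of those nodes concrete, here is a minimal sketch in plain Java (standard library only). It is not the Sweble AST or its API: `TemplateSketch`, `Arg` and `parse` are hypothetical names invented for illustration, and the parser only handles flat calls with no nested templates or links. It just mirrors the idea of a template with a name and a list of arguments, where each argument has a value and an optional name.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a template call. Hypothetical; it only mimics the shape of the real AST. */
final class TemplateSketch {
    /** One argument; name is null when the argument is positional. */
    static final class Arg {
        final String name;
        final String value;

        Arg(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    final String name;
    final List<Arg> args;

    TemplateSketch(String name, List<Arg> args) {
        this.name = name;
        this.args = args;
    }

    /** Parses a flat call such as {{Name|a|k=v}}. Nested templates are not handled. */
    static TemplateSketch parse(String wikitext) {
        String s = wikitext.trim();
        if (s.length() < 4 || !s.startsWith("{{") || !s.endsWith("}}")) {
            throw new IllegalArgumentException("not a template call: " + wikitext);
        }
        // Keep trailing empty parts so that "{{a|}}" still yields one empty argument.
        String[] parts = s.substring(2, s.length() - 2).split("\\|", -1);
        List<Arg> args = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                args.add(new Arg(null, parts[i].trim()));
            } else {
                args.add(new Arg(parts[i].substring(0, eq).trim(),
                                 parts[i].substring(eq + 1).trim()));
            }
        }
        return new TemplateSketch(parts[0].trim(), args);
    }

    public static void main(String[] argv) {
        TemplateSketch t = parse("{{Infobox|title=Foo|1923}}");
        System.out.println(t.name);
        for (Arg a : t.args) {
            System.out.println((a.name == null ? "(positional)" : a.name) + " -> " + a.value);
        }
    }
}
```

With the real library you would walk the parsed tree and read the same kind of information from the template nodes instead of splitting strings by hand.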
Oh, thanks, I will have a look at them!
Thanks!
Hey, @wetneb
I tried to parse a wikitext document, but it seems that templates are not parsed correctly.
println(
    WtAstPrinter.print(
        WikitextParser(WikitextParserConfig)
            .parseArticle(
                """
                {{About|123456}}
                {{exclusive|java}}
                {{History|infdev}}
                {{History|alpha}}
                {{History|java}}
                {{reflist}}
                """.trimIndent(), "test"
            )
    )
)
WtParsedWikitextPage(
  {P} entityMap = -
  {P} warnings = C[]
  [0] = WtImStartTag(
    {P} name = "@p"
    xmlAttributes = WtXmlAttributes[]
  ),
  [1] = "{{About|123456}}",
  [2] = WtNewline("\n"),
  [3] = "{{exclusive|java}}",
  [4] = WtNewline("\n"),
  [5] = "{{History|infdev}}",
  [6] = WtNewline("\n"),
  [7] = "{{History|alpha}}",
  [8] = WtNewline("\n"),
  [9] = "{{History|java}}",
  [10] = WtNewline("\n"),
  [11] = "{{reflist}}",
  [12] = WtImEndTag(
    {P} name = "@p"
  )
)
The templates have been parsed as plain text rather than as WtTemplate nodes.
Oops, it can be parsed with WikitextPreprocessor. What's the difference between WikitextParser and WikitextPreprocessor?
No idea! The pipeline I use in OpenRefine is as follows:
// Encoding validation
WikitextEncodingValidator v = new WikitextEncodingValidator();
String wikitext = CharStreams.toString(reader);
String title = "Page title";
ValidatedWikitext validated = v.validate(parserConfig, wikitext, title);

// Pre-processing
WikitextPreprocessor prep = new WikitextPreprocessor(parserConfig);
WtPreproWikitextPage prepArticle = (WtPreproWikitextPage) prep.parseArticle(validated, title, false);

// Parsing
PreprocessedWikitext ppw = PreprocessorToParserTransformer
        .transform(prepArticle);
WikitextParser parser = new WikitextParser(parserConfig);
WtParsedWikitextPage parsedArticle;
parsedArticle = (WtParsedWikitextPage) parser.parseArticle(ppw, title);
All I can say is that this gives you parsed templates.
Thanks!
Wikitext parsing is designed as a two-stage process (at least it was at the time this library was written). First, pre-processing identifies and evaluates templates. This results in altered wikitext that is then fed into the actual wikitext parser.
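As a rough illustration of that two-stage idea, here is a toy sketch in plain Java (standard library only). It is not how the library is implemented and uses none of its classes: `TwoStageSketch`, `preprocess` and `parse` are hypothetical names, the marker characters are an arbitrary choice, and only flat `{{...}}` calls without nested braces are recognised. Stage 1 pulls the templates out and leaves indexed markers in the text; stage 2 parses the altered text and turns each marker back into a template node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy two-stage pipeline. Hypothetical; it only illustrates the division of labour. */
final class TwoStageSketch {
    // Flat template call with no nested braces (a deliberate simplification).
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{([^{}]*)\\}\\}");
    // Marker left behind by stage 1: private-use characters around an index.
    private static final Pattern MARKER = Pattern.compile("\uE000(\\d+)\uE001");

    /** Stage 1: collect template sources and replace each one with an indexed marker. */
    static String preprocess(String wikitext, List<String> templates) {
        Matcher m = TEMPLATE.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            templates.add(m.group(1));
            m.appendReplacement(out, "\uE000" + (templates.size() - 1) + "\uE001");
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Stage 2: split the altered text into nodes; markers become template nodes. */
    static List<String> parse(String altered, List<String> templates) {
        List<String> nodes = new ArrayList<>();
        Matcher m = MARKER.matcher(altered);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                nodes.add("Text(" + altered.substring(last, m.start()) + ")");
            }
            nodes.add("Template(" + templates.get(Integer.parseInt(m.group(1))) + ")");
            last = m.end();
        }
        if (last < altered.length()) {
            nodes.add("Text(" + altered.substring(last) + ")");
        }
        return nodes;
    }

    public static void main(String[] args) {
        String source = "See {{About|123456}} and {{reflist}}.";
        List<String> templates = new ArrayList<>();
        String altered = preprocess(source, templates);
        // Running stage 2 on the raw source: the braces stay ordinary text.
        System.out.println(parse(source, new ArrayList<String>()));
        // Running stage 2 on the pre-processed text: templates show up as nodes.
        System.out.println(parse(altered, templates));
    }
}
```

In this toy version, feeding the raw source straight to stage 2 yields a single text node, which is analogous to the output earlier in the thread where the templates came back as plain strings.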
Hello,
I recently wanted to parse some wikitext and found this library.
I want to extract some data from infoboxes, but the library seems to only be able to preprocess templates, not parse them into the AST directly.
Is there a solution?
Thanks.