Closed kno10 closed 7 years ago
I agree, and patches are of course welcome. However, I will not have the time to work on this myself.
Hey, I wrote a patch 3 years ago that automatically generates a config by calling the Wikipedia API. You can take a look at it here: https://github.com/sweble/sweble-wikitext/blob/develop/sweble-wikitext-components-parent/swc-engine/src/main/java/org/sweble/wikitext/engine/utils/LanguageConfigGenerator.java
I'm not sure if it still works, and I think the code can be greatly improved if you have time. It worked for me when I used Sweble to parse the German Wikipedia.
Thank you. This looks very useful. It causes some (legit) exceptions, so I will have to look into this:
The name `sub` was already registered by the alias `img_sub` when trying to register it for alias `sub`.
@pumpadump it mostly works. However, the API returns `if` as alias for the `if` magic word, but Sweble needs this to be `#if:` to work. Thus, the parser functions no longer work.
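One possible workaround for the alias mismatch described above, as a rough sketch only: rewrite the aliases returned by the API into the `#name:` form before registering them. The class `ParserFunctionAliases`, its methods, and the caller-supplied set of magic word names that need the rewrite are all hypothetical and not part of Sweble; which magic words actually require the `#`/`:` form is an assumption the caller has to provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Sweble: rewrites API aliases such as "if"
// into the "#if:" form that the comment above says Sweble expects.
final class ParserFunctionAliases {

    // Adds a leading '#' and a trailing ':' unless they are already present.
    static String normalize(String alias) {
        String a = alias.trim();
        if (!a.startsWith("#")) {
            a = "#" + a;
        }
        if (!a.endsWith(":")) {
            a = a + ":";
        }
        return a;
    }

    // Normalizes all aliases of one magic word, but only if its name is in
    // hashFunctions (assumption: the caller knows which magic words are
    // parser functions that need the "#name:" form). Others are copied as-is.
    static List<String> normalizeAll(String magicWordName,
                                     List<String> aliases,
                                     Set<String> hashFunctions) {
        if (!hashFunctions.contains(magicWordName)) {
            return new ArrayList<>(aliases);
        }
        List<String> out = new ArrayList<>();
        for (String alias : aliases) {
            out.add(normalize(alias));
        }
        return out;
    }
}
```

Aliases that already carry the prefix or suffix pass through unchanged, so the step can be applied repeatedly without stacking extra `#` or `:` characters.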
Also missing: from ".wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=xml", get the `linktrail` setting, convert it to a Java regexp (keeping only the first group), and assign this as `wikiConfig.getParserConfig().setInternalLinkPostfixPattern(pattern);`
Maybe some of the other general settings should be used as well...
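The `linktrail` conversion described above could look roughly like the following sketch. It assumes the setting arrives as a PHP-style regex literal such as `/^([a-z]+)(.*)$/sD` (the form I would expect for English; other wikis will differ), strips the delimiters and flags, and returns the body of the first parenthesized group. The class `LinkTrailConverter` and its method are hypothetical names. The group scan is deliberately naive: it skips backslash-escaped characters but does not handle parentheses inside character classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sweble: extracts the first group of a
// PHP-style linktrail regex so it can be used as a Java regex.
final class LinkTrailConverter {

    // Matches a PHP regex literal of the form /body/flags.
    private static final Pattern PHP_LITERAL =
            Pattern.compile("^/(.*)/([a-zA-Z]*)$", Pattern.DOTALL);

    static String toPostfixPattern(String phpLinkTrail) {
        Matcher m = PHP_LITERAL.matcher(phpLinkTrail);
        // Fall back to the raw input if it is not wrapped in /.../flags.
        String body = m.matches() ? m.group(1) : phpLinkTrail;

        int open = body.indexOf('(');
        if (open < 0) {
            throw new IllegalArgumentException("no group in: " + phpLinkTrail);
        }
        int depth = 0;
        for (int i = open; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                // Content between the first '(' and its matching ')'.
                return body.substring(open + 1, i);
            }
        }
        throw new IllegalArgumentException("unbalanced group in: " + phpLinkTrail);
    }
}
```

The returned string (or a `Pattern` compiled from it, depending on what the setter accepts) would then be handed to `setInternalLinkPostfixPattern` as mentioned above.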
For the English Wikipedia, we have `DefaultConfigEnWp`, but for other Wikipedias this still needs to be manually configured. It would be great to have tested configurations for other major Wikipedias, e.g. German, French, and Spanish. I tried finding the settings in use by Wikipedia, but did not get very far. Some configuration appears to be here: https://noc.wikimedia.org/conf/ (in particular, InitializeSettings.php), while other parts belong to MediaWiki itself rather than to the configuration: https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/messages/MessagesDe.php