sweble / sweble-wikitext

The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaWiki.
http://sweble.org/sites/swc-devel/develop-latest/tooling/sweble/sweble-wikitext

DefaultConfig for popular wikis/languages besides English #50

Closed: kno10 closed this issue 7 years ago

kno10 commented 7 years ago

For the English Wikipedia we have DefaultConfigEnWp, but for other Wikipedias the configuration still has to be set up manually. It would be great to have tested configurations for other major Wikipedias, e.g. the German, French, and Spanish ones. I tried to find the settings Wikipedia actually uses, but did not get very far. Some of the configuration appears to be here: https://noc.wikimedia.org/conf/ (in particular, InitializeSettings.php), while other parts belong to MediaWiki itself rather than to the configuration: https://phabricator.wikimedia.org/diffusion/MW/browse/master/languages/messages/MessagesDe.php

hannesd commented 7 years ago

I agree and patches are of course welcome, however, I will not have the time to work on this myself.

pumpadump commented 7 years ago

Hey, I wrote a patch 3 years ago that automatically generates a config by querying the Wikipedia API. You can take a look at it here: https://github.com/sweble/sweble-wikitext/blob/develop/sweble-wikitext-components-parent/swc-engine/src/main/java/org/sweble/wikitext/engine/utils/LanguageConfigGenerator.java

I'm not sure whether it still works, and I think the code could be greatly improved if you have the time. It worked for me when I used Sweble to parse the German Wikipedia.
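For readers who want to see the starting point of such a generator, here is a minimal sketch of the API request it would be built on. This is not the code of LanguageConfigGenerator; the class name SiteinfoQuery, the chosen siprop values, and the User-Agent string are assumptions for illustration. It only builds and fetches the MediaWiki siteinfo query for one language edition and returns the raw JSON, leaving the mapping into a Sweble config out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Sweble: builds and fetches a siteinfo query.
final class SiteinfoQuery {
    // Builds e.g. https://de.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=...
    static URI forLanguage(String lang, String... props) {
        // MediaWiki separates multiple siprop values with '|', which we URL-encode.
        String siprop = URLEncoder.encode(String.join("|", props), StandardCharsets.UTF_8);
        return URI.create("https://" + lang + ".wikipedia.org/w/api.php"
                + "?action=query&meta=siteinfo&siprop=" + siprop + "&format=json");
    }

    // Returns the raw JSON body; turning it into a WikiConfig is not shown here.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("User-Agent", "sweble-config-sketch/0.1 (example)") // hypothetical UA
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        // Only prints the URL; call fetch(...) to actually hit the network.
        System.out.println(forLanguage("de", "general", "namespaces",
                "namespacealiases", "magicwords"));
    }
}
```

Requesting several siprop values in one call keeps the number of round trips per wiki small; the "magicwords" and "general" parts are the ones discussed further down in this thread.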

kno10 commented 7 years ago

Thank you. This looks very useful. It causes some (legitimate) exceptions, so I will have to look into these:

The name sub was already registered by the alias img_sub when trying to register it for alias sub.
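The exception above means two magic words claim the same synonym ("sub" is wanted both by img_sub and by sub). One way a generator could avoid aborting is to record such clashes and keep the first registration. The sketch below is hypothetical: AliasRegistry and the "first registration wins" policy are assumptions, not Sweble's behavior, and whether keeping img_sub is the right choice for "sub" would need checking against MediaWiki.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: maps a synonym ("name") to the magic-word ID ("alias")
// that owns it. The first registration wins; later clashes are recorded, not thrown.
final class AliasRegistry {
    private final Map<String, String> nameToAlias = new LinkedHashMap<>();
    private final List<String> conflicts = new ArrayList<>();

    // Returns true if the name now belongs to this alias, false on a clash.
    boolean register(String alias, String name) {
        String owner = nameToAlias.putIfAbsent(name, alias);
        if (owner == null || owner.equals(alias)) {
            return true;
        }
        conflicts.add("'" + name + "' requested by '" + alias
                + "' but already owned by '" + owner + "'");
        return false;
    }

    String ownerOf(String name) {
        return nameToAlias.get(name);
    }

    List<String> conflicts() {
        return conflicts;
    }

    public static void main(String[] args) {
        AliasRegistry registry = new AliasRegistry();
        registry.register("img_sub", "sub");
        registry.register("sub", "sub"); // clashes, gets recorded instead of thrown
        registry.conflicts().forEach(System.out::println);
    }
}
```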

kno10 commented 7 years ago

@pumpadump It mostly works. However, the API returns "if" as the alias for the if magic word, while Sweble needs it to be "#if:" to work. As a result, the parser functions no longer work.
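A possible fix is to rewrite the synonyms of hash-style parser functions into the "#name:" form before handing them to Sweble. In this sketch the class name ParserFunctionNames and the set of magic-word IDs treated as hash functions (taken from the ParserFunctions extension: if, ifeq, switch, expr, and so on) are assumptions; a real generator would need a verified list per wiki.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: normalizes API synonyms of parser functions to "#name:".
final class ParserFunctionNames {
    // Assumption: IDs invoked as {{#name:...}} (ParserFunctions extension).
    static final Set<String> HASH_FUNCTIONS = Set.of(
            "if", "ifeq", "iferror", "ifexpr", "ifexist", "switch",
            "expr", "time", "timel", "rel2abs", "titleparts");

    static List<String> toSwebleNames(String magicWordId, List<String> synonyms) {
        if (!HASH_FUNCTIONS.contains(magicWordId)) {
            return synonyms; // other magic words are passed through unchanged
        }
        return synonyms.stream()
                .map(ParserFunctionNames::toHashForm)
                .collect(Collectors.toList());
    }

    // "if", "#if" and "#if:" all become "#if:".
    private static String toHashForm(String synonym) {
        String core = synonym;
        if (core.startsWith("#")) core = core.substring(1);
        if (core.endsWith(":")) core = core.substring(0, core.length() - 1);
        return "#" + core + ":";
    }

    public static void main(String[] args) {
        System.out.println(toSwebleNames("if", List.of("if")));
    }
}
```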

kno10 commented 7 years ago

Also missing: get the linktrail setting from ".wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=xml", convert it to a Java regex (keeping only the first group), and pass it to

wikiConfig.getParserConfig().setInternalLinkPostfixPattern(pattern);
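The conversion step could look roughly like this. It assumes the API returns the linktrail in PHP's delimited "/.../flags" form, e.g. "/^([a-z]+)(.*)$/sD", and that the setter accepts the body of the first group as a regex string; I have not checked whether it expects a String or a compiled Pattern, anchored or not. LinktrailConverter is a hypothetical name. The scanner skips escaped characters and character classes, so parentheses inside "[...]" are not mistaken for groups (a literal "]" as the first class character is not handled).

```java
import java.util.regex.Pattern;

// Hypothetical helper: extracts the first capturing group of a PHP linktrail.
final class LinktrailConverter {
    static String firstGroup(String phpPattern) {
        int end = phpPattern.lastIndexOf('/');
        if (!phpPattern.startsWith("/") || end <= 0) {
            throw new IllegalArgumentException("not a /.../flags pattern: " + phpPattern);
        }
        String body = phpPattern.substring(1, end); // drop delimiters and PHP flags
        int depth = 0;
        int start = -1;
        boolean inClass = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\') { i++; continue; }                   // skip escaped character
            if (inClass) { if (c == ']') inClass = false; continue; }
            if (c == '[') { inClass = true; continue; }         // enter character class
            if (c == '(') {
                if (depth++ == 0) start = i + 1;                // outermost group opens
            } else if (c == ')' && --depth == 0) {
                String group = body.substring(start, i);
                Pattern.compile(group); // throws if Java's regex syntax rejects it
                return group;
            }
        }
        throw new IllegalArgumentException("no capturing group in " + phpPattern);
    }

    public static void main(String[] args) {
        System.out.println(firstGroup("/^([a-z]+)(.*)$/sD"));
    }
}
```

The result would then be passed on roughly as wikiConfig.getParserConfig().setInternalLinkPostfixPattern(LinktrailConverter.firstGroup(linktrail)), if the setter takes a string.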

Maybe other settings from this general block should be used as well...