stjohann / DiscordWikiBot

Discord bot for Wikimedia projects and MediaWiki wiki sites
https://w.wiki/4nm
MIT License

Improve support for MediaWiki wikis with non-standard URLs #10

Open stjohann opened 4 years ago

stjohann commented 4 years ago

Since support for different servers was added, the bot uses the /wiki/$1 pattern to detect whether something is a wiki. Judging by the interwiki map on Meta-Wiki, this is not enough to determine whether something is a MediaWiki wiki: valid wikis can also use a simple /$1 or something more convoluted like /index.php?title=$1 or /index.php/$1.
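To illustrate why a check for /wiki/$1 alone is too narrow, here is a minimal Python sketch that fills the `$1` placeholder for each of the pattern shapes mentioned above. The function name `format_link` and the example.org URLs are hypothetical and used only for illustration; this is not the bot's actual code.

```python
from urllib.parse import quote


def format_link(pattern, title):
    """Fill the $1 placeholder of an article URL pattern with a page title."""
    # MediaWiki URLs use underscores instead of spaces; other unsafe
    # characters are percent-encoded, while ":" and "/" are kept readable.
    page = quote(title.replace(" ", "_"), safe=":/")
    return pattern.replace("$1", page)


# All four shapes are valid article URL patterns for a MediaWiki wiki.
patterns = [
    "https://example.org/wiki/$1",
    "https://example.org/$1",
    "https://example.org/index.php?title=$1",
    "https://example.org/index.php/$1",
]
links = [format_link(pattern, "Main Page") for pattern in patterns]
```

Only the first pattern would pass a check that looks for /wiki/$1, although all four produce working links on wikis configured that way.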

This poses two problems for the current code:

  1. Easier one: support more wiki URL patterns in the linking bot. This can be done by adding checks for more URL patterns and fetching the APIs of those wikis for their interwiki chains. I should come up with a good way to know (and even remember) wiki URLs somewhere, because it would be silly to ask, say, Google for /api.php a hundred times.
  2. Harder one: update the current code to validate wiki URLs by /api.php at the end of the string rather than by /wiki/$1. That way, the bot will ask the API, then get and remember the article path from there. I haven’t received any requests about this problem so far, but it would be a good thing to do. All the old values with /wiki/$1 will need to be deprecated and updated in the configs.

The removal of the deprecated old URLs will introduce a new major version (v.N.0.0) of the bot.
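One possible way to "remember" wiki URLs, sketched in Python under the assumption that an in-memory dictionary is enough: each API URL is looked up at most once, and failed lookups are remembered too, so a non-wiki site is not asked for /api.php repeatedly. The class name `ArticlePathCache` and the `fetch` callable are hypothetical.

```python
class ArticlePathCache:
    """Remembers one lookup result per API URL, including failures (None)."""

    def __init__(self, fetch):
        # fetch: callable taking an API URL and returning the article path,
        # or None when the site turns out not to be a MediaWiki wiki.
        self._fetch = fetch
        self._known = {}

    def get(self, api_url):
        # Only call the (slow, networked) fetcher for URLs not seen before.
        if api_url not in self._known:
            self._known[api_url] = self._fetch(api_url)
        return self._known[api_url]
```

A persistent store would be needed to survive restarts; the sketch only shows the lookup-once behaviour.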

jhsoby commented 3 years ago

Hi, I just discovered the existence of this bot! I'm the one who made @wikilinksbot on Telegram. You may get some inspiration from how I solved this very problem in this commit (see lines 487–521).

stjohann commented 3 years ago

Hey, nice work! Glad to learn of the Telegram bot and will no doubt look into its code in the future (definitely in regards to magic words etc.).

Your approach is interesting, but I will probably try to find something less complicated. People in https://github.com/mwclient/mwclient/issues/34 suggest, for instance, parsing the HTML of modern wikis for <link rel="EditURI" type="application/rsd+xml" href="//www.mediawiki.org/w/api.php?action=rsd" />, which seems a bit better if you have to choose between two relatively hacky things.
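A rough Python sketch of the EditURI idea, using only the standard library: find the `<link rel="EditURI">` tag in a page's HTML, resolve its `href` against the page URL (it is often protocol-relative), and drop the `?action=rsd` query to get the API endpoint. The names `EditUriFinder` and `api_url_from_html` are hypothetical, and fetching the HTML is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EditUriFinder(HTMLParser):
    """Records the href of the first <link rel="EditURI"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "edituri" and self.href is None:
            self.href = attributes.get("href")


def api_url_from_html(page_url, html):
    """Return the api.php URL advertised by a wiki page, or None."""
    finder = EditUriFinder()
    finder.feed(html)
    if not finder.href:
        return None
    # Resolve protocol-relative or relative hrefs against the page URL,
    # then strip the query string ("?action=rsd") and any fragment.
    absolute = urljoin(page_url, finder.href)
    return urlsplit(absolute)._replace(query="", fragment="").geturl()
```

Pages without the tag (older wikis or non-wiki sites) yield `None`, so a caller would still need a fallback.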

In my case, though, asking the configuring moderator for the API path (maybe even pointing them to Special:Version to find it) is better, since the siteinfo request that the bot needs anyway would already contain the article path.
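For reference, a minimal Python sketch of reading the article path from a siteinfo response. It assumes the standard MediaWiki query `action=query&meta=siteinfo&siprop=general&format=json`, whose `general` object carries `server` and `articlepath`; the function names are hypothetical and error handling is omitted.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def siteinfo_url(api_url):
    """Build the siteinfo request URL for a given api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def article_pattern(api_url, siteinfo):
    """Combine server and articlepath into a full URL pattern with $1."""
    general = siteinfo["query"]["general"]
    # "server" may be protocol-relative (//host), so resolve it against
    # the API URL to inherit its scheme.
    server = urljoin(api_url, general["server"])
    return server + general["articlepath"]


def fetch_article_pattern(api_url):
    """Network variant: request siteinfo and return the article pattern."""
    with urlopen(siteinfo_url(api_url), timeout=10) as response:
        return article_pattern(api_url, json.load(response))
```

With this approach the moderator only supplies the API URL, and whatever article path the wiki uses comes back from the wiki itself.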

jhsoby commented 3 years ago

Ah, that's an even better solution than what I used, agreed! I think I will implement this too.

stjohann commented 2 years ago

https://github.com/stjohann/DiscordWikiBot/commit/b77c40a6b382597c6917822651cea40fe6750b33 adds the foundation for the changes required: URLs ending with /api.php can now be set in config.json and are treated as valid URLs. This change was made spontaneously after two separate requests in Discord, so ideally I should revisit this and see how to proceed further, as well as test all the potentially problematic input more thoroughly. (This comment is mostly to document that this is now possible.)
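As an illustration of the kind of input testing mentioned above, here is a hypothetical Python sketch that classifies a configured value as an API URL, a legacy `$1` pattern, or invalid. The function name, the categories and the exact rules are assumptions for illustration and do not reflect what the linked commit does.

```python
from urllib.parse import urlsplit


def classify_wiki_url(value):
    """Return "api", "pattern" or "invalid" for a configured wiki URL."""
    parts = urlsplit(value.strip())
    # Require an absolute http(s) URL with a host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "invalid"
    # New style: the URL points at the API endpoint, with no query string.
    if parts.path.endswith("/api.php") and not parts.query:
        return "api"
    # Old style: an article URL pattern containing the $1 placeholder.
    if "$1" in value:
        return "pattern"
    return "invalid"
```

Edge cases such as missing schemes, extra query strings or plain site roots are exactly the inputs worth listing in tests.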