kynikos / wiki-monkey

MediaWiki-compatible bot and editor assistant running directly in the browser and expandable with plugins.
https://github.com/kynikos/wiki-monkey/wiki
GNU General Public License v3.0

SynchronizeInterlanguageLinks: handle conflicting interlanguage links #145

Open lahwaacz opened 10 years ago

lahwaacz commented 10 years ago

I'll explain by example: mkinitcpio has {{DISPLAYTITLE:mkinitcpio}} in the header, added relatively recently. Some localized pages use [[en:Mkinitcpio]] and some use [[en:mkinitcpio]], which results in a conflict:

00:07:17 Synchronizing interlanguage links...
00:07:17 Reading https://wiki.archlinux.org/index.php/Mkinitcpio...
00:07:17 Reading https://wiki.archlinux.org/index.php/Mkinitcpio (Dansk)...
00:07:17 Reading https://wiki.archlinux.de/title/Mkinitcpio...
00:07:17 Reading https://wiki.archlinux.org/index.php/Mkinitcpio (Español)...
00:07:17 Reading http://wiki.archlinux.fr/mkinitcpio...
00:07:18 Conflicting interlanguage links: [[en:mkinitcpio]] and [[en:Mkinitcpio]]

This conflict is unnecessary: the plugin should recognize DISPLAYTITLE and use that title in all interlanguage links (but check that the link works in case of some exotic usage).

kynikos commented 10 years ago

Well, [[en:mkinitcpio]] and [[en:Mkinitcpio]] should point to the same page, since for MediaWiki the capitalization of the first letter doesn't count. I was pretty sure I was comparing the titles lowercased, so I'm a bit surprised that those titles are found conflicting... DISPLAYTITLE doesn't have any effect at all on links, so we should leave it alone, unless I haven't understood what you were proposing.

However, I was also thinking: do we really need WM to error out when it finds a conflict? Maybe a simpler and more efficient behaviour in the editor would be to print all the titles that have been found for each language, warning the user in the log and letting him decide which one to keep; in the bot version, instead, only one of the links could be chosen, following some kind of criterion, and a warning should be logged, thus letting the bot continue processing the other articles.
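For illustration, the first-letter rule mentioned above can be sketched in Python. This is not Wiki Monkey's code: `normalize_title` and `same_page` are hypothetical names, and the sketch assumes a wiki with MediaWiki's default first-letter capitalization, treats underscores as spaces, and does not handle namespace prefixes. Note that it only folds the first character, which is stricter than lowercasing the whole title.

```python
def normalize_title(title: str) -> str:
    """Return a comparison key for a page title (hypothetical helper).

    Assumptions: the wiki capitalizes the first letter of titles,
    and underscores are interchangeable with spaces.
    """
    # Treat underscores as spaces and collapse repeated whitespace.
    cleaned = " ".join(title.replace("_", " ").split())
    # Only the first character is case-insensitive; the rest is kept as-is.
    return cleaned[:1].upper() + cleaned[1:]


def same_page(a: str, b: str) -> bool:
    """True if two titles would point to the same page under the assumptions above."""
    return normalize_title(a) == normalize_title(b)
```

With this key, `mkinitcpio` and `Mkinitcpio` compare equal, while `Configuring Network` and `Network Configuration` remain distinct.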

lahwaacz commented 10 years ago

I meant that if there is DISPLAYTITLE on some page, its argument should be used for the interlanguage link, meaning [[en:mkinitcpio]] would be preferred over [[en:Mkinitcpio]]. Currently the plugin would use the uppercase form (if there was no conflict of course). As you pointed out, both links work, so this is only a style issue.

(In a very unlikely scenario when the argument of DISPLAYTITLE is not lowercased, the link would not work, but the page should be moved to use the appropriate title, in which case DISPLAYTITLE would be useless. This might be added to #47 if you feel it's necessary.)

About the conflicts, they are really annoying - it would certainly be better to just print a warning. For the bot interface it would be necessary for smooth usage. But perhaps the conflicts are useful in some cases, e.g. external wikis (AFAIK the German wiki uses localized titles). Is it possible that the wrong title would propagate to the page or can it be safely avoided?

kynikos commented 10 years ago

Nah, it's too dangerous to use DISPLAYTITLE, and making it safe would complicate the code for no real benefit, so I'd discard the idea. If we want to make the use of DISPLAYTITLE uniform among the translations of the same page, we can have a dedicated plugin for that (I follow the "do one thing and do it well" philosophy). There are two useful reminders in this bug report:

I'm not sure what you mean by the last example about the German wiki: currently, if two interlanguage links are found with the same prefix (language) but different titles, they are considered a conflict, so none of them can "propagate" anywhere.
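The rule described here (same language prefix, different titles) can be sketched as follows. This is a minimal Python illustration, not the plugin's implementation: `find_conflicts` and `format_conflicts` are hypothetical names, titles are compared verbatim, and the message only imitates the wording seen in the logs above.

```python
from collections import defaultdict


def find_conflicts(links):
    """Group (lang, title) pairs and return the languages with more than one title.

    Hypothetical helper: titles are compared exactly as given, with no
    capitalization or redirect handling.
    """
    titles_by_lang = defaultdict(set)
    for lang, title in links:
        titles_by_lang[lang].add(title)
    return {
        lang: sorted(titles)
        for lang, titles in titles_by_lang.items()
        if len(titles) > 1
    }


def format_conflicts(conflicts):
    """Build one log-style message per conflicting language."""
    messages = []
    for lang, titles in sorted(conflicts.items()):
        joined = " and ".join(f"[[{lang}:{title}]]" for title in titles)
        messages.append(f"Conflicting interlanguage links: {joined}")
    return messages
```

Because the function only reports conflicts, the caller is free to decide whether a conflict aborts processing or is merely logged.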

lahwaacz commented 10 years ago

About the German wiki: I think that I confused two completely different things when thinking about the algorithm, the current checking should be absolutely fine. Sorry for the confusion.

kynikos commented 10 years ago

No worries, thanks for clarifying :)

lahwaacz commented 10 years ago

(I've changed the issue title to reflect the real problem)

I think that in order to avoid conflicts completely, it is also necessary to handle redirects (output from Network Configuration):

14:13:45 Synchronizing interlanguage links...
14:13:45 Reading https://wiki.archlinux.org/index.php/Network Configuration...
14:13:45 Reading https://wiki.archlinux.org/index.php/Configuring Network (Česky)...
14:13:46 Reading https://wiki.archlinux.org/index.php/Configuring Network (Ελληνικά)...
14:13:46 Reading https://wiki.archlinux.org/index.php/Configuring Network (Español)...
14:13:46 Reading http://wiki.archlinux.fr/Connexions reseau...
14:13:46 Conflicting interlanguage links: [[en:Configuring Network]] and [[en:Network Configuration]]

kynikos commented 10 years ago

Well, for the moment I've done the error -> warning fix, I think it's already a big improvement because now the user can judge if the conflict is real or not (capitalization, redirects...). Especially after implementing #132 it will be easy to do those checks. I'm moving this report to 1.15.0 and changing it to a request, because now it's a matter of making WM smarter so it can fix capitalization and redirect issues automatically.

lahwaacz commented 10 years ago

After the error -> warning change, it will be necessary to disable processing of redirect pages: [1], [2]. Note that here I only got the warning, in a case that would have produced a conflict error before:

13:14:12 Processing Template:Nota...
13:14:12 Reading https://wiki.archlinux.org/index.php/Template%3ANota...
13:14:12 Possibly conflicting interlanguage links: [[en:Template:Note]] and [[en:Template:Nota]]
13:14:12 Reading https://wiki.archlinux.org/index.php/Template%3ANote (العربية)...
13:14:12 Possibly conflicting interlanguage links: [[en:Template:Note]] and [[en:Template:Nota]]
13:14:12 Reading https://wiki.archlinux.org/index.php/Template%3ANote (Dansk)...
13:14:12 Possibly conflicting interlanguage links: [[en:Template:Note]] and [[en:Template:Nota]]
...
13:14:13 Template:Nota processed (changed)

kynikos commented 10 years ago

Fixed: from 1.14.2, if the first page processed is a redirect, it won't be resolved; if, however, it's not a redirect but some of the collected interlanguage links are redirects, those will still be resolved.
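One possible reading of this behaviour, sketched in Python under stated assumptions: `resolve_links` is a hypothetical function, `redirects` is a hypothetical mapping from redirect titles to their targets (in practice it would have to come from the wiki), and redirects are followed one hop only. The sketch never resolves the starting page, and it resolves collected links only when the starting page is not itself a redirect; what the plugin does with collected links in the redirect case is not stated above, so that branch is a guess.

```python
def resolve_links(start_title, collected_titles, redirects):
    """Apply the redirect policy to one synchronization run (illustrative only).

    start_title: the first page processed; it is never resolved.
    collected_titles: titles gathered from interlanguage links.
    redirects: hypothetical dict mapping a redirect title to its target.
    """
    if start_title in redirects:
        # Assumption: when the starting page is a redirect, leave everything as-is.
        return start_title, list(collected_titles)
    # Otherwise follow each collected redirect one hop to its target.
    resolved = [redirects.get(title, title) for title in collected_titles]
    return start_title, resolved
```

For example, if `Configuring Network` were a redirect to `Network Configuration`, a run started from `Network Configuration` would map both collected titles to the same target, so the earlier conflict would no longer appear.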