openzim / wp1_selection_tools

Create selections with the best articles of a WM project
https://download.kiwix.org/wp1/
GNU General Public License v3.0

Undue weight in Football selection #16

Closed: Popolechien closed this issue 4 years ago

Popolechien commented 5 years ago

The current landing page for the football selection lists the Peruvian national team among its top 100 articles, whereas top teams such as Brazil, France or Germany are not listed.

Other unlikely top performers are the Millennium Stadium (in Cardiff), CenturyLink Field (in Seattle) and Celtic Park (in Glasgow), whereas Camp Nou, Santiago Bernabéu and Emirates Stadium are missing.

Similarly, Major League Soccer is in, whereas the Premier League, La Liga or the Bundesliga are not.

Popolechien commented 5 years ago

Ok, I don't know whether the algorithm has changed since this ticket was opened, but the latest Spanish football landing page (wikipedia_es_football_nopic_2019-05) shows mostly German clubs or otherwise irrelevant articles. I would think people are more interested in Messi and the current Champions League than in an obscure Finnish footballer or the 2008-09 Bundesliga season, yet Aapo Mäenpää is on the landing page and Messi is not (ditto for the leagues).

  1. Can we, for the time being, drop all sorting attempts for sports-related selections?
  2. Can we generate selections based on traffic reports by language (e.g. the Spanish selection's landing page would be based on Spanish Wikipedia's traffic)?
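A traffic-based landing page, as proposed in the second question, could be computed roughly as in the minimal sketch below. It assumes page-view counts are already available as `(language, title, view_count)` rows; that row format, the `top_by_traffic` name and the sample numbers are all hypothetical, not part of wp1_selection_tools.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_by_traffic(
    views: Iterable[Tuple[str, str, int]], lang: str, n: int = 100
) -> List[str]:
    """Return the n most-viewed titles for one language edition.

    `views` is a hypothetical iterable of (language, title, view_count)
    rows; counts for the same title are summed before ranking.
    """
    totals: Counter = Counter()
    for row_lang, title, count in views:
        if row_lang == lang:  # keep only the requested language edition
            totals[title] += count
    # most_common() orders titles by total views, highest first
    return [title for title, _ in totals.most_common(n)]


# Hypothetical sample rows, for illustration only.
rows = [
    ("es", "Lionel Messi", 500),
    ("es", "Aapo Mäenpää", 3),
    ("es", "Lionel Messi", 450),
    ("es", "Liga de Campeones de la UEFA", 300),
    ("de", "FC Bayern München", 800),
]

print(top_by_traffic(rows, "es", 2))
```

Ranking each language edition separately is what lets the Spanish landing page reflect Spanish readers' interests rather than a global score.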

Screenshot 2019-05-21 at 12 16 03

Popolechien commented 5 years ago

Just checked: every language version has a different landing page (es, ru, ar, zh), yet all of them manage to be largely irrelevant (except perhaps the German one, which is 80% German-centered but includes none of the current leaders, e.g. Bayern or BVB). We should definitely stick to the most popular articles for sports landing pages.

Popolechien commented 5 years ago

As a complement to the above, I realize there is also a strong bias, across all selections (not only football), towards articles whose titles start with a number; see screenshots.

kelson42 commented 4 years ago

The problem was that the translated versions of the project and custom selections were not sorted by descending score. This has been fixed.
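The fix amounts to ranking by score, highest first, before cutting the list down to the landing page. The thread does not say what order the unsorted lists actually had; a title-based order is only one guess, though it would be consistent with digit-prefixed titles and names like "Aapo Mäenpää" floating to the top. The minimal sketch below contrasts the two orders. The `top_articles` name and the scores are hypothetical, not WP1 values.

```python
from typing import Dict, List


def top_articles(scores: Dict[str, float], n: int = 100) -> List[str]:
    """Return the n highest-scoring titles, best first.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:n]]


# Hypothetical scores, for illustration only.
scores = {
    "2008-09 Bundesliga": 12.0,
    "Aapo Mäenpää": 3.5,
    "Lionel Messi": 98.0,
    "UEFA Champions League": 90.0,
}

# Sorting by title alone puts digit-prefixed titles first,
# because digits precede letters in code-point order.
print(sorted(scores)[:2])
# Sorting by descending score surfaces the most relevant articles.
print(top_articles(scores, 2))
```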