openzim / wp1_selection_tools

Create selections with the best articles of a WM project
https://download.kiwix.org/wp1/
GNU General Public License v3.0

Indonesian TOP selection is irrelevant #22

Closed: Popolechien closed this issue 4 years ago

Popolechien commented 5 years ago

Currently the first 30 articles from wikipedia_id_top_maxi_2019-08.zim are lists of rural French cities by department. I doubt these would ever feature at the top of the 50,000 most interesting articles for Bahasa speakers. Until we have something without so many false positives, it would simply make more sense to use the most-read articles, without trying to curate them. [Image: screenshot, 2019-08-27 at 15:24:26]

kelson42 commented 5 years ago

Strongly suspect this is because of #6

kelson42 commented 4 years ago

@Popolechien I had a close look at that silly result and came to the conclusion that everything works as intended, and I have no really good idea how to do it better. What the WPID contributors have done is create a redirect from each French village to this kind of article (villages in department XYZ). As a consequence, the KPIs of all the redirects (number of pageviews, lang links, page links) are added to those of the target article, and all these indicators end up really high. Even if I took only the pageviews, these articles would probably still stay near the top.
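
To illustrate the effect described above, here is a minimal sketch of how summing redirect counters into their target can inflate a list article. It is not code from wp1_selection_tools: the function `aggregate_kpis`, the field names, the article titles and all numbers are hypothetical.

```python
from collections import defaultdict


def aggregate_kpis(pages, redirects):
    """Sum each page's counters into its redirect target (or itself).

    pages:     dict mapping title -> dict of counters
    redirects: dict mapping redirect title -> target title
    Both structures are hypothetical, for illustration only.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for title, kpis in pages.items():
        # A redirect contributes its counters to the article it points to.
        target = redirects.get(title, title)
        for name, value in kpis.items():
            totals[target][name] += value
    return {title: dict(kpis) for title, kpis in totals.items()}


# Made-up numbers: one list article, two village redirects, one ordinary article.
pages = {
    "Villages in department XYZ": {"pageviews": 50, "lang_links": 1, "page_links": 10},
    "Village A": {"pageviews": 40, "lang_links": 30, "page_links": 20},
    "Village B": {"pageviews": 60, "lang_links": 25, "page_links": 15},
    "Jakarta": {"pageviews": 120, "lang_links": 40, "page_links": 30},
}
redirects = {
    "Village A": "Villages in department XYZ",
    "Village B": "Villages in department XYZ",
}

totals = aggregate_kpis(pages, redirects)
```

In this toy data the list article starts below "Jakarta" on every counter, but once the two redirects are folded in it ends up above it on all three. With hundreds of village redirects per department, the same mechanism would push such lists toward the top.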

kelson42 commented 4 years ago

The pageviews work better than I first thought. I have tweaked the algorithm to make them more prominent in the final score, and the result is a lot better.
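
As a rough illustration of what giving pageviews more weight in a combined score can do, here is a minimal sketch. The actual formula used by wp1_selection_tools is not shown in this thread, so everything here is an assumption: the `score` function, the log scaling, the weights and the numbers are all hypothetical.

```python
import math


def score(kpis, weights):
    """Weighted sum of log-scaled counters (an assumed scoring scheme)."""
    return sum(w * math.log1p(kpis.get(name, 0)) for name, w in weights.items())


# Made-up aggregated counters: a list article with inflated link counts
# and an ordinary article that is read far more often.
list_article = {"pageviews": 150, "lang_links": 3000, "page_links": 9000}
popular_article = {"pageviews": 20000, "lang_links": 150, "page_links": 800}

# Hypothetical weightings: all indicators equal vs. pageviews emphasized.
equal_weights = {"pageviews": 1.0, "lang_links": 1.0, "page_links": 1.0}
pageview_heavy = {"pageviews": 3.0, "lang_links": 1.0, "page_links": 1.0}

list_wins_equal = score(list_article, equal_weights) > score(popular_article, equal_weights)
popular_wins_heavy = score(popular_article, pageview_heavy) > score(list_article, pageview_heavy)
```

With these invented numbers the list article ranks first under equal weights, and the frequently read article ranks first once pageviews count three times as much. Whether real data behaves the same way depends on how many pageviews the redirects themselves contribute.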