openzim / wp1_selection_tools

Create selections with the best articles of a WM project
https://download.kiwix.org/wp1/
GNU General Public License v3.0

ZIM request change: wikipedia_en_endless #35

Closed (dylanmccall closed this issue 3 years ago)

dylanmccall commented 3 years ago

https://farm.openzim.org/recipes/wikipedia_en_endless/config

customMainPage: "User:EndlessOS/ZimIndex"

In addition, please replace articleList (http://download.openzim.org/wp1/enwiki/customs/endless.tsv) with the included (zipped) file: endless.tsv.zip

kelson42 commented 3 years ago

@dylanmccall Sorry for the late feedback. There is no problem with this request, but why not use the 50,000 TSV list we use to generate, like before? What is so special about this list?

dylanmccall commented 3 years ago

> @dylanmccall Sorry for the late feedback. There is no problem with this request, but why not use the 50,000 TSV list we use to generate, like before? What is so special about this list?

The only reason is that our custom index page links to some articles which aren't in that list, so this list is the existing TSV plus some articles from the custom index page. (I'm unsure whether there's a mechanism to add articles linked from the index page automatically, so I added them to the list to be sure.)

kelson42 commented 3 years ago

@dylanmccall This is definitely the way to go with what we have for the moment, but I believe just having a whitelist (like we have a blacklist) with extra/complementary articles would be a better idea. I will try to implement it that way.
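
For illustration only, the whitelist/blacklist idea discussed above could be sketched as follows. This is not the code used by wp1_selection_tools: the function name `build_selection`, the sample titles, and the rule that the blacklist wins over the whitelist are all assumptions made for the example, as is treating each list as one article title per entry.

```python
def build_selection(base, whitelist=(), blacklist=()):
    """Return base titles plus whitelist titles, minus blacklist titles.

    Hypothetical helper: order is preserved, duplicates are dropped,
    and (by assumption) a blacklisted title is excluded even if it
    also appears in the whitelist.
    """
    blocked = {title.strip() for title in blacklist if title.strip()}
    seen = set()
    selection = []
    # Base titles come first, then the complementary whitelist titles.
    for title in list(base) + list(whitelist):
        title = title.strip()
        if not title or title in blocked or title in seen:
            continue
        seen.add(title)
        selection.append(title)
    return selection


# Made-up sample data, one article title per entry.
base_titles = ["Earth", "Moon", "Spam_page"]
extra_titles = ["Endless_OS", "Moon"]   # e.g. articles linked from a custom index page
blocked_titles = ["Spam_page"]

print(build_selection(base_titles, extra_titles, blocked_titles))
# prints ['Earth', 'Moon', 'Endless_OS']
```

Keeping the extra titles in a separate list means the base selection can be regenerated without re-adding them by hand.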

kelson42 commented 3 years ago

Actually, the whitelist principle was already implemented. We have one, for example, for the Indonesian selection. I have updated everything and a new ZIM should be ready in a week (unfortunately I cannot do it any quicker as I'm on vacation).

kelson42 commented 3 years ago

This is fixed now and works fine. Unfortunately, the scrape is impacted by another bug: https://github.com/openzim/mwoffliner/issues/1464