openzim / zim-requests

Want a new ZIM file? Propose ZIM content improvements or fixes? Here you are!
https://farm.openzim.org

Scientific selections should exclude biographies #183

Closed. Popolechien closed this issue 1 month ago

Popolechien commented 5 years ago

We have done this already with the Medical selection (#156): in order to save space and remain topical, we should remove articles that also intersect with Wikiprojects Biography and Companies from these selections:

kelson42 commented 4 years ago

@Popolechien The two remaining ones are so far generated directly from the WikiProject by the software (so not customized in any manner). Might do this later.

Popolechien commented 4 years ago

Not sure I understand your comments (actually: I don't) but ok. Thanks for the update.

RavanJAltaie commented 8 months ago

@kelson42 @Popolechien is this done now?

Popolechien commented 8 months ago

Apparently not for Computer and Geography. @kelson42 ?

RavanJAltaie commented 8 months ago

This issue has been open for 4 years now; it's time to finish the other two, if applicable. I can help if wanted, I just need to understand the scope and the steps needed. @kelson42 @Popolechien

RavanJAltaie commented 5 months ago

A gentle reminder on this please @Popolechien @kelson42

Popolechien commented 5 months ago

@RavanJAltaie The way I would go about it, now that Wikipedia-on-demand is out, is to generate a SPARQL query that includes Wikiproject Geography articles and excludes those that also belong to Wikiproject Biography (you will have to figure out the query, or ask the Wikidata folks). Ditto for the Computer part.
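Whatever query or tool ends up producing the two title lists, the exclusion step itself is a plain set difference. A minimal Python sketch, assuming both lists are already available as page titles; the function names and the sample titles are hypothetical, not the real project lists:

```python
def normalize(title: str) -> str:
    """Treat 'Gerardus_Mercator' and 'Gerardus Mercator' as the same title."""
    return title.strip().replace("_", " ")


def exclude_intersection(selection, excluded):
    """Return the titles of `selection` that are absent from `excluded`, keeping order."""
    excluded_set = {normalize(t) for t in excluded}
    return [t for t in selection if normalize(t) not in excluded_set]


# Hypothetical sample data for illustration only.
geography = ["Mercator projection", "Gerardus_Mercator", "River delta"]
biography = ["Gerardus Mercator", "Ada Lovelace", "Alan Turing"]

kept = exclude_intersection(geography, biography)
print(kept)  # ['Mercator projection', 'River delta']
```

The same function would apply unchanged to the Computer list.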

I poked around and I don't think there are sooo many of them to exclude, at least in the Geography part. The one I could come up with is Mercator but there must be plenty of explorers. Ditto for computers (I see Lovelace, Turing, etc.).

The two projects have 118,000 and 62,000 articles respectively, so if we can shave off even 5% I would see that as a win in terms of storage. There might be other concepts we can do without (can we remove low-importance entries?), but I leave that to you.
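For scale, the 5% target can be worked out directly. This sketch assumes the two counts map to Geography and Computer in that order, and it counts articles only; actual storage savings depend on article size and are not estimated here:

```python
# Article counts quoted above; the Geography/Computer order is an assumption.
project_sizes = {"Geography": 118_000, "Computer": 62_000}
target_percent = 5

# Integer arithmetic avoids floating-point rounding surprises.
removed = {name: size * target_percent // 100 for name, size in project_sizes.items()}
total_removed = sum(removed.values())

print(removed)        # {'Geography': 5900, 'Computer': 3100}
print(total_removed)  # 9000
```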

kelson42 commented 5 months ago

Sorry for not reacting earlier on this... but this needs a bit of time and work. We currently have problems and challenges around the selection scripts anyway... So it might take a while before we finally tackle this issue.

RavanJAltaie commented 5 months ago

@Popolechien @kelson42 I have a small question: the issue says this has already been done for Physics, Chemistry, Mol Cell biology, and Maths. Who did them? Why can't we just repeat the same for the two remaining categories?

Popolechien commented 5 months ago

@kelson42 did it back in the day, but that was before WP1, and I assume he wrote the scripts himself.

RavanJAltaie commented 4 months ago

So I have both files ready as ZIM files (made in WP1). @Popolechien how shall I place them in the library?

Popolechien commented 4 months ago

@RavanJAltaie I think the proper thing is to generate a .tsv file with WP1, place it on drive.farm.openzim.org and put this in a mwoffliner recipe as the Article list parameter. Isn't that how we did other selections like Wikipedia for schools?
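If the list ever has to be produced or cleaned up by hand rather than exported from WP1, a small script can write it. This is only a sketch: it assumes the article list is a UTF-8 text file with one title per line, and the file name and helper function are hypothetical:

```python
from pathlib import Path
import tempfile


def write_article_list(titles, path):
    """Write one title per line (UTF-8), skipping blank entries and duplicates."""
    seen = set()
    lines = []
    for title in titles:
        title = title.strip()
        if title and title not in seen:
            seen.add(title)
            lines.append(title)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical output location and sample titles.
out = Path(tempfile.gettempdir()) / "geography_no_bios.tsv"
count = write_article_list(["River delta", "Mercator projection", "River delta", ""], out)
print(count)  # 2
```

The resulting file would then be uploaded and referenced from the recipe as described above.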

RavanJAltaie commented 1 month ago

Recipes:
1- https://farm.openzim.org/recipes/Wikipedia_en_geography
2- https://farm.openzim.org/recipes/Wikipedia_en_computer
3- https://farm.openzim.org/recipes/Wikipedia_en_finance

Files:
1- https://library.kiwix.org/viewer#wikipedia_en_geography_maxi_2024-06
2- https://library.kiwix.org/viewer#wikipedia_en_computer_maxi_2024-06
3- https://library.kiwix.org/viewer#wikipedia_en_finance_maxi_2024-06