Closed Popolechien closed 1 month ago
@Popolechien The two remaining ones are so far built directly from the WikiProject (so not customized in any manner). Might do this later.
Not sure I understand your comments (actually: I don't) but ok. Thanks for the update.
@kelson42 @Popolechien is this done now?
Apparently not for Computer and Geography. @kelson42 ?
This issue has been open for 4 years now; it's time to finish the other two, if applicable. I can help if wanted, I just need to understand the scope and the steps needed. @kelson42 @Popolechien
A gentle reminder on this please @Popolechien @kelson42
@RavanJAltaie The way I would go about it, now that Wikipedia-on-demand is out, is to generate a SPARQL query that includes WikiProject Geography articles and excludes those that intersect with WikiProject Biography (you will have to figure out the query, or ask the Wikidata folks). Ditto for the Computer part.
I poked around and I don't think there are sooo many of them to exclude, at least in the Geography part. The one I could come up with is Mercator but there must be plenty of explorers. Ditto for computers (I see Lovelace, Turing, etc.).
The two projects have 118,000 and 62,000 articles respectively, so if we can shave off even 5% I would see that as a win in terms of storage. There might be other concepts we can do without (can we remove low-importance entries?), but I leave that to you.
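To make the idea concrete, here is a minimal sketch of the exclusion step, assuming the article titles of each WikiProject have already been exported as plain sets of strings. The function names (`exclude_intersections`, `saving_ratio`) and the toy titles are hypothetical and only for illustration; this is not how the existing selections were built.

```python
def exclude_intersections(selection, *excluded):
    """Return the titles in `selection` that appear in none of the `excluded` sets."""
    to_drop = set().union(*excluded) if excluded else set()
    return set(selection) - to_drop


def saving_ratio(before, after):
    """Fraction of articles removed by the exclusion (0.0 for an empty selection)."""
    return 0.0 if not before else 1 - len(after) / len(before)


# Toy data (hypothetical): a real run would load the full WikiProject lists.
geography = {"Mercator projection", "Gerardus Mercator", "Alps", "Nile"}
biography = {"Gerardus Mercator", "Ada Lovelace", "Alan Turing"}
companies = {"Example Company"}

kept = exclude_intersections(geography, biography, companies)
ratio = saving_ratio(geography, kept)

# What the 5% goal means in article counts for the two sizes quoted above.
five_percent = [n * 5 // 100 for n in (118_000, 62_000)]
print(sorted(kept), ratio, five_percent)
```

So the 5% goal corresponds to dropping roughly 5,900 and 3,100 articles from the two selections.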
Sorry for not reacting earlier on this... but this needs a bit of time and work. We currently have problems and challenges around the selection scripts anyway... So, it might take a bit before we finally tackle this issue.
@Popolechien @kelson42 I have a small question: the issue says that this has already been done for Physics, Chemistry, Mol Cell biology, and Maths.
Who did them? Why can't we just repeat the same for the two remaining categories?
@kelson42 did it back in the day, but that was before WP1, and I assume he wrote the scripts.
So I have both files ready as ZIM files (made in WP1). @Popolechien, how shall I place them in the library?
@RavanJAltaie I think the proper thing is to generate a .tsv file with WP1, place it on drive.farm.openzim.org and put this in a mwoffliner recipe as the Article list parameter. Isn't that how we did other selections like Wikipedia for Schools?
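For reference, a rough sketch of producing such a list file by hand, assuming the recipe's Article list expects one title per line (WP1 normally generates this file for you, so this is only to show the shape). The helper name `write_article_list` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_article_list(titles, path):
    """Write de-duplicated, sorted titles to `path`, one per line; return the count."""
    lines = sorted({t.strip() for t in titles if t.strip()})
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Demo in a temporary directory (the file name is made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "geography_selection.tsv"
    count = write_article_list(["Nile", "Alps", " Nile ", ""], target)
    content = target.read_text(encoding="utf-8")

print(count)
print(content)
```

The resulting file would then be uploaded and its URL used as the Article list value in the recipe.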
Recipes:
1- https://farm.openzim.org/recipes/Wikipedia_en_geography
2- https://farm.openzim.org/recipes/Wikipedia_en_computer
3- https://farm.openzim.org/recipes/Wikipedia_en_finance
Files:
1- https://library.kiwix.org/viewer#wikipedia_en_geography_maxi_2024-06
2- https://library.kiwix.org/viewer#wikipedia_en_computer_maxi_2024-06
3- https://library.kiwix.org/viewer#wikipedia_en_finance_maxi_2024-06
We have done this already with the Medical selection (#156): in order to save space and remain topical, we should remove articles that also intersect with Wikiprojects Biography and Companies from these selections:
- Physics: done
- Chemistry: done
- Mol Cell biology: done
- Maths: done