wmde / mitmachen

Help new Wikipedia editors find articles with known issues
https://tools.wmflabs.org/mitmachen
GNU General Public License v2.0

Mapping categories on user interests (first screen) #24

Closed ChristineDomgoergen closed 5 years ago

ChristineDomgoergen commented 5 years ago

@darionewmonday Follow-up from the kick-off: I checked whether we can manually define a static list of categories for each of the suggested fields of interest on the first screen. We could with main categories, but not with subcategories. We would use the 8 at the top here: https://de.wikipedia.org/wiki/Kategorie:Sachsystematik. The problem is that the tool, as far as I know, cannot automatically read out subcategories, and doing it anyway takes too long. So we need to discuss: do we leave the first screen out? Or do you have other ideas for a solution?

darionewmonday commented 5 years ago

@dev-ckln can you please estimate the effort of using these categories as main categories, so that users can define their main interests at the beginning of the user journey?

ChristineDomgoergen commented 5 years ago

Hi @dev-ckln, in today's check-in we discussed the topic of categories and interests again. So we have:

Can you please check as soon as possible whether we can use a cron job to curate lists of all subcategories per category using PetScan (https://petscan.wmflabs.org/)? Goal: a daily or weekly PetScan query that stores all subcategories of the manually defined categories in the tool. When the user clicks on an interest, the tool still does a live check across all those previously stored categories, at all levels, and shows articles in those categories.

Please also check where we could store the list of categories per interest. It should be possible in the code, or in an external list on Wikipedia that can be retrieved via the API.

Thank you!

dev-ckln commented 5 years ago

Hey @ChristineDomgoergen, assuming these are the main categories (https://de.wikipedia.org/wiki/Kategorie:Sachsystematik), I am checking how we can curate lists of all their subcategories using PetScan. Will let you know asap.

As for storing the list of categories per interest: we can save that in the DB now that we can create tables, so no problem there.
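
A rough sketch of what such a table could look like, purely for illustration. The table name `interest_category`, the column names, and the helper functions are hypothetical, and Python's built-in `sqlite3` stands in for whatever database the tool actually uses:

```python
import sqlite3


def init_schema(conn):
    # One row per (interest, category) pair; the composite key prevents duplicates.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS interest_category (
            interest TEXT NOT NULL,
            category TEXT NOT NULL,
            PRIMARY KEY (interest, category)
        )
        """
    )


def replace_categories(conn, interest, categories):
    # Swap the stored list for one interest in a single transaction,
    # so a periodic refresh never leaves a half-updated list behind.
    with conn:
        conn.execute(
            "DELETE FROM interest_category WHERE interest = ?", (interest,)
        )
        conn.executemany(
            "INSERT OR IGNORE INTO interest_category (interest, category) "
            "VALUES (?, ?)",
            [(interest, category) for category in categories],
        )


def categories_for(conn, interest):
    # Return the stored categories for one interest, sorted by name.
    rows = conn.execute(
        "SELECT category FROM interest_category "
        "WHERE interest = ? ORDER BY category",
        (interest,),
    )
    return [row[0] for row in rows]
```

Replacing the whole list per interest on every refresh keeps the stored data in sync with the latest query result, including categories that have since been removed.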

dev-ckln commented 5 years ago

Hey @ChristineDomgoergen, we can't use cron to fetch data from https://petscan.wmflabs.org/ because it submits the page on every query; it's not AJAX-based.

tobijat commented 5 years ago

@dev-ckln @ChristineDomgoergen Never did anything with petscan, so I took 5 minutes and looked into the documentation: https://meta.wikimedia.org/wiki/PetScan/en

Then I created the following example query on PetScan (note that every query on PetScan is recorded under a PSID, and you can use this PSID to re-run the same query whenever you want): https://petscan.wmflabs.org/?psid=11252548 The query looks for all subcategories of "Wissenschaft" on the DE Wikipedia, down to a depth of 2 subcategories. All a cron job would have to do is call that URL and store the data. If you want the query to return machine-readable data, add the "&format=json" parameter. So: https://petscan.wmflabs.org/?format=json&psid=11252548
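
To make that concrete, here is a minimal sketch of the script a cron job could run. The function names are hypothetical, and the JSON layout that `extract_titles` expects (a top-level `"*"` list whose entries hold the pages under `"a"` then `"*"`, each page with a `"title"`) is an assumption that should be checked against a real response before relying on it:

```python
import json
import urllib.request
from urllib.parse import urlencode

PETSCAN_URL = "https://petscan.wmflabs.org/"


def build_url(psid):
    # Re-run a stored query by its PSID and ask for machine-readable output.
    return PETSCAN_URL + "?" + urlencode({"format": "json", "psid": psid})


def extract_titles(payload):
    # ASSUMED response layout: {"*": [{"a": {"*": [{"title": ...}, ...]}}]}.
    # Missing keys are skipped, so an unexpected shape yields an empty list.
    titles = []
    for block in payload.get("*", []):
        for page in block.get("a", {}).get("*", []):
            title = page.get("title")
            if title:
                titles.append(title)
    return titles


def fetch_subcategories(psid, timeout=60):
    # Performs the HTTP request; this is the part the cron job would call.
    with urllib.request.urlopen(build_url(psid), timeout=timeout) as response:
        return extract_titles(json.load(response))
```

A daily or weekly cron entry would then call something like `fetch_subcategories(11252548)` and store the returned titles.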

Isn't that all we need, @dev-ckln? Or am I missing some level of complexity here?

darionewmonday commented 5 years ago

Thanks, Tobi, for your 5 minutes ;) Next time let's start directly with this investigation to speed up the whole process.

@dev-ckln Thanks for checking this out.