ChristineDomgoergen closed this issue 5 years ago
@dev-ckln can you please check the effort in using these categories as main categories to let the user define their main interests at the beginning of the user journey?
Hi @dev-ckln, in today's check-in we discussed the topic of categories and interests again. So we have:
Can you please check as soon as possible whether we can use a cronjob to curate lists of all subcategories per category using PetScan (https://petscan.wmflabs.org/)? Goal: a daily or weekly PetScan query that stores all subcategories of the manually defined categories in the tool. When the user clicks on an interest, the tool still does a live check across all of those previously stored categories, at every level, and shows the articles in them.
Please also check where we could store the list of categories per interest. It should be possible either in the code or in an external list on Wikipedia that can be retrieved via the API.
Thank you!
Hey @ChristineDomgoergen, I am checking whether these are the main categories (https://de.wikipedia.org/wiki/Kategorie:Sachsystematik) and, if so, how we can curate lists of all their subcategories using PetScan. Will let you know asap.
Regarding storing the list of categories per interest: we can save that in the DB now that we can create tables, so no problem with that.
Hey @ChristineDomgoergen, we can't use cron to fetch data from https://petscan.wmflabs.org/ because every query is sent as a full page submit. It's not AJAX-based.
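To make the DB idea concrete, here is a minimal sketch of what such a table could look like. It uses Python's built-in sqlite3 purely for illustration; the table name `interest_categories`, the column names, and the helper functions are all hypothetical and not taken from the tool's actual code or database.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per (interest, category) pair.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interest_categories (
               interest TEXT NOT NULL,
               category TEXT NOT NULL,
               PRIMARY KEY (interest, category)
           )"""
    )


def replace_categories(conn, interest, categories):
    # Swap the stored list for one interest inside a single transaction,
    # so a periodic refresh never leaves a half-updated list behind.
    with conn:
        conn.execute(
            "DELETE FROM interest_categories WHERE interest = ?", (interest,)
        )
        conn.executemany(
            "INSERT OR IGNORE INTO interest_categories (interest, category) "
            "VALUES (?, ?)",
            [(interest, category) for category in categories],
        )


def categories_for(conn, interest):
    # Return the stored categories for one interest, sorted by name.
    rows = conn.execute(
        "SELECT category FROM interest_categories "
        "WHERE interest = ? ORDER BY category",
        (interest,),
    )
    return [row[0] for row in rows]
```

The refresh job would call `replace_categories` once per interest, and the click handler would only need `categories_for` before running its live article check.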
@dev-ckln @ChristineDomgoergen Never did anything with petscan, so I took 5 minutes and looked into the documentation: https://meta.wikimedia.org/wiki/PetScan/en
Then I did the following: I created an example query on PetScan (note that every query on PetScan is recorded under a PSID, and you can use this PSID to re-run the same query whenever you want): https://petscan.wmflabs.org/?psid=11252548 The query looks for all subcategories of "Wissenschaft" on the DE Wikipedia down to a depth of 2 subcategories. All a cron job would have to do is call that URL and store the data. If you want the query to return machine-readable data, add the "&format=json" parameter. So: https://petscan.wmflabs.org/?format=json&psid=11252548
Isn't that all we need @dev-ckln ? Or am I missing some level of complexity here?
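For illustration, a cron-driven script along these lines could be sketched as below. The URL is the one from the query above; the function names are made up for this example, and the assumed JSON layout (a top-level "*" list whose entries hold the pages under "a" and then "*", each with a "title" field) is an assumption that should be checked against a real response before relying on it.

```python
import json
import urllib.request

# Stored query from the comment above; format=json asks for machine-readable output.
PETSCAN_URL = "https://petscan.wmflabs.org/?format=json&psid=11252548"


def fetch_petscan(url=PETSCAN_URL, timeout=60):
    # Plain HTTP GET: re-running a stored PSID needs no form submit.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_titles(payload):
    # ASSUMPTION: result pages live under payload["*"][i]["a"]["*"],
    # each entry carrying a "title" key. Entries without one are skipped.
    titles = []
    for block in payload.get("*", []):
        for page in block.get("a", {}).get("*", []):
            title = page.get("title")
            if title:
                # MediaWiki titles use underscores in place of spaces.
                titles.append(title.replace("_", " "))
    return titles


# Typical use from a daily or weekly cron job:
#     titles = extract_titles(fetch_petscan())
#     ...then store `titles` wherever the tool keeps its category lists.
```

Keeping the parsing separate from the download means the parsing can be checked against a saved response without hitting PetScan on every run.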
Thanks Tobi for your 5 minutes ;) Next time let's start directly with this kind of investigation to speed up the whole process.
@dev-ckln Thanks for checking this out.
@darionewmonday Follow-up from the kick-off: I checked whether we can manually define a static list of categories for each of the suggested fields of interest on the first screen. We could do that with main categories, but not with subcategories. We would use the 8 at the top here: https://de.wikipedia.org/wiki/Kategorie:Sachsystematik. The problem is that the tool, as far as I know, cannot automatically read out subcategories, and doing it live takes too long. So we need to discuss: do we leave the first screen out? Or do you have other ideas for a solution?