spencermountain / dumpster-dive

roll a wikipedia dump into mongo

Adding page view counts? #99

Closed praine closed 2 years ago

praine commented 2 years ago

Is there an easy way to add page view counts over a 10 year period using this tool? I need some way to rank the "importance" or "relevance" of the articles. Thanks!

spencermountain commented 2 years ago

hey Paul, the short answer is there isn't, but the longer answer is that I've actually done this somewhere, in a branch.

it downloads the pageview dataset from here: https://dumps.wikimedia.org/other/pageview_complete

you can find some weird scripts here, if it's helpful. They should really include this data in the dump. cheers

praine commented 2 years ago

Hi, thanks for this @spencermountain. In the end I used this Wikimedia endpoint to pull pageviews for articles programmatically:

"https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/" + encodeURIComponent(page.title) + "/monthly/20100101/20200101"
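
For anyone landing here later, a rough sketch of how that endpoint could be wrapped. This is not code from dumpster-dive or from the branch mentioned above. The function names (`buildPageviewsUrl`, `totalViews`, `fetchTotalViews`) are made up for illustration, the User-Agent string is a placeholder, and replacing spaces with underscores is my assumption about how the API expects titles. It also assumes a runtime with a global `fetch` (Node 18+ or a browser).

```javascript
// Base of the Wikimedia REST pageviews endpoint quoted in the comment above.
const BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article";

// Build the per-article URL. `start` and `end` are YYYYMMDD strings.
// Assumption: titles use underscores instead of spaces, then get URI-encoded.
function buildPageviewsUrl(title, start, end, project = "en.wikipedia.org") {
  const article = encodeURIComponent(title.replace(/ /g, "_"));
  return `${BASE}/${project}/all-access/all-agents/${article}/monthly/${start}/${end}`;
}

// Sum the monthly counts from a response shaped like { items: [{ views: n }, ...] }.
function totalViews(body) {
  return (body.items || []).reduce((sum, item) => sum + item.views, 0);
}

// Fetch and total the views for one title. Not called here, since it hits the network.
async function fetchTotalViews(title, start, end) {
  const res = await fetch(buildPageviewsUrl(title, start, end), {
    // Placeholder identification header; swap in your own contact details.
    headers: { "User-Agent": "pageviews-example/0.1 (you@example.com)" },
  });
  if (!res.ok) throw new Error(`pageviews request failed: ${res.status}`);
  return totalViews(await res.json());
}
```

Calling something like `await fetchTotalViews(page.title, "20100101", "20200101")` for each page and sorting by the result would give a crude "importance" ranking. As far as I know this API only has data from around mid-2015 onward, so a 10 year range may come back with fewer months than requested.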