havanagrawal / wikidata-toolkit

Bot for Wikidata to fix consistency and constraint issues on television series :tv:
MIT License
5 stars 8 forks

Write an ethical scraper to get IMDB IDs for each episode of a show #3

Open havanagrawal opened 4 years ago

havanagrawal commented 4 years ago

IMDb IDs are a great way to connect Wikidata entries to another dataset. One possible design is a scraper that, for every show that already has an IMDb ID on Wikidata, fetches the show's IMDb episode list one season at a time, for example:

https://www.imdb.com/title/tt1898069/episodes?season=1
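A minimal sketch of the scraping step follows. Here "ethical" is read as: honor robots.txt, identify the bot with a User-Agent, and throttle requests. The `USER_AGENT` string, the default delay, and the assumption that each episode is linked as `<a href="/title/tt.../">Episode title</a>` are all hypothetical. IMDb's real markup changes over time, and its terms of use should be checked before running anything like this.

```python
import re
import time
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser

BASE = "https://www.imdb.com"
# Hypothetical bot identity; replace with a real name and contact address.
USER_AGENT = "wikidata-toolkit-bot/0.1 (hypothetical contact)"
# Assumption: episode links look like /title/tt1234567/ (optionally with a query string).
TITLE_HREF = re.compile(r"^/title/(tt\d+)/")


class EpisodeLinkParser(HTMLParser):
    """Collects (imdb_id, link_text) pairs for every <a> pointing at a /title/ page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            match = TITLE_HREF.match(href)
            if match:
                self._current_id = match.group(1)
                self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            text = "".join(self._text).strip()
            if text:  # skip image-only links with no visible title
                self.links.append((self._current_id, text))
            self._current_id = None


def parse_episode_links(html, show_id):
    """Return [(imdb_id, title), ...] in page order, excluding the show itself."""
    parser = EpisodeLinkParser()
    parser.feed(html)
    parser.close()
    seen, episodes = set(), []
    for imdb_id, title in parser.links:
        if imdb_id != show_id and imdb_id not in seen:
            seen.add(imdb_id)
            episodes.append((imdb_id, title))
    return episodes


def fetch_season_html(show_id, season, delay_s=5.0):
    """Fetch one season page, respecting robots.txt and sleeping before the request."""
    url = f"{BASE}/title/{show_id}/episodes?season={season}"
    robots = urllib.robotparser.RobotFileParser(f"{BASE}/robots.txt")
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        raise PermissionError(f"robots.txt disallows {url}")
    # Use the site's Crawl-delay if it asks for more than our default.
    time.sleep(max(delay_s, robots.crawl_delay(USER_AGENT) or 0))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Parsing is kept separate from fetching so it can be tested offline against saved HTML. For example, `parse_episode_links(fetch_season_html("tt1898069", 1), "tt1898069")` would return the episode list for season 1, provided the markup assumption holds.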

We then join the scraped episodes with the episode list from Wikidata, matching IMDb titles against Wikidata labels within each season, and then update the matched Wikidata items.
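A minimal sketch of the per-season join follows. It assumes both sides have already been grouped into a dict of `season_number -> [(id, title), ...]`; that data shape and the function names are hypothetical. Titles are normalized by case-folding, stripping accents, and removing punctuation. A pair is only accepted when the normalized title is unique on both sides, so ambiguous titles are left for manual review. The output names Wikidata's IMDb ID property, P345. Actually writing the edits to Wikidata is out of scope here.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def normalize_title(title):
    """Case-fold, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(cleaned.split())


def match_season(wikidata_eps, imdb_eps):
    """wikidata_eps: [(qid, label)], imdb_eps: [(imdb_id, title)].

    Returns (matches, unmatched_qids); a match needs a unique title on both sides.
    """
    imdb_by_title = defaultdict(list)
    for imdb_id, title in imdb_eps:
        imdb_by_title[normalize_title(title)].append(imdb_id)
    wikidata_counts = Counter(normalize_title(label) for _, label in wikidata_eps)

    matches, unmatched = [], []
    for qid, label in wikidata_eps:
        key = normalize_title(label)
        candidates = imdb_by_title.get(key, [])
        if len(candidates) == 1 and wikidata_counts[key] == 1:
            matches.append((qid, candidates[0]))
        else:
            unmatched.append(qid)
    return matches, unmatched


def propose_updates(wikidata_by_season, imdb_by_season):
    """Return [(qid, "P345", imdb_id)] for every confidently matched episode."""
    updates = []
    for season, wikidata_eps in sorted(wikidata_by_season.items()):
        matches, _ = match_season(wikidata_eps, imdb_by_season.get(season, []))
        updates.extend((qid, "P345", imdb_id) for qid, imdb_id in matches)
    return updates
```

Joining within a season, rather than across the whole show, keeps common titles such as "Pilot" or "Part One" from colliding between seasons.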