jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

Index page dicts - links to masterlist tropes only #18

Closed jwzimmer-zz closed 3 years ago

jwzimmer-zz commented 3 years ago

From: https://github.com/jwzimmer/tv-tropes/issues/17

Make new dicts from the index pages (need a masterlist of indices too) that only include tropes that are in our masterlist.
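A minimal sketch of that filtering step, assuming each index page has already been scraped into a dict mapping the index name to the list of page names it links to. The function name `filter_index_links` and the sample data are made up for illustration; this is not the script in the repo.

```python
def filter_index_links(index_links, masterlist):
    """Keep only links that appear in the trope masterlist.

    index_links: dict of index name -> list of linked page names (assumed shape).
    masterlist: iterable of trope names.
    Indices with no masterlist tropes left after filtering are dropped.
    """
    master = set(masterlist)  # set membership checks are O(1)
    filtered = {}
    for index_name, links in index_links.items():
        kept = [link for link in links if link in master]
        if kept:
            filtered[index_name] = kept
    return filtered


# Made-up sample data, not taken from the wiki
index_links = {
    "IndexA": ["TropeOne", "TropeTwo", "NotATrope"],
    "IndexB": ["SomethingElse"],
}
masterlist = ["TropeOne", "TropeTwo", "TropeThree"]

filtered = filter_index_links(index_links, masterlist)
print(filtered)
```

In this sketch `IndexB` disappears entirely because none of its links are in the masterlist; whether empty indices should be dropped or kept as empty lists is a design choice.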

jwzimmer-zz commented 3 years ago

Masterlist of indices: would that be the things in Main (https://tvtropes.org/pmwiki/index_report.php) that aren't listed as tropes in our tropes masterlist?

nguyenhphilip commented 3 years ago

Done! Script here: https://github.com/jwzimmer/tv-tropes/blob/main/pull-index.ipynb

Index master dictionary as well as individual files: https://github.com/jwzimmer/tv-tropes/tree/main/index-list

We have ~4218 indices. We'll probably need some way to filter these, as we likely don't want this many in our visualization.
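One possible way to cut the list down, sketched under the assumption that the index master dictionary maps each index name to its list of masterlist tropes. Filtering by size is only one option, and `largest_indices`, `min_tropes`, and `top_n` are hypothetical names, not anything in the repo.

```python
def largest_indices(index_dict, min_tropes=2, top_n=None):
    """Return indices with at least `min_tropes` tropes, largest first.

    index_dict: dict of index name -> list of trope names (assumed shape).
    top_n: if given, keep only that many of the largest indices.
    """
    sized = [
        (name, tropes)
        for name, tropes in index_dict.items()
        if len(tropes) >= min_tropes
    ]
    # Stable sort: ties keep their original dictionary order
    sized.sort(key=lambda item: len(item[1]), reverse=True)
    if top_n is not None:
        sized = sized[:top_n]
    return dict(sized)


# Made-up sample data
sample = {
    "IndexA": ["T1", "T2", "T3"],
    "IndexB": ["T1"],
    "IndexC": ["T2", "T3"],
}

print(largest_indices(sample, min_tropes=2))
print(largest_indices(sample, min_tropes=1, top_n=1))
```

The thresholds would need tuning against the real data; the defaults here are arbitrary.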

jwzimmer-zz commented 3 years ago

Oh whoops I did not see your comment here, my bad! This is great, thanks!

So to check my understanding: for every entry in Main Indices (https://tvtropes.org/pmwiki/index_report.php) that links to tropes in our masterlist of tropes, we now have a dictionary of those tropes? (In other words, it contains no indices without links to masterlist tropes, and no tropes that aren't in our masterlist?)

What about the case where an index links to another index (if that ever happens)?
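A rough way to check for that case, assuming we still have the unfiltered links for each index page (once links are filtered to masterlist tropes, most index-to-index links would already be gone). `index_to_index_links` and the sample data are hypothetical.

```python
def index_to_index_links(raw_index_links):
    """Find indices that link to other indices.

    raw_index_links: dict of index name -> list of all linked page names,
    before any masterlist filtering (assumed shape).
    Returns a dict of index name -> list of other index names it links to.
    """
    index_names = set(raw_index_links)
    nested = {}
    for index_name, links in raw_index_links.items():
        # Ignore self-links; keep only links whose target is itself an index
        hits = [
            link for link in links
            if link in index_names and link != index_name
        ]
        if hits:
            nested[index_name] = hits
    return nested


# Made-up sample data
raw = {
    "IndexA": ["TropeOne", "IndexB", "IndexA"],
    "IndexB": ["TropeTwo"],
}

print(index_to_index_links(raw))
```

If this comes back non-empty on the real data, we would still need to decide whether to follow those links and pull in the nested index's tropes or just ignore them.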

jwzimmer-zz commented 3 years ago

Resolved by @nguyenhphilip