jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

Exactly what got scraped & saved here when I used wget? #1

Closed by jwzimmer-zz 3 years ago

jwzimmer-zz commented 3 years ago

Kind of unclear to me. I tried to save all the HTML pages, but not the images and other media needed to load each page, since what we care about are the text and links. However, I do not know what actually ended up here. It looks like I got about 9000 files locally, but fewer made it to GitHub, possibly because of a file-count limit?
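One rough way to see what wget actually saved is to count the files in the local folder by extension and compare that with the same count on a fresh clone of the repo. The sketch below is only an illustration: the function name `count_by_extension` is made up here, and it assumes the saved pages can be recognized by their file extension.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Count files under `folder` (recursively), grouped by lowercase extension.

    Files without an extension are grouped under "(none)".
    """
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


# Example use (the folder path is an assumption, adjust to the local copy):
# for ext, n in count_by_extension("tv-tropes").most_common():
#     print(ext, n)
```

Running this once on the original download and once on a clone should show whether HTML pages went missing on the way to GitHub. Note that a clone's `.git` directory is also walked if the function is pointed at the repo root.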

jwzimmer-zz commented 3 years ago

@nguyenhphilip if you have any ideas how to check this, that would be GREAT : )

jwzimmer-zz commented 3 years ago

From discussion with Phil: use https://github.com/jwzimmer/tv-tropes/blob/main/txt_dict_from_Main%20Index%20Index%20-%20TV%20Tropes.htm.json as the master list of all tropes, then check the trope articles in the tv-tropes folder against that list.
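A minimal sketch of that check, under some loud assumptions: that the JSON file holds an object whose top-level keys identify tropes, that each saved article's file name (minus extension) resembles the trope name, and that matching on lowercase letters and digits only is good enough. The function names are hypothetical and the real file layout may differ.

```python
import json
from pathlib import Path


def load_master_names(json_path):
    """Return the top-level keys of the master JSON file as a set.

    Assumption: the file is a JSON object keyed by trope name.
    """
    with open(json_path, encoding="utf-8") as fh:
        return set(json.load(fh))


def saved_page_names(folder, suffixes=(".htm", ".html")):
    """Return the file stems of all saved HTML pages under `folder`."""
    return {
        p.stem
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    }


def _key(name):
    # Hypothetical normalization: keep only letters and digits, lowercased.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def compare(master, saved):
    """Return (missing, extra): master names with no saved page, and
    saved pages with no matching master name."""
    master_by_key = {_key(n): n for n in master}
    saved_by_key = {_key(n): n for n in saved}
    missing = sorted(master_by_key[k] for k in master_by_key.keys() - saved_by_key.keys())
    extra = sorted(saved_by_key[k] for k in saved_by_key.keys() - master_by_key.keys())
    return missing, extra
```

Calling `compare(load_master_names(json_path), saved_page_names(folder))` gives two lists to inspect by hand. If the saved file names carry a page-title suffix or a different naming scheme, `_key` would need to be adjusted before the lists mean much.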

jwzimmer-zz commented 3 years ago

@nguyenhphilip Based on your updates in the other cases, discussions, etc., I'm satisfied that you have answered this question!