jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

What went wrong with 23919, Eagleland.html? #11

Closed jwzimmer-zz closed 3 years ago

jwzimmer-zz commented 3 years ago

https://github.com/jwzimmer/tv-tropes/commit/d04f03ebba11dc183e19d6b63db398cccc772eef

I think it was 23919 (or thereabouts):

`
it.alltropes[23919]
Out[76]: 'Eagleland.html'

it.alltropes[23918]
Out[77]: 'WriterRevolt.html'

it.alltropes[23920]
Out[78]: 'ReallyRoyaltyReveal.html'

YEP! That's the one:

it.get_lists_tropes("trope_list/tropes/Eagleland.html")
Traceback (most recent call last):

  File "", line 1, in
    it.get_lists_tropes("trope_list/tropes/Eagleland.html")

  File "/Users/jzimmer1/Documents/GitHub/tv-tropes/individualtropepage.py", line 40, in get_lists_tropes
    href = link["href"]

  File "/Users/jzimmer1/opt/anaconda3/lib/python3.8/site-packages/bs4/element.py", line 1401, in __getitem__
    return self.attrs[key]

KeyError: 'href'
`

nguyenhphilip commented 3 years ago

The KeyError means that the <a> tag you tried to read the "href" attribute from didn't have one. I mentioned a way of resolving it in another GH issue: check if trope.has_attr('href') when looping through the links, and skip the tag if it doesn't.
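A minimal sketch of that guard, in case it helps. The HTML snippet and the function name `collect_hrefs` are made up for illustration and are not taken from individualtropepage.py; the only assumption is that the links are being pulled out with BeautifulSoup, as the traceback suggests.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <a> is a named anchor with no href,
# which is the kind of tag that raises KeyError: 'href'.
html = """
<div>
  <a href="/pmwiki/pmwiki.php/Main/WretchedHive">Wretched Hive</a>
  <a name="anchor-only">no link target here</a>
  <a href="/pmwiki/pmwiki.php/Main/GunNut">Gun Nut</a>
</div>
"""


def collect_hrefs(markup):
    """Return the href of every <a> tag that actually has one."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for link in soup.find_all("a"):
        # link["href"] raises KeyError when the attribute is missing,
        # so check first and skip tags without it.
        if link.has_attr("href"):
            hrefs.append(link["href"])
    return hrefs


hrefs = collect_hrefs(html)
print(hrefs)
```

Passing `href=True` to `find_all` (that is, `soup.find_all("a", href=True)`) should filter the same tags out up front, so either form ought to work.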

jwzimmer-zz commented 3 years ago

Thanks, @nguyenhphilip, I will do this tmrw! 👍

jwzimmer-zz commented 3 years ago

So, there are some differences in the page structure I think we might want to consider... I don't know how many of the articles fit with either pattern...

Comparing Critic Breakdown and Eagleland:

On the Critic Breakdown trope page, https://tvtropes.org/pmwiki/pmwiki.php/Main/CriticBreakdown, the links of interest I think are just those within the main text of the article:

[Screenshot, 2020-10-27: links within the main text of the Critic Breakdown article]

Which at first blush appears to match those returned by individualtropepage.py:

[Screenshot, 2020-10-27: links returned by individualtropepage.py for Critic Breakdown]

However, on Eagleland, https://tvtropes.org/pmwiki/pmwiki.php/Main/Eagleland, there is a section at the bottom of the page titled "Related tropes include:", followed by a list. It's not obvious to me whether we should include these... I think we might want to distinguish between the tropes organically linked to within the article text and the ones people decided were related. I think those might reflect slightly different processes of categorization.

So, assuming we don't want to include those, I get the dict: {'Eagleland': ['TruthInTelevision', 'TheBeautiful', 'Utopia', 'ThePromisedLand', 'TheFifties', 'GoodIsOldFashioned', 'TastesLikeDiabetes', 'IncorruptiblePurePureness', 'TheBoorish', 'WretchedHive', 'BloodKnight', 'PointyHairedBoss', 'Jerkass', 'Greed', 'ItsAllAboutMe', 'CondescendingCompassion', 'DeepSouth', 'TheWildWest', 'FatBastard', 'RedScare', 'TheFundamentalist', 'HeteronormativeCrusader', 'GunNut', 'MoralGuardians', 'FastFoodNation', 'GangsterLand', 'RichBitch', 'HollywoodCalifornia', 'TheSocialDarwinist', 'TheScrooge', 'CorruptCorporateExecutive', 'KillThePoor', 'ArsonMurderAndJaywalking', 'GlobalIgnorance', 'Mixed', 'TakeAThirdOption', 'BoisterousBruiser', 'IdiotHero', 'JerkWithAHeartOfGold', 'Trope', 'CulturalCringe', 'BoomerangBigot', 'CreatorProvincialism', 'MemeticMutation', 'WorldOfBadass', 'WretchedHive']}
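One detail in the dict above: 'WretchedHive' appears twice, so the list records every occurrence of a link rather than each linked trope once. If we decide each link should only count once, something like the following would flag the repeats and drop them while keeping the original order. The list here is a shortened excerpt of the Eagleland entry, and the variable names are just for illustration.

```python
from collections import Counter

# Shortened excerpt of the Eagleland list above (not the full entry).
links = ["TheBoorish", "WretchedHive", "BloodKnight", "WorldOfBadass", "WretchedHive"]

# Which tropes are linked more than once?
counts = Counter(links)
repeated = [name for name, n in counts.items() if n > 1]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_links = list(dict.fromkeys(links))

print(repeated)
print(unique_links)
```

If repeat links are meant to act as edge weights in the network, `counts` is the thing to keep instead of `unique_links`.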

Thoughts on whether the lists of links should or should not be included, and did we include them on other pages?
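If we do want to keep the two kinds of links apart, here is a rough sketch of one way to split them. It is not what individualtropepage.py does: it uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free, it assumes the marker text "Related tropes include:" appears verbatim in a single text node, and the page snippet, class name, and 'SomeRelatedTrope' are all placeholders.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect hrefs, split at a marker string found in the page text."""

    def __init__(self, marker="Related tropes include:"):
        super().__init__()
        self.marker = marker
        self.seen_marker = False
        self.in_text = []   # links that appear before the marker
        self.related = []   # links that appear after the marker

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None when the tag has no href
        if href is None:
            return
        target = self.related if self.seen_marker else self.in_text
        target.append(href)

    def handle_data(self, data):
        if self.marker in data:
            self.seen_marker = True


# Hypothetical page fragment, not the real Eagleland markup.
page = """
<p>Compare <a href="Main/WretchedHive">Wretched Hive</a> and
<a href="Main/GunNut">Gun Nut</a>.</p>
<p>Related tropes include:</p>
<ul>
  <li><a href="Main/SomeRelatedTrope">Some Related Trope</a></li>
</ul>
"""

parser = ArticleLinkParser()
parser.feed(page)
parser.close()

print(parser.in_text)
print(parser.related)
```

Keeping both lists means the decision about whether to include the "related" links can be made later, per analysis, rather than at scrape time.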

jwzimmer-zz commented 3 years ago

Phil's list: every time any trope in the masterlist links to any other trope in the masterlist, anywhere on the page.

My files: the links embedded in the text of the article, for each trope in the masterlist.
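To see how far the two definitions diverge for a given trope, a small comparison like this could be run over both dicts. The data below is invented for illustration ('SomeRelatedTrope' is a placeholder), and `extra_links` is a hypothetical helper, not something in the repo.

```python
# Hypothetical data in the same shape as the dict earlier in this thread.
page_wide = {"Eagleland": ["TruthInTelevision", "Utopia", "SomeRelatedTrope"]}
in_text = {"Eagleland": ["TruthInTelevision", "Utopia"]}


def extra_links(page_wide, in_text):
    """For each trope, list links found page-wide but not in the article text."""
    diff = {}
    for trope, links in page_wide.items():
        text_links = set(in_text.get(trope, []))
        # dict.fromkeys drops repeats while preserving order.
        extra = [name for name in dict.fromkeys(links) if name not in text_links]
        if extra:
            diff[trope] = extra
    return diff


difference = extra_links(page_wide, in_text)
print(difference)
```

Tropes that come back with a non-empty list are the ones where a "Related tropes" section, or some other part of the page outside the article text, adds links.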

jwzimmer-zz commented 3 years ago

Resolved: there is now a linked_tropes_dict matching Eagleland.