jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

check random subset of dicts to verify they match each other and the website #12

Closed: jwzimmer-zz closed this 3 years ago

jwzimmer-zz commented 3 years ago

From issue #8: before we close the issue, I think we should randomly spot-check our results:
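A random spot check could be set up roughly like this. It is a minimal sketch that assumes each link dict maps a trope name to the list of tropes its page links to (the real JSON layout may differ). The URL prefix is copied from the page links in this thread, and `pick_spot_checks` is a hypothetical helper name.

```python
import random

# URL prefix taken from the TV Tropes links in this thread; assumes every
# trope being checked lives in the Main namespace.
BASE_URL = "https://tvtropes.org/pmwiki/pmwiki.php/Main/"


def pick_spot_checks(link_dicts, k=3, seed=None):
    """Pick k random tropes and pair each with its page URL for manual comparison.

    link_dicts is a hypothetical {trope_name: [linked trope names]} mapping.
    """
    rng = random.Random(seed)
    names = sorted(link_dicts)  # sorted so a fixed seed always picks the same tropes
    chosen = rng.sample(names, min(k, len(names)))
    return [(name, BASE_URL + name) for name in chosen]


if __name__ == "__main__":
    # Made-up sample data, not real scraped links.
    sample = {
        "AMasterMakesTheirOwnTools": ["TropeX", "TropeY"],
        "FightSceneFailure": ["TropeZ"],
        "ABoyAGirlAndABabyFamily": ["AlwaysMale"],
    }
    for name, url in pick_spot_checks(sample, k=2, seed=12):
        print(name, len(sample[name]), "links ->", url)
```

Fixing the seed makes the same picks reproducible, so two people can check the same pages independently.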

jwzimmer-zz commented 3 years ago

Revised plan for sanity checking due to https://github.com/jwzimmer/tv-tropes/issues/13#issuecomment-717299789:

nguyenhphilip commented 3 years ago

Will do this tonight!

jwzimmer-zz commented 3 years ago

Problems

jwzimmer-zz commented 3 years ago

(Some I'm checking are fine, e.g. https://github.com/jwzimmer/tv-tropes/blob/main/linked_trope_dict_from_AMasterMakesTheirOwnTools.json vs. https://tvtropes.org/pmwiki/pmwiki.php/Main/AMasterMakesTheirOwnTools.)

jwzimmer-zz commented 3 years ago

I think I fixed the issues I found above ^ in https://github.com/jwzimmer/tv-tropes/commit/2485da74404c4d871747095a6292096d40fffd21 and https://github.com/jwzimmer/tv-tropes/commit/1c0110adba2c0dbf489cc5fcadc6c6c5a31a72a6?

=== Before I re-run my script to get updated dicts: ===

jwzimmer-zz commented 3 years ago

@nguyenhphilip I'll check this issue again later or tomorrow to see what changes I should make based on whatever you find, too. : )

nguyenhphilip commented 3 years ago

Hm, so this is written on the UsefulNotes page:

"Useful Notes articles are not tropes and are not to be included in a work's trope list. See, however, Historical Domain Character. Similarly, tropes are not to be used to describe the subject of a Useful Notes article directly. You may, however, list tropes that are commonly found in media portraying the subject."

The links on the UsefulNote page GunsOfFiction confirm this: it just links to different types of guns, like 'revolver' and 'sniper rifle', but there isn't really any content on those pages, which makes me think these pages aren't meant to be meta tropes.

One UsefulNote, MuhammadAli, does link to other tropes that are in Main, but I don't think Muhammad Ali would be a meta trope that ties those tropes together (could be wrong). This probably relates to this part of the paragraph above: "Similarly, tropes are not to be used to describe the subject of a Useful Notes article directly."

The links on the World War II UsefulNote point to other Useful Notes (i.e. 1, 2, and 3).

So maybe UsefulNotes won't be so useful for us :\

jwzimmer-zz commented 3 years ago

Hmmmm ok, well, that's easy to take back out! : )

nguyenhphilip commented 3 years ago

Made a dict of links from the articles in every subfolder in Indices

Also, while spot-checking ABoyAGirlAndABabyFamily with my dict and Julia's dict, it looks like Julia's script captures links nested in the Example folders (the ones you have to click to expand on the website) that mine doesn't. I think this is because J loops through <p> and <li> tags for links, while I use only <div id="main-article">. Some of these links aren't in the main trope_list (AlwaysMale), but some are (UntoUsASonandDaughterAreBorn).
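Here is a rough sketch of the "loop through <p> and <li> tags" approach, written with Python's standard-library `html.parser` so it has no dependencies. The actual scripts may use a different parser (e.g. BeautifulSoup); that, the class name, and the sample markup (including the `folder` class) are assumptions for illustration.

```python
from html.parser import HTMLParser


class ParagraphListLinkParser(HTMLParser):
    """Collect the href of every <a> tag nested inside a <p> or <li> element.

    Assumes the page closes its <p> and <li> tags explicitly; html.parser does
    not infer omitted end tags the way a browser does.
    """

    def __init__(self):
        super().__init__()
        self.open_blocks = 0  # number of <p>/<li> elements we are currently inside
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "li"):
            self.open_blocks += 1
        elif tag == "a" and self.open_blocks > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("p", "li") and self.open_blocks > 0:
            self.open_blocks -= 1


# Made-up markup: one link in a paragraph, one in a list item inside a
# collapsible folder, and one bare link that is in neither.
page = (
    '<div id="main-article">'
    '<p>Intro <a href="/pmwiki/pmwiki.php/Main/AlwaysMale">link</a></p>'
    '<div class="folder"><ul><li>'
    '<a href="/pmwiki/pmwiki.php/Main/UntoUsASonandDaughterAreBorn">example</a>'
    '</li></ul></div>'
    '<a href="/pmwiki/pmwiki.php/Main/Tropes">bare link</a>'
    '</div>'
)
parser = ParagraphListLinkParser()
parser.feed(page)
print(parser.links)
```

Because the parser only looks at whether a link sits inside a <p> or <li>, list items inside expandable folders are picked up the same way as any other list item.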

Digging further, it looks like AlwaysMale IS in Main, the folder we used for our initial master trope list. Maybe a better filtering strategy would be to grab all links within <p> and <li> tags, and exclude links that are in neither Main nor the master trope_list.
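The proposed filter (keep a link if it is in Main or in the master trope_list) might look like this minimal sketch. It assumes wiki links have the `/pmwiki/pmwiki.php/<Namespace>/<PageName>` shape seen in the URLs in this thread and that trope_list entries are bare page names; `split_wiki_link` and `keep_link` are hypothetical helpers.

```python
from urllib.parse import urlparse

WIKI_PREFIX = "/pmwiki/pmwiki.php/"  # assumed from the page URLs in this thread


def split_wiki_link(href):
    """Return (namespace, page_name) for a wiki page link, or None otherwise."""
    path = urlparse(href).path  # works for both relative and absolute links
    if not path.startswith(WIKI_PREFIX):
        return None
    parts = path[len(WIKI_PREFIX):].split("/")
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def keep_link(href, trope_list):
    """Keep links in the Main namespace or whose page name is in trope_list."""
    parsed = split_wiki_link(href)
    if parsed is None:
        return False
    namespace, name = parsed
    return namespace == "Main" or name in trope_list


links = [
    "/pmwiki/pmwiki.php/Main/AlwaysMale",
    "https://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/GunsOfFiction",
    "/forum/index",
]
print([h for h in links if keep_link(h, set())])
```

Passing `trope_list` as a set keeps each membership check cheap when filtering thousands of links.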

Everything in Julia's FightSceneFailure dict lines up with what's in mine except for an extra link to FightSceneFailure itself. It looks like some other <li> items on the page link back to the page itself?
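Comparing two versions of one trope's link list, as done here for FightSceneFailure, can be sketched with `collections.Counter` so that duplicate counts are compared as well as membership. The link names below are made up, and `compare_links` is a hypothetical helper.

```python
from collections import Counter


def compare_links(mine, theirs):
    """Return (only_in_mine, only_in_theirs) for one trope's link lists.

    Counters are used so a link that appears twice in one list but once in
    the other still shows up as a difference.
    """
    a, b = Counter(mine), Counter(theirs)
    return a - b, b - a


mine = ["ActionFilm", "Choreography"]
theirs = ["ActionFilm", "Choreography", "FightSceneFailure"]
only_mine, only_theirs = compare_links(mine, theirs)
print(dict(only_mine), dict(only_theirs))  # prints: {} {'FightSceneFailure': 1}
```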

jwzimmer-zz commented 3 years ago

Cool, super helpful!

So my script currently does this ("Maybe a better filtering strategy would be to grab all links within <p> and <li> tags, and exclude links not in Main"), but it doesn't do the additional step you mentioned of checking against the master trope list. Should I add that? That would screen out things like the "Trope" page; is that the idea?

Allowing self-links and duplicates: I think some analyses could care about this, especially the duplicates (e.g. it would be cool to make a network that only shows edges above a certain threshold, like articles that link to each other more than 10 times or whatever)... maybe we should keep duplicates but get rid of self-links?
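Keeping duplicates makes the thresholded network idea easy: each duplicate link count becomes an edge weight. A minimal sketch, assuming the same hypothetical {trope: [linked tropes, duplicates included]} layout; `weighted_edges` and `min_weight` are illustrative names, not from the repo.

```python
from collections import Counter


def weighted_edges(link_dict, min_weight=1):
    """Turn {trope: [linked tropes, duplicates included]} into weighted directed edges.

    The weight is how many times the source page links to the target;
    self-links are dropped and edges below min_weight are filtered out.
    """
    edges = {}
    for source, targets in link_dict.items():
        for target, count in Counter(targets).items():
            if target != source and count >= min_weight:
                edges[(source, target)] = count
    return edges


# Made-up example: TropeA links to TropeB twice, TropeC once, and itself once.
links = {
    "TropeA": ["TropeB", "TropeB", "TropeC", "TropeA"],
    "TropeB": ["TropeA"],
}
print(weighted_edges(links, min_weight=2))  # prints: {('TropeA', 'TropeB'): 2}
```

Setting `min_weight=11` would give the "more than 10 times" network mentioned above.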

nguyenhphilip commented 3 years ago

Yeah, the idea I had in mind was to grab tropes that are in Main but not listed in trope_list, so that we capture as many tropes as we can.

Self-links and duplicates: sorry, I forgot to think about this last night! Yes, I agree: having duplicates could be interesting depending on the analysis, and we probably don't care about self-links.

Side note: once we do some final filtering and update the scripts, I feel like we'll probably have a large enough sample to start looking at some questions! :)

jwzimmer-zz commented 3 years ago

ok so i should make the dicts such that:
(0) one dict for every trope in the masterlist of tropes,
(1) include links to things not on the masterlist, which we can ignore or include as needed,
(2) only include links in the Main namespace or in the masterlist of tropes,
(3) include duplicates,
(4) do not include self-links
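The five rules can be read as a small filtering pass. This is a sketch under stated assumptions, not the script in the repo: it takes links that have already been extracted and split into hypothetical (namespace, page_name) pairs, and `build_link_dict` and `page_links` are illustrative names.

```python
def build_link_dict(masterlist, page_links):
    """Apply rules (0)-(4) to links that have already been extracted.

    masterlist: trope names (hypothetical input format).
    page_links: hypothetical {trope: [(namespace, page_name), ...]} in page order.
    """
    masterlist = list(masterlist)
    masterset = set(masterlist)
    result = {}
    for trope in masterlist:  # (0) one entry per masterlist trope
        kept = []
        for namespace, name in page_links.get(trope, []):
            if namespace != "Main" and name not in masterset:
                continue  # (2) only Main-namespace or masterlist links
            if name == trope:
                continue  # (4) drop self-links
            kept.append(name)  # (1) off-masterlist Main pages stay; (3) duplicates stay
        result[trope] = kept
    return result


# Made-up sample links for one trope page.
sample_links = {
    "FightSceneFailure": [
        ("Main", "ActionFilm"),
        ("Main", "ActionFilm"),
        ("Main", "FightSceneFailure"),
        ("UsefulNotes", "GunsOfFiction"),
        ("Main", "AlwaysMale"),
    ]
}
print(build_link_dict(["AlwaysMale", "FightSceneFailure"], sample_links))
```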

jwzimmer-zz commented 3 years ago

Remade in https://github.com/jwzimmer/tv-tropes/commit/026cfc1ebafbbd30187e8924a01f3e8245a7609a such that:

To reiterate from elsewhere: the trope masterlist is all the articles they've labelled as tropes, https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes

I will manually check that the pages listed in https://github.com/jwzimmer/tv-tropes/issues/15 are fine to exclude (if not, I'll add them as new dicts).

jwzimmer-zz commented 3 years ago

@nguyenhphilip these dicts I think look reasonable - not too different from the ones we had before but with the revisions above - so at some point would you mind sanity-checking a few and making sure they look like what you expect? Thanks.

nguyenhphilip commented 3 years ago

QC'd 3 random articles in the list linked_article_tropes:

I feel confident that Julia's script got the things we wanted!

jwzimmer-zz commented 3 years ago

Great! Thanks, @nguyenhphilip!