jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

Codebook/ what are all the things #13

Open: jwzimmer-zz opened this issue 3 years ago

jwzimmer-zz commented 3 years ago

It's probably a good idea to keep track of what everything is so we don't forget, as Prof Cheney said this morning.

jwzimmer-zz commented 3 years ago

So far we are using each trope page's title as the unique identifier for that trope, stored as a string in lists or dicts. We're assuming that the filename of each trope article is that article's title.

(1) https://github.com/jwzimmer/tv-tropes/blob/main/trope_list/tropes/tropes_dict.json: a single list containing one dict for every trope in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes, giving every trope that trope links to on its article page.
(2) https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes: the "master list" of all the tropes. The TV Tropes community made this list of every article that was (I believe manually) tagged as a trope, so we used it as the consensus list of all tropes. There are lots of other articles and things on the website that could be considered tropes.
(3) https://github.com/jwzimmer/tv-tropes/tree/main/Indices: a folder of different indices. IIRC I downloaded these manually to help check that nothing we cared about was missing when we used wget to scrape the site.
(4) https://github.com/jwzimmer/tv-tropes/blob/main/indextree.py: script for pulling the names of the tropes listed on each index page (used on https://github.com/jwzimmer/tv-tropes/tree/main/Indices).
(5) https://github.com/jwzimmer/tv-tropes/blob/main/individualtropepage.py: script for pulling the names of the tropes linked to within each individual trope page (used on https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes).
(6) Files starting with linked_trope_dict: a file for each dict, covering every trope in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes, of every trope that trope links to on its article page; HOPEFULLY equivalent to (1).
(7) Files starting with txt_dict: one file per index from (3), listing the tropes in that index.
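For anyone re-creating (1) or (5) later, here is a rough stdlib sketch of pulling linked trope names out of a saved article page. It is not the code in individualtropepage.py. It assumes trope-to-trope links have hrefs containing /pmwiki/pmwiki.php/Main/<TropeName>, and MainLinkParser, main_links and MAIN_PREFIX are hypothetical names. It scans every link on the page, not only links inside the article text.

```python
from html.parser import HTMLParser

# Assumption: links between trope articles have hrefs containing this path,
# either relative or as full https://tvtropes.org/... URLs.
MAIN_PREFIX = "/pmwiki/pmwiki.php/Main/"


class MainLinkParser(HTMLParser):
    """Collect the page name of every Main-namespace <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []  # first-appearance order, no duplicates

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        start = href.find(MAIN_PREFIX)
        if start == -1:
            return  # not a Main-namespace link (e.g. Film/ or Administrivia/)
        # Keep only the page name: drop any query string or #fragment.
        name = href[start + len(MAIN_PREFIX):].split("?")[0].split("#")[0]
        if name and name not in self.links:
            self.links.append(name)


def main_links(html, masterlist=None):
    """Linked Main page names, optionally restricted to a masterlist set."""
    parser = MainLinkParser()
    parser.feed(html)
    parser.close()
    if masterlist is None:
        return parser.links
    return [name for name in parser.links if name in masterlist]


if __name__ == "__main__":
    page = (
        '<p><a href="/pmwiki/pmwiki.php/Main/TropeA">A</a> '
        '<a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/TropeB#x">B</a> '
        '<a href="/pmwiki/pmwiki.php/Film/SomeWork">a work</a></p>'
    )
    print(main_links(page))                         # ['TropeA', 'TropeB']
    print(main_links(page, masterlist={"TropeB"}))  # ['TropeB']
```

The page names in the demo (TropeA, SomeWork, etc.) are placeholders, not real scraped data. Passing the masterlist as a set keeps the membership check cheap for large pages.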

jwzimmer-zz commented 3 years ago

Turns out (6) is definitely not equivalent to (1); we interpreted these things differently. (1) records every time any trope in the masterlist links to any other trope in the masterlist anywhere on the page, while (6) records the links embedded in the text of the article for each trope in the masterlist.

But that might turn out to be good: it complicates sanity checking a little, but it lets us compare what the community of contributors explicitly thinks of as a related trope with what they relate the trope to while writing about it.
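A minimal sketch of that comparison, trope by trope, assuming both versions can be loaded as dicts mapping a trope name to a list of linked trope names (for (1), that means first merging its list of dicts, assuming each dict has that shape). flatten, compare_link_dicts and jaccard are hypothetical helper names, and the Jaccard index is just one possible overlap measure, not something we've settled on.

```python
def flatten(list_of_dicts):
    """Merge a list of {trope: [links]} dicts into one dict (later keys win)."""
    merged = {}
    for d in list_of_dicts:
        merged.update(d)
    return merged


def compare_link_dicts(anywhere, in_text):
    """Split each trope's links into only-anywhere, only-in-text, and both.

    `anywhere` is the "links anywhere on the page" version, `in_text` the
    "links embedded in the article text" version.
    """
    report = {}
    for trope in sorted(set(anywhere) | set(in_text)):
        a = set(anywhere.get(trope, []))
        t = set(in_text.get(trope, []))
        report[trope] = {
            "only_anywhere": a - t,
            "only_in_text": t - a,
            "both": a & t,
        }
    return report


def jaccard(a, b):
    """Overlap of two link sets: 1.0 if identical, 0.0 if disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy data with placeholder names, not real link data.
    anywhere = flatten([{"TropeA": ["TropeB", "TropeC", "TropeD"]}])
    in_text = {"TropeA": ["TropeB", "TropeD"], "TropeB": ["TropeA"]}
    report = compare_link_dicts(anywhere, in_text)
    print(report["TropeA"]["only_anywhere"])  # {'TropeC'}
```

Tropes that appear in only one of the two dicts still show up in the report, with an empty set on the other side, so missing pages are visible rather than silently dropped.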

jwzimmer-zz commented 3 years ago

Relevant to issue #7 too:

[Image: ttvtropestructure]

jwzimmer-zz commented 3 years ago

Now my script in the pic above also captures links that are in lists in the main article, not just links in the paragraphs.

jwzimmer-zz commented 3 years ago

After more discussion, comparison, etc., we have decided: for our purposes, we're defining "trope" as a page in https://github.com/jwzimmer/tv-tropes/tree/main/trope_list/tropes. This is the list TV Tropes identifies as tropes at https://tvtropes.org/pmwiki/pagelist_having_pagetype_in_namespace.php?n=Main&t=trope (reached via https://tvtropes.org/pmwiki/pmwiki.php/Administrivia/NotATrope > https://tvtropes.org/pmwiki/pmwiki.php/Main/Trope > https://tvtropes.org/pmwiki/pmwiki.php/Main/Tropes > https://tvtropes.org/pmwiki/pagelist_having_pagetype_in_namespace.php?n=Main&t=trope).

All the pages in that folder, i.e. the equivalent of the masterlist, are listed in: https://github.com/jwzimmer/tv-tropes/blob/main/in_Masterlist.json

All the pages in the Main folder, which therefore also include tropes not in the masterlist, metatropes, indices, and other article types, are listed in: https://github.com/jwzimmer/tv-tropes/blob/main/in_pmwiki_Main.json
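A small sketch of applying that definition, assuming in_Masterlist.json and in_pmwiki_Main.json each hold a flat JSON list of page names (not checked here). load_names, restrict_to_masterlist and classify are hypothetical helper names. The idea: a linked page counts as a trope only if it is in the masterlist, and the Main list tells us whether a non-trope link is at least some other Main article.

```python
import json


def load_names(path):
    """Load a JSON file assumed to contain a flat list of page names."""
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def restrict_to_masterlist(links, masterlist):
    """Keep only masterlist tropes, and only their links to other masterlist tropes."""
    return {
        trope: [t for t in targets if t in masterlist]
        for trope, targets in links.items()
        if trope in masterlist
    }


def classify(name, masterlist, main_pages):
    """Label a page name under the 'trope = masterlist page' definition."""
    if name in masterlist:
        return "trope"
    if name in main_pages:
        return "other Main page"  # e.g. metatrope, index, other article type
    return "outside Main"


if __name__ == "__main__":
    # Placeholder names; with real data use load_names("in_Masterlist.json") etc.
    masterlist = {"TropeA", "TropeB"}
    main_pages = masterlist | {"SomeIndex"}
    links = {"TropeA": ["TropeB", "SomeIndex"], "SomeIndex": ["TropeA"]}
    print(restrict_to_masterlist(links, masterlist))  # {'TropeA': ['TropeB']}
```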

jwzimmer-zz commented 3 years ago

Description of the dicts that are the links within each trope article: https://github.com/jwzimmer/tv-tropes/issues/12#issuecomment-718197360

jwzimmer-zz commented 3 years ago

A GML file (for Gephi) of the network in the Sister Tropes page (https://tvtropes.org/pmwiki/pmwiki.php/Main/SisterTrope): there are unweighted, undirected edges between every pair of tropes given as "sister tropes" in the Examples section of the page, only including tropes that are in the trope masterlist: https://github.com/jwzimmer/tv-tropes/blob/main/sistertropes_inmasterlist.gml

(this version has all the links given in the examples section, whether they're tropes from the masterlist or not: https://github.com/jwzimmer/tv-tropes/blob/main/sistertropes.gml)
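The two sister-trope files differ only in whether a masterlist filter is applied before pairing. Here is a stdlib sketch of that construction, assuming each "sister tropes" example has already been parsed into a list of trope names. sister_edges and to_gml are hypothetical names, and this is not necessarily how the .gml files above were made (they may have come from Gephi or networkx). The GML writer also assumes names contain no double quotes.

```python
from itertools import combinations


def sister_edges(groups, masterlist=None):
    """Undirected, unweighted edges between every pair within each group.

    If `masterlist` is given, tropes not in it are dropped before pairing.
    """
    edges = set()
    for group in groups:
        members = {t for t in group if masterlist is None or t in masterlist}
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))  # sorted pair, so (a, b) and (b, a) collapse
    return sorted(edges)


def to_gml(edges):
    """Serialize an undirected edge list as a minimal GML graph."""
    nodes = sorted({n for edge in edges for n in edge})
    ids = {name: i for i, name in enumerate(nodes)}
    lines = ["graph [", "  directed 0"]
    for name in nodes:
        lines.append(f'  node [ id {ids[name]} label "{name}" ]')
    for a, b in edges:
        lines.append(f"  edge [ source {ids[a]} target {ids[b]} ]")
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    groups = [["TropeA", "TropeB", "TropeC"]]  # placeholder example group
    print(to_gml(sister_edges(groups)))
    print(sister_edges(groups, masterlist={"TropeA", "TropeB"}))
```

Only tropes that end up in at least one edge become nodes, so a group reduced to a single masterlist trope contributes nothing to the graph.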

jwzimmer-zz commented 3 years ago

A GML file (for Gephi) of the network in the Super Tropes page (https://tvtropes.org/pmwiki/pmwiki.php/Main/SuperTrope). There are unweighted, undirected edges between a "Super Trope" root node and each example given in the "samples" section of the page, and then from each example to its listed subtropes. The edgelist is in https://github.com/jwzimmer/tv-tropes/issues/16#issuecomment-718863380. (I included super tropes NOT in the trope masterlist; I did not include subtropes that were not in the trope masterlist): https://github.com/jwzimmer/tv-tropes/blob/main/supertropes.gml
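A sketch of that asymmetric filtering rule, assuming the "samples" section has already been parsed into a dict from each super trope to its listed subtropes. supertrope_edges is a hypothetical name, and the edges are plain pairs to be treated as undirected, matching the description. Super tropes are always kept; subtropes are kept only when they are in the masterlist.

```python
def supertrope_edges(samples, masterlist, root="Super Trope"):
    """Edges root -> super trope -> subtrope, as unordered pairs.

    Every super trope gets an edge to the root, even if it is not in the
    masterlist; a subtrope gets an edge only if it is in the masterlist.
    """
    edges = []
    for supertrope, subtropes in samples.items():
        edges.append((root, supertrope))
        for sub in subtropes:
            if sub in masterlist:
                edges.append((supertrope, sub))
    return edges


if __name__ == "__main__":
    # Placeholder names: TropeZ is a super trope outside the masterlist.
    samples = {"TropeA": ["TropeB", "NotInList"], "TropeZ": ["TropeC"]}
    masterlist = {"TropeA", "TropeB", "TropeC"}
    for edge in supertrope_edges(samples, masterlist):
        print(edge)
```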

jwzimmer-zz commented 3 years ago

The folder https://github.com/jwzimmer/tv-tropes/tree/main/Stanford_Neighborhoods has CSV files for the neighborhoods found in https://dhs.stanford.edu/social-media-literacy/tvtropes-pt-2-trope-but-not-troper-communities/

jwzimmer-zz commented 3 years ago

The list of all the tropes (in the masterlist) and the tropes they link to: github.com/jzimmer/tv-tropes/all-tropes-with-links.json
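A quick sketch of turning a file like that into a directed edge list and in-degree counts (how many distinct trope pages link to each trope). It assumes the JSON is a dict mapping each trope name to a list of linked trope names; load_links, edge_list and in_degree are hypothetical names.

```python
import json
from collections import Counter


def load_links(path):
    """Load a JSON file assumed to be {trope: [linked tropes, ...]}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def edge_list(links):
    """Directed (source, target) pairs, counting each link once per page."""
    return [
        (src, dst)
        for src, targets in links.items()
        for dst in dict.fromkeys(targets)  # de-duplicate, keep order
    ]


def in_degree(links):
    """Number of distinct trope pages linking to each trope."""
    return Counter(dst for _, dst in edge_list(links))


if __name__ == "__main__":
    # Placeholder data, not taken from the real file.
    links = {"TropeA": ["TropeB", "TropeC", "TropeB"], "TropeB": ["TropeC"]}
    print(in_degree(links).most_common())  # [('TropeC', 2), ('TropeB', 1)]
```

Repeated links on the same page are counted once, so the counts reflect how many pages mention a trope rather than how often.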

jwzimmer-zz commented 3 years ago

For imperfect answers to questions from the Datasheets for Datasets paper (https://arxiv.org/abs/1803.09010), see https://github.com/jwzimmer/tv-tropening/issues/3#issuecomment-765594960.