jwzimmer-zz / tv-tropes

UVM Stat 287 Final Project repo - network of tropes from TV Tropes wiki
MIT License

Add weights to edges - increment per time linked between - undirected #20

Closed by jwzimmer-zz 3 years ago

jwzimmer-zz commented 3 years ago

I think it makes sense to count links both in and out of a page for purposes of weight, since the idea here is to say "how tightly associated are these two nodes" - I don't think the directionality makes sense to preserve in this case. (We've preserved directionality implicitly in most of our files thus far: in dictionaries the key is the node that links TO the values; in files the filename is the page that contains the links; in edge-lists the first node links TO the second node.)
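To make those conventions concrete, here is a minimal sketch, not the repo's actual code. The trope names and the helper names `to_directed_edges` and `undirected_key` are made up for illustration; the only thing taken from the comment above is the shape of the data (a dict whose key links TO its values, and an edge-list whose first node links TO the second).

```python
# Hypothetical example data: each key is a page, each value is the list of
# pages that the key links TO (the directed convention described above).
links = {
    "TropeA": ["TropeB", "TropeC"],
    "TropeB": ["TropeA"],
}


def to_directed_edges(link_dict):
    """Flatten {source: [targets]} into (source, target) pairs.

    The first node in each pair links TO the second node.
    """
    return [
        (source, target)
        for source, targets in link_dict.items()
        for target in targets
    ]


def undirected_key(a, b):
    """Return an order-independent key so (a, b) and (b, a) are the same edge."""
    return tuple(sorted((a, b)))


directed_edges = to_directed_edges(links)
```

Dropping directionality then just means looking edges up by `undirected_key(source, target)` instead of by the ordered pair, so a link from A to B and a link from B to A land on the same edge.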

jwzimmer-zz commented 3 years ago

(I'll do this after I know @nguyenhphilip is done with the masterlist of indices) (Edit: he's done, I just missed his comment about it! : ) )

jwzimmer-zz commented 3 years ago

Using the index dicts made by @nguyenhphilip in https://github.com/jwzimmer/tv-tropes/tree/main/index-list and the tropes in https://github.com/jwzimmer/tv-tropes/tree/main/linked_article_tropes, make an undirected graph where each edge's weight is the number of times the two pages link to each other, counting links in either direction.

Motivation being that it might make the network more manageable in size using a reasonable metric.
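A rough sketch of what that could look like, using only the standard library. This is an illustration under assumptions rather than the code we ended up with: the input is assumed to be a plain `{page: [linked pages]}` dict, the names `build_undirected_weights` and `prune_light_edges` are hypothetical, self-links are assumed to be uninteresting and skipped, and the `min_weight` cutoff is an arbitrary example value.

```python
from collections import Counter


def build_undirected_weights(link_dict):
    """Count links between each pair of pages, ignoring direction.

    link_dict maps a page to the list of pages it links TO. Every link,
    whichever way it points, adds 1 to the weight of the undirected edge.
    """
    weights = Counter()
    for source, targets in link_dict.items():
        for target in targets:
            if source == target:
                continue  # assumption: a page linking to itself is not an edge
            # Sorting the pair makes (A, B) and (B, A) share one key.
            weights[tuple(sorted((source, target)))] += 1
    return weights


def prune_light_edges(weights, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}


# Hypothetical example data, not real trope pages.
example_links = {
    "TropeA": ["TropeB", "TropeC", "TropeB"],
    "TropeB": ["TropeA"],
    "TropeC": [],
}

weights = build_undirected_weights(example_links)
strong_edges = prune_light_edges(weights, min_weight=2)
```

Here the TropeA/TropeB edge gets weight 3 (two links out of TropeA plus one back from TropeB), while TropeA/TropeC gets weight 1 and is dropped by the cutoff. Whether a simple threshold like this is the right way to shrink the network is an open question.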

Does it make sense to try to parse the titles for happiness score (https://github.com/jwzimmer/tv-tropes/blob/main/Hedonometer.csv) now, too, or separately? Probably separately: there are a number of potential complications there that could hold up this task, which should otherwise be relatively straightforward.

nguyenhphilip commented 3 years ago

I think we should do the NLP stuff separately! I say let's try to get the community detection stuff working first

jwzimmer-zz commented 3 years ago

Resolved by @nguyenhphilip with the centrality measures, which turned out to be a better strategy than this for narrowing the network down to a tractable size.