Open jwzimmer-zz opened 2 years ago
A dataframe with all 800 characters and the fictional universe they are from:
A dataframe with 31 characters from the Lord of the Rings and the adjectives they appear near (using a package to label parts of speech and looking at characters whose name occurs in all 3 books at least 50 times):
First attempt -- running SVD on the LOTR adjectives
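import pandas as pd

# runSVD and vector_barchart are helper functions that are not shown in this thread.
lotrdf = pd.read_csv("lotr_adj_df_2021_11_03.csv")
df,u,d,v,sig,x,rex = runSVD(lotrdf,dropcols=["Unnamed: 0"])
# Adjective labels: every column except the leading "Unnamed: 0" name column
cols = lotrdf.columns[1:]
# Plot the first row of V^T against the adjective labels
vector_barchart(cols,v[0,:],10)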
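Since runSVD is not shown here, the following is only a minimal sketch of the kind of decomposition it presumably wraps, using `numpy.linalg.svd` on a made-up toy table. The function name `run_svd_sketch`, the toy counts, and the reduced return signature are all assumptions for illustration, not the actual helper.

```python
import numpy as np
import pandas as pd

def run_svd_sketch(frame, dropcols=()):
    """Hypothetical stand-in for runSVD: drop label columns, then factor the rest."""
    numeric = frame.drop(columns=list(dropcols))
    x = numeric.to_numpy(dtype=float)
    # Compact SVD: x == u @ diag(d) @ vt, with singular values d in descending order
    u, d, vt = np.linalg.svd(x, full_matrices=False)
    return numeric, u, d, vt

# Made-up character-by-adjective counts (not the real LOTR data)
toy = pd.DataFrame({
    "Unnamed: 0": ["Frodo", "Sam", "Gollum"],
    "brave": [3.0, 5.0, 0.0],
    "wretched": [0.0, 0.0, 7.0],
    "dear": [4.0, 6.0, 1.0],
})

numeric, u, d, vt = run_svd_sketch(toy, dropcols=["Unnamed: 0"])

# Multiplying the factors back together should recover the numeric table
reconstructed = u @ np.diag(d) @ vt
```

In this sketch, row `vt[0]` holds one weight per adjective column and column `u[:, 0]` holds one weight per character, which matches how `v[0,:]` and `u[:,0]` are indexed in the snippets in this thread.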
Then with the overall mean removed:
dfmean = df.mean().mean()
df = df - dfmean
The mean is very small, so removing it makes almost no difference; the charts look the same.
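One way to put a number on "the charts look the same" is to compare the first row of V^T before and after centering. The sketch below does this on random toy data (an assumption, not the LOTR table); the helper name `leading_row_of_vt` is hypothetical. Singular vectors are only defined up to sign, so the comparison uses the absolute value of the dot product.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy matrix whose entries average close to zero (illustrative assumption)
toy = pd.DataFrame(rng.normal(loc=0.01, scale=1.0, size=(6, 4)), columns=list("abcd"))

# Mean of the column means; equals the overall mean when there are no missing values
overall_mean = toy.mean().mean()
centered = toy - overall_mean

def leading_row_of_vt(frame):
    """Return the first row of V^T from a compact SVD of the frame."""
    _, _, vt = np.linalg.svd(frame.to_numpy(), full_matrices=False)
    return vt[0]

# Rows of V^T have unit length, so this is |cosine similarity| between the two directions
similarity = abs(leading_row_of_vt(toy) @ leading_row_of_vt(centered))
```

A `similarity` value close to 1 would indicate that removing the mean barely changes the leading direction.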
First row of V^T (highest magnitude words)
Second row of V^T
Third row of V^T
Next steps/ ideas
So this is NLP identification of parts of speech, combined with an exclusion list I made after looking through the top few hundred items.
Second attempt
Overall mean has been removed.
Looking at which characters have the highest-magnitude entries in the columns of U, to see which characters are most relevant to each dimension:
names = lotrdf["Unnamed: 0"]
names = list(names)
vector_barchart(names,u[:,0],10)
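vector_barchart is not shown in this thread. A plausible selection step behind it, picking the entries with the largest absolute value, could look like the sketch below. The function name `top_by_magnitude` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def top_by_magnitude(labels, vector, k):
    """Return the k (label, value) pairs with the largest absolute value."""
    vector = np.asarray(vector, dtype=float)
    # Sort indices by descending |value| and keep the first k
    order = np.argsort(-np.abs(vector))[:k]
    return [(labels[i], float(vector[i])) for i in order]

# Made-up component weights, one per character
names = ["Frodo", "Sam", "Gollum", "Sauron"]
component = [0.1, -0.7, 0.5, -0.2]

top2 = top_by_magnitude(names, component, 2)
```

Ranking by absolute value keeps strongly negative entries in view, which matters because the sign of a singular vector is arbitrary.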
Looking back at the original dataset to see how obvious those look for comparison:
names2 = list(character_map["Character display name"])
vector_barchart(names2,U2[:,0],10)
First column of U
Second column of U
Third column of U
At least for dimensions 1 and 2, yeah, those look more obvious: 1 is a cluster of good guys and 2 is a cluster of villains.
Going back to LoTR...
First column of U/ row of V^T
Second column of U/ row of V^T
Third column of U/ row of V^T
Fourth column of U/ row of V^T
Fifth column of U/ row of V^T
Sixth column of U/ row of V^T
Seventh column of U/ row of V^T
Eighth column of U/ row of V^T
Ninth column of U/ row of V^T
Tenth column of U/ row of V^T
To get the dataframe that has the information about the storyverses:
character_map, bap_map = pd.read_html("codebook.html")
To get Lord of the Rings in particular:
character_map[character_map['Fictional work']=='Lord of the Rings']
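To show what that filter returns without needing codebook.html, here is a small sketch on a made-up stand-in table. Only the column names "Character display name" and "Fictional work" come from this thread; the rows and the name `character_map_toy` are assumptions.

```python
import pandas as pd

# Made-up stand-in for character_map (the real table comes from codebook.html)
character_map_toy = pd.DataFrame({
    "Character display name": ["Frodo Baggins", "Hermione Granger", "Gandalf"],
    "Fictional work": ["Lord of the Rings", "Harry Potter", "Lord of the Rings"],
})

# Boolean mask keeps only the rows whose work matches exactly
lotr_rows = character_map_toy[character_map_toy["Fictional work"] == "Lord of the Rings"]
lotr_names = list(lotr_rows["Character display name"])
```

The match is an exact string comparison, so the title has to be spelled the same way as in the codebook.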