jwzimmer-zz / tv-tropening


Lord of the Rings - part of POCS Final Project #18

Open jwzimmer-zz opened 2 years ago

jwzimmer-zz commented 2 years ago

To get the dataframe that has the information about the storyverses:

character_map, bap_map = pd.read_html("codebook.html")

To get Lord of the Rings in particular:

character_map[character_map['Fictional work']=='Lord of the Rings']

     ID       Fictional work     Character display name
775  LOTR/1   Lord of the Rings  Frodo Baggins
776  LOTR/2   Lord of the Rings  Aragorn
777  LOTR/3   Lord of the Rings  Boromir
778  LOTR/4   Lord of the Rings  Merry Brandybuck
779  LOTR/5   Lord of the Rings  Samwise Gamgee
780  LOTR/6   Lord of the Rings  Gandalf
781  LOTR/7   Lord of the Rings  Gimli
782  LOTR/8   Lord of the Rings  Legolas
783  LOTR/9   Lord of the Rings  Pippin Took
784  LOTR/10  Lord of the Rings  Gollum
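
The selection above is a boolean mask on the 'Fictional work' column. A minimal sketch of the same filter on a hand-built stand-in for character_map follows; two rows are copied from the output above, and the third row (EX/1) is a made-up placeholder so the mask has something to exclude. The real frame comes from codebook.html.

```python
import pandas as pd

# Stand-in for character_map: two rows copied from the thread's output,
# plus one hypothetical placeholder row that is not part of the dataset.
character_map = pd.DataFrame(
    {
        "ID": ["LOTR/1", "LOTR/2", "EX/1"],
        "Fictional work": ["Lord of the Rings", "Lord of the Rings", "Example Work"],
        "Character display name": ["Frodo Baggins", "Aragorn", "Example Character"],
    }
)

# Boolean mask: True where the row belongs to Lord of the Rings.
is_lotr = character_map["Fictional work"] == "Lord of the Rings"
lotr = character_map[is_lotr]

print(lotr["Character display name"].tolist())
```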

All the adjectives used in our dataset of traits:

adjective_list.csv

    0
0 open-minded
1 artistic
2 individualist
3 repetitive
4 indiscreet
5 joyful
6 western
7 street-smart
8 genuine
9 backdoor
10 existentialist
11 decisive
12 cool
13 family-first
14 villainous
15 indulgent
16 patriotic
17 atheist
18 insecure
19 juvenile
20 classical
21 involved
22 reclusive
23 quitter
24 anarchist
25 helpless
26 hipster
27 stick-in-the-mud
28 ivory-tower
29 unassuming
30 hypocritical
31 dunce
32 genius
33 tall
34 crazy
35 scientific
36 dorky
37 unambitious
38 leisurely
39 gossiping
40 scandalous
41 anxious
42 specialist
43 sane
44 ferocious
45 extreme
46 judgemental
47 patient
48 high-tech
49 unlucky
50 plays hard
51 demonic
52 hurried
53 down to earth
54 foolish
55 arrogant
56 deviant
57 studious
58 respectful
59 passive
60 slacker
61 bold
62 mighty
63 disreputable
64 rational
65 regular
66 active
67 edgy
68 blue-collar
69 industrial
70 submissive
71 proper
72 irrelevant
73 chatty
74 liberal
75 charming
76 sheriff
77 precise
78 goof-off
79 orange
80 no-nonsense
81 jock
82 monochrome
83 hard
84 resigned
85 physical
86 equitable
87 reasonable
88 conspiracist
89 methodical
90 fresh
91 funny
92 deranged
93 competent
94 dramatic
95 varied
96 communal
97 confidential
98 warm
99 well behaved
100 mysterious
101 spontaneous
102 creepy
103 pessimistic
104 biased
105 zany
106 flamboyant
107 mild
108 prestigious
109 kinky
110 lenient
111 suspicious
112 sporty
113 cheery
114 aloof
115 philosophical
116 driven
117 salacious
118 bourgeoisie
119 debased
120 hesitant
121 apprentice
122 expressive
123 decorative
124 trash
125 honorable
126 cruel
127 head in clouds
128 avant-garde
129 noob
130 playful
131 political
132 open to new experiences
133 emotional
134 soft
135 jealous
136 heroic
137 reasoned
138 feminist
139 vague
140 democratic
141 domestic
142 scrub
143 queer
144 crafty
145 disorganized
146 selfish
147 flexible
148 tasteful
149 accepting
150 impartial
151 orderly
152 demure
153 penny-pincher
154 close-minded
155 persistent
156 chaste
157 devout
158 creative
159 sensible
160 spicy
161 feminine
162 open
163 proletariat
164 purple
165 pro
166 rich
167 astonishing
168 slugabed
169 eloquent
170 metaphorical
171 wholesome
172 drop out
173 young
174 independent
175 abstract
176 empirical
177 relaxed
178 frugal
179 luddite
180 beta
181 stinky
182 work-first
183 egalitarian
184 depressed
185 master
186 thick-skinned
187 socialist
188 poor
189 monastic
190 low-tech
191 assertive
192 masculine
193 right-brained
194 social
195 strict
196 nihilist
197 obsessed
198 human
199 mathematical
200 explorer
201 serious
202 official
203 shallow
204 heathen
205 cosmopolitan
206 overspender
207 traitorous
208 stylish
209 real
210 mad
211 eastern
212 modern
213 cautious
214 country-bumpkin
215 alpha
216 messy
217 lowbrow
218 tame
219 careful
220 builder
221 straight
222 practical
223 angelic
224 idealist
225 traditional
226 simple
227 weird
228 alert
229 gendered
230 mainstream
231 night owl
232 sheeple
233 compersive
234 vanilla
235 mature
236 intellectual
237 good-humored
238 moderate
239 disarming
240 happy
241 soulless
242 humorless
243 insider
244 extraordinary
245 loyal
246 whimsical
247 literary
248 obedient
249 clumsy
250 resolute
251 awkward
252 confident
253 rude
254 straightforward
255 fast
256 coordinated
257 moody
258 interesting
259 legit
260 skeptical
261 complimentary
262 pacifist
263 conservative
264 cunning
265 barbaric
266 first-mate
267 adventurous
268 tense
269 hoarder
270 optimistic
271 lustful
272 politically correct
273 tactful
274 sexist
275 authoritarian
276 refined
277 objective
278 nerd
279 subjective
280 ugly
281 competitive
282 thick
283 hedonist
284 bitter
285 pack rat
286 wise
287 high IQ
288 calm
289 sober
290 chaotic
291 feisty
292 transient
293 extrovert
294 lewd
295 self-disciplined
296 puny
297 generalist
298 focused on the present
299 mundane
300 captain
301 poisonous
302 neurotypical
303 scholarly
304 brave
305 kind
306 armoured
307 instinctual
308 meek
309 civilized
310 provincial
311 highbrow
312 multicolored
313 miserable
314 outsider
315 rugged
316 fortunate
317 bright
318 scruffy
319 cooperative
320 deep
321 urban
322 modest
323 incompetent
324 spiritual
325 oblivious
326 statist
327 introspective
328 mischievous
329 vengeful
330 bookish
331 roundabout
332 introvert
333 repulsive
334 permanent
335 charismatic
336 reserved
337 imaginative
338 impatient
339 animalistic
340 apathetic
341 forgiving
342 rigid
343 vain
344 thin
345 androgynous
346 unprepared
347 morning lark
348 slovenly
349 technophile
350 private
351 complicated
352 libertarian
353 direct
354 valedictorian
355 tiresome
356 autistic
357 ludicrous
358 lazy
359 short
360 resistant
361 altruistic
362 intimate
363 not introspective
364 pronatalist
365 resourceful
366 treasure
367 humble
368 unpolished
369 rough
370 sickly
371 angry
372 utilitarian
373 vulnerable
374 worldly
375 attractive
376 outlaw
377 workaholic
378 formal
379 unorthodox
380 pure
381 quarrelsome
382 unpatriotic
383 go-getter
384 realist
385 focused on the future
386 sarcastic
387 uninspiring
388 literal
389 emancipated
390 arcane
391 conventional
392 sheltered
393 shy
394 cringeworthy
395 minimalist
396 cold
397 unambiguous
398 manicured
399 diligent
400 guarded
401 soulful
402 smooth
403 logical
404 sad
405 inspiring
406 sorrowful
407 racist
408 wild
409 slothful
410 important
411 healthy
412 stoic
413 glad
414 nonpolitical
415 trusting
416 works hard
417 average
418 child free
419 uncreative
420 self-conscious
421 dominant
422 quiet
423 loud
424 rural
425 theist
426 pretentious
427 deliberate
428 enslaved
429 basic
430 city-slicker
431 neat
432 bossy
433 soft 2
434 lavish
435 concrete
436 gregarious
437 rebellious
438 remote
439 wavering
440 slow
441 impulsive
442 sweet
443 nurturing
444 beautiful
445 low IQ
446 normal
447 left-brained
448 stable
449 historical
450 self-assured
451 innocent
452 old
453 cryptic
454 curious
455 gracious
456 sensitive
457 scheduled
458 theoretical
459 insulting
460 codependent

jwzimmer-zz commented 2 years ago

A dataframe with all 800 characters and the fictional universe they are from:

character_map.csv

AttackPenguin commented 2 years ago

A dataframe with 31 characters from The Lord of the Rings and the adjectives that appear near them. Parts of speech were labeled with a package, and only characters whose name occurs in all 3 books at least 50 times are included:

lotr_adj_df_2021_11_03.csv
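
One way a table like this could be built is to count adjectives that fall within a few tokens of each character name. The sketch below is an assumption about the general approach, not the code that produced the CSV: the window size, the tokenizer, and the tiny `ADJECTIVES` lookup (standing in for a real part-of-speech tagger) are all illustrative.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a POS tagger: a fixed set of known adjectives.
ADJECTIVES = {"brave", "old", "wise", "weary"}

def adjectives_near(text, names, window=2):
    """Count adjectives within `window` tokens of each character name."""
    tokens = re.findall(r"[a-z']+", text.lower())
    wanted = {n.lower() for n in names}
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok in wanted:
            lo, hi = max(0, i - window), i + window + 1
            for neighbor in tokens[lo:hi]:
                if neighbor in ADJECTIVES:
                    counts[tok][neighbor] += 1
    return counts

sample = (
    "The wise old Gandalf smiled at them. "
    "Brave Frodo walked on, and weary Frodo slept."
)
counts = adjectives_near(sample, ["Gandalf", "Frodo"])
print(dict(counts["gandalf"]))
print(dict(counts["frodo"]))
```

Each character's Counter becomes one row of a character-by-adjective table. Note that this simple window ignores sentence boundaries.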

jwzimmer-zz commented 2 years ago

First attempt -- running SVD on the LOTR adjectives

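import pandas as pd

lotrdf = pd.read_csv("lotr_adj_df_2021_11_03.csv")
# runSVD is a helper from this repo; the unnamed index column is passed as a column to drop
df,u,d,v,sig,x,rex = runSVD(lotrdf,dropcols=["Unnamed: 0"])
# adjective labels: every column except the leading index column
cols = lotrdf.columns[1:]
# chart the first row of V^T against the adjective labels
vector_barchart(cols,v[0,:],10)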
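
`runSVD` is a helper from this repo whose body is not shown in the thread. As a rough sketch of the decomposition step only, assuming it wraps something like `numpy.linalg.svd`: rows of the data matrix are characters and columns are adjectives, so column k of U weights the characters and row k of V^T weights the adjectives. The matrix below is made-up toy data.

```python
import numpy as np

# Toy character-by-adjective count matrix (made-up numbers):
# 4 characters (rows) x 3 adjectives (columns).
X = np.array(
    [
        [5.0, 0.0, 1.0],
        [4.0, 1.0, 0.0],
        [0.0, 6.0, 2.0],
        [1.0, 5.0, 3.0],
    ]
)

# Thin SVD: X = U @ diag(d) @ Vt, with singular values in descending order.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

print(U.shape, d.shape, Vt.shape)
print(Vt[0, :])  # adjective weights for the first dimension
print(U[:, 0])   # character weights for the first dimension

# Sanity check: the factors reproduce the original matrix.
X_rebuilt = U @ np.diag(d) @ Vt
print(np.allclose(X, X_rebuilt))
```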

Then with the overall mean removed:

# mean of the column means, i.e. one scalar for the whole dataframe
dfmean = df.mean().mean()
df = df - dfmean

The mean is very small, so removing it makes almost no difference: the charts look the same.
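
A quick way to check a claim like this numerically is to compare the leading right singular vector before and after subtracting the overall mean. The sketch below uses synthetic data built to have a near-zero mean; it is an illustration, not a rerun of the LOTR analysis. The absolute value of the dot product is used because singular vectors are only defined up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic matrix with one strong pattern and a near-zero overall mean.
a = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # entries sum to zero
b = np.array([3.0, 1.0, 2.0, 0.5, 1.5])
X = 10.0 * np.outer(a, b) + 0.01 * rng.standard_normal((6, 5))

# With no missing values this equals df.mean().mean() on the same data.
overall_mean = X.mean()
X_centered = X - overall_mean

_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
_, _, Vt_cen = np.linalg.svd(X_centered, full_matrices=False)

# Close to 1.0 means the first dimension is essentially unchanged.
similarity = abs(Vt_raw[0] @ Vt_cen[0])
print(overall_mean)
print(similarity)
```

When the mean is small relative to the singular values, the similarity stays very close to 1, which matches charts that look the same.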

First row of V^T (highest magnitude words) image

Second row of V^T image

Third row of V^T image

Next steps/ ideas

AttackPenguin commented 2 years ago

So this is NLP identification of parts of speech (POS), combined with an exclusion list I made after looking through the top few hundred items.

lotr_adj_df_nlp_2021_11_03.csv

jwzimmer-zz commented 2 years ago

Second attempt

Overall mean has been removed.

Looking at which characters have the highest magnitudes in the columns of U, to see which characters are most relevant to each dimension:

# character names are stored in the unnamed index column
names = list(lotrdf["Unnamed: 0"])
# chart the first column of U against the character names
vector_barchart(names,u[:,0],10)

Looking back at the original dataset, for comparison, to see how obvious the same charts look there:

names2 = list(character_map["Character display name"])
vector_barchart(names2,U2[:,0],10)
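
`vector_barchart` is also a repo helper that is not shown in this thread. Judging only from how it is called (labels, a vector, and a count), the selection step might look like the sketch below, which pairs each label with its entry and keeps the n largest by absolute value. The function name `top_by_magnitude` and the sample weights are hypothetical, and the plotting itself is left out.

```python
import numpy as np

def top_by_magnitude(labels, vector, n):
    """Return the n (label, value) pairs with the largest absolute value."""
    vector = np.asarray(vector, dtype=float)
    order = np.argsort(-np.abs(vector))[:n]  # indices, largest magnitude first
    return [(labels[i], float(vector[i])) for i in order]

# Made-up weights standing in for one column of U.
names = ["Frodo Baggins", "Aragorn", "Gollum", "Gandalf"]
weights = [0.10, -0.70, 0.55, -0.05]

for name, value in top_by_magnitude(names, weights, 2):
    print(name, value)
```

Sorting by absolute value keeps strongly negative entries, which matter as much as strongly positive ones when reading a singular vector.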

First column of U image

Second column of U image

Third column of U image

At least for dimensions 1 and 2, yeah, those look more obvious: 1 is a cluster of good guys and 2 is a cluster of villains.

Going back to LoTR...

First column of U/ row of V^T image image

Second column of U/ row of V^T image image

Third column of U/ row of V^T image image

Fourth column of U/ row of V^T image image

Fifth column of U/ row of V^T image image

Sixth column of U/ row of V^T image image

Seventh column of U/ row of V^T image image

Eighth column of U/ row of V^T image image

Ninth column of U/ row of V^T image image

Tenth column of U/ row of V^T image image