Lord of the Rings - part of POCS Final Project

jwzimmer-zz commented 2 years ago

To get the dataframe that has the information about the storyverses:

character_map, bap_map = pd.read_html("codebook.html")

To get Lord of the Rings in particular:

character_map[character_map['Fictional work']=='Lord of the Rings']

	ID	Fictional work	Character display name
775	LOTR/1	Lord of the Rings	Frodo Baggins
776	LOTR/2	Lord of the Rings	Aragorn
777	LOTR/3	Lord of the Rings	Boromir
778	LOTR/4	Lord of the Rings	Merry Brandybuck
779	LOTR/5	Lord of the Rings	Samwise Gamgee
780	LOTR/6	Lord of the Rings	Gandalf
781	LOTR/7	Lord of the Rings	Gimli
782	LOTR/8	Lord of the Rings	Legolas
783	LOTR/9	Lord of the Rings	Pippin Took
784	LOTR/10	Lord of the Rings	Gollum
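For anyone without codebook.html on hand, here is a minimal sketch of the same boolean-mask filter on a toy frame. The three LOTR rows are copied from the table above. The fourth row (`EX/1`, `Example work`, `Example character`) is a made-up placeholder that only exists to show a row being filtered out, and its index value is arbitrary.

```python
import pandas as pd

# Toy stand-in for character_map. The LOTR rows come from the table above;
# the last row is a hypothetical placeholder, not real codebook data.
character_map = pd.DataFrame(
    {
        "ID": ["LOTR/1", "LOTR/2", "LOTR/6", "EX/1"],
        "Fictional work": [
            "Lord of the Rings",
            "Lord of the Rings",
            "Lord of the Rings",
            "Example work",
        ],
        "Character display name": [
            "Frodo Baggins",
            "Aragorn",
            "Gandalf",
            "Example character",
        ],
    },
    index=[775, 776, 780, 0],
)

# The comparison builds a boolean Series; indexing with it keeps only True rows.
is_lotr = character_map["Fictional work"] == "Lord of the Rings"
lotr = character_map[is_lotr]

# Plain Python list of the display names for the selected rows.
lotr_names = lotr["Character display name"].tolist()
print(lotr_names)
```

The original index labels (775, 776, ...) survive the filter, which is why the table above starts at 775 rather than 0.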

All the adjectives used in our dataset of traits:

adjective_list.csv

	0
0	open-minded
1	artistic
2	individualist
3	repetitive
4	indiscreet
5	joyful
6	western
7	street-smart
8	genuine
9	backdoor
10	existentialist
11	decisive
12	cool
13	family-first
14	villainous
15	indulgent
16	patriotic
17	atheist
18	insecure
19	juvenile
20	classical
21	involved
22	reclusive
23	quitter
24	anarchist
25	helpless
26	hipster
27	stick-in-the-mud
28	ivory-tower
29	unassuming
30	hypocritical
31	dunce
32	genius
33	tall
34	crazy
35	scientific
36	dorky
37	unambitious
38	leisurely
39	gossiping
40	scandalous
41	anxious
42	specialist
43	sane
44	ferocious
45	extreme
46	judgemental
47	patient
48	high-tech
49	unlucky
50	plays hard
51	demonic
52	hurried
53	down to earth
54	foolish
55	arrogant
56	deviant
57	studious
58	respectful
59	passive
60	slacker
61	bold
62	mighty
63	disreputable
64	rational
65	regular
66	active
67	edgy
68	blue-collar
69	industrial
70	submissive
71	proper
72	irrelevant
73	chatty
74	liberal
75	charming
76	sheriff
77	precise
78	goof-off
79	orange
80	no-nonsense
81	jock
82	monochrome
83	hard
84	resigned
85	physical
86	equitable
87	reasonable
88	conspiracist
89	methodical
90	fresh
91	funny
92	deranged
93	competent
94	dramatic
95	varied
96	communal
97	confidential
98	warm
99	well behaved
100	mysterious
101	spontaneous
102	creepy
103	pessimistic
104	biased
105	zany
106	flamboyant
107	mild
108	prestigious
109	kinky
110	lenient
111	suspicious
112	sporty
113	cheery
114	aloof
115	philosophical
116	driven
117	salacious
118	bourgeoisie
119	debased
120	hesitant
121	apprentice
122	expressive
123	decorative
124	trash
125	honorable
126	cruel
127	head in clouds
128	avant-garde
129	noob
130	playful
131	political
132	open to new experiences
133	emotional
134	soft
135	jealous
136	heroic
137	reasoned
138	feminist
139	vague
140	democratic
141	domestic
142	scrub
143	queer
144	crafty
145	disorganized
146	selfish
147	flexible
148	tasteful
149	accepting
150	impartial
151	orderly
152	demure
153	penny-pincher
154	close-minded
155	persistent
156	chaste
157	devout
158	creative
159	sensible
160	spicy
161	feminine
162	open
163	proletariat
164	purple
165	pro
166	rich
167	astonishing
168	slugabed
169	eloquent
170	metaphorical
171	wholesome
172	drop out
173	young
174	independent
175	abstract
176	empirical
177	relaxed
178	frugal
179	luddite
180	beta
181	stinky
182	work-first
183	egalitarian
184	depressed
185	master
186	thick-skinned
187	socialist
188	poor
189	monastic
190	low-tech
191	assertive
192	masculine
193	right-brained
194	social
195	strict
196	nihilist
197	obsessed
198	human
199	mathematical
200	explorer
201	serious
202	official
203	shallow
204	heathen
205	cosmopolitan
206	overspender
207	traitorous
208	stylish
209	real
210	mad
211	eastern
212	modern
213	cautious
214	country-bumpkin
215	alpha
216	messy
217	lowbrow
218	tame
219	careful
220	builder
221	straight
222	practical
223	angelic
224	idealist
225	traditional
226	simple
227	weird
228	alert
229	gendered
230	mainstream
231	night owl
232	sheeple
233	compersive
234	vanilla
235	mature
236	intellectual
237	good-humored
238	moderate
239	disarming
240	happy
241	soulless
242	humorless
243	insider
244	extraordinary
245	loyal
246	whimsical
247	literary
248	obedient
249	clumsy
250	resolute
251	awkward
252	confident
253	rude
254	straightforward
255	fast
256	coordinated
257	moody
258	interesting
259	legit
260	skeptical
261	complimentary
262	pacifist
263	conservative
264	cunning
265	barbaric
266	first-mate
267	adventurous
268	tense
269	hoarder
270	optimistic
271	lustful
272	politically correct
273	tactful
274	sexist
275	authoritarian
276	refined
277	objective
278	nerd
279	subjective
280	ugly
281	competitive
282	thick
283	hedonist
284	bitter
285	pack rat
286	wise
287	high IQ
288	calm
289	sober
290	chaotic
291	feisty
292	transient
293	extrovert
294	lewd
295	self-disciplined
296	puny
297	generalist
298	focused on the present
299	mundane
300	captain
301	poisonous
302	neurotypical
303	scholarly
304	brave
305	kind
306	armoured
307	instinctual
308	meek
309	civilized
310	provincial
311	highbrow
312	multicolored
313	miserable
314	outsider
315	rugged
316	fortunate
317	bright
318	scruffy
319	cooperative
320	deep
321	urban
322	modest
323	incompetent
324	spiritual
325	oblivious
326	statist
327	introspective
328	mischievous
329	vengeful
330	bookish
331	roundabout
332	introvert
333	repulsive
334	permanent
335	charismatic
336	reserved
337	imaginative
338	impatient
339	animalistic
340	apathetic
341	forgiving
342	rigid
343	vain
344	thin
345	androgynous
346	unprepared
347	morning lark
348	slovenly
349	technophile
350	private
351	complicated
352	libertarian
353	direct
354	valedictorian
355	tiresome
356	autistic
357	ludicrous
358	lazy
359	short
360	resistant
361	altruistic
362	intimate
363	not introspective
364	pronatalist
365	resourceful
366	treasure
367	humble
368	unpolished
369	rough
370	sickly
371	angry
372	utilitarian
373	vulnerable
374	worldly
375	attractive
376	outlaw
377	workaholic
378	formal
379	unorthodox
380	pure
381	quarrelsome
382	unpatriotic
383	go-getter
384	realist
385	focused on the future
386	sarcastic
387	uninspiring
388	literal
389	emancipated
390	arcane
391	conventional
392	sheltered
393	shy
394	cringeworthy
395	minimalist
396	cold
397	unambiguous
398	manicured
399	diligent
400	guarded
401	soulful
402	smooth
403	logical
404	sad
405	inspiring
406	sorrowful
407	racist
408	wild
409	slothful
410	important
411	healthy
412	stoic
413	glad
414	nonpolitical
415	trusting
416	works hard
417	average
418	child free
419	uncreative
420	self-conscious
421	dominant
422	quiet
423	loud
424	rural
425	theist
426	pretentious
427	deliberate
428	enslaved
429	basic
430	city-slicker
431	neat
432	bossy
433	soft 2
434	lavish
435	concrete
436	gregarious
437	rebellious
438	remote
439	wavering
440	slow
441	impulsive
442	sweet
443	nurturing
444	beautiful
445	low IQ
446	normal
447	left-brained
448	stable
449	historical
450	self-assured
451	innocent
452	old
453	cryptic
454	curious
455	gracious
456	sensitive
457	scheduled
458	theoretical
459	insulting
460	codependent

jwzimmer-zz commented 2 years ago

A dataframe with all 800 characters and the fictional universe they are from:

character_map.csv

AttackPenguin commented 2 years ago

A dataframe with 31 characters from the Lord of the Rings and the adjectives that appear near them. We used a package to label parts of speech and kept characters whose names occur at least 50 times in all 3 books:

lotr_adj_df_2021_11_03.csv

jwzimmer-zz commented 2 years ago

First attempt -- running SVD on the LOTR adjectives

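import pandas as pd

# runSVD and vector_barchart are our own helpers, defined elsewhere (not shown in this thread)
lotrdf = pd.read_csv("lotr_adj_df_2021_11_03.csv")
df,u,d,v,sig,x,rex = runSVD(lotrdf,dropcols=["Unnamed: 0"])
# adjective columns (skip the "Unnamed: 0" column of character names)
cols = lotrdf.columns[1:]
# chart the first row of V^T
vector_barchart(cols,v[0,:],10)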

Then with removing the mean:

dfmean = df.mean().mean()
df = df - dfmean

The mean is very small, so removing it makes almost no difference; the charts look the same.
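For reference, a minimal sketch of "remove the overall mean, then run SVD" using plain NumPy. `runSVD` itself isn't shown in this thread, so `np.linalg.svd` is a stand-in here and may not match what `runSVD` does internally. The `scores` matrix is made-up toy data (rows as characters, columns as adjectives), not values from the CSV.

```python
import numpy as np

# Hypothetical toy character-by-adjective matrix; NOT taken from the real CSV.
scores = np.array(
    [
        [0.9, 0.0, 0.1],
        [0.8, 0.1, 0.0],
        [0.0, 0.7, 0.6],
        [0.1, 0.6, 0.8],
    ]
)

# Subtract one scalar, the mean over every entry. For a complete matrix
# (no missing values) this equals df.mean().mean() in pandas.
centered = scores - scores.mean()

# Thin SVD: centered == u @ diag(s) @ vt.
# Columns of u correspond to characters, rows of vt to adjectives.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Sanity check: the three factors rebuild the centered matrix.
reconstructed = u @ np.diag(s) @ vt
print(np.allclose(reconstructed, centered))
```

NumPy returns the singular values in `s` sorted from largest to smallest, so `vt[0, :]` is the direction carrying the most variation in this toy matrix.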

First row of V^T (highest magnitude words)

Second row of V^T

Third row of V^T

Next steps/ideas

Maybe we should try removing some words
And/or looking at how the characters are grouped rather than how the words are grouped
Maybe even play around with the function determining the distance-based scores...
Maybe we want to remove the mean of the non-zero scores (more like the "theoretical mean"), not the overall mean
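One possible reading of the last idea, as a rough sketch: compute the mean over the non-zero entries only, and subtract it from those entries while leaving zeros at zero. Treating a zero as "no score" and leaving it untouched is an assumption on my part, and `remove_nonzero_mean` is a hypothetical helper name, not something that exists in the repo.

```python
import numpy as np


def remove_nonzero_mean(scores):
    """Subtract the mean of the non-zero entries from the non-zero entries.

    Assumption: a zero means "no score", so zeros stay at zero.
    """
    scores = np.asarray(scores, dtype=float)
    mask = scores != 0
    if not mask.any():
        # Nothing to center if every entry is zero.
        return scores.copy()
    nonzero_mean = scores[mask].mean()
    centered = scores.copy()
    centered[mask] -= nonzero_mean
    return centered


# Toy data: the non-zero entries are 2, 4, 6, so their mean is 4.
toy = np.array([[0.0, 2.0, 4.0], [0.0, 0.0, 6.0]])
centered_toy = remove_nonzero_mean(toy)
print(centered_toy)
```

Note that an entry exactly equal to the non-zero mean becomes 0 after centering and is then indistinguishable from "no score", which might matter for the distance-based scores.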

AttackPenguin commented 2 years ago

So this is NLP identification of parts of speech (POS), combined with an exclusion list I made after looking through the top few hundred items.

lotr_adj_df_nlp_2021_11_03.csv

jwzimmer-zz commented 2 years ago

Second attempt

Overall mean has been removed.

Looking at what characters have the highest magnitudes in the columns of U to see which characters are the most relevant to each dimension:

names = lotrdf["Unnamed: 0"]
names = list(names)
vector_barchart(names,u[:,0],10)
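A small sketch of the "highest magnitude" selection, for anyone reading without the helper code. `vector_barchart` isn't shown here, so this only illustrates ranking labels by absolute value and does no plotting. `top_by_magnitude` is a hypothetical name and the numbers in `toy_column` are invented, not real SVD output.

```python
import numpy as np


def top_by_magnitude(labels, vector, k):
    """Return the k (label, value) pairs with the largest absolute value."""
    vector = np.asarray(vector, dtype=float)
    # Sort indices by descending |value|; a stable sort keeps ties in input order.
    order = np.argsort(-np.abs(vector), kind="stable")[:k]
    return [(labels[i], float(vector[i])) for i in order]


toy_names = ["Frodo Baggins", "Gandalf", "Gollum", "Aragorn"]
toy_column = [0.2, -0.7, 0.5, -0.1]  # hypothetical values, not real SVD output

top = top_by_magnitude(toy_names, toy_column, 2)
print(top)
```

Keeping the sign in the returned pairs matters: within one column of U, large positive and large negative entries sit at opposite ends of that dimension.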

Looking back at the original dataset to see how obvious those look, for comparison:

names2 = list(character_map["Character display name"])
vector_barchart(names2,U2[:,0],10)

First column of U

Second column of U

Third column of U

At least for dimensions 1 and 2, yeah, those look more obvious: 1 is a cluster of good guys and 2 is a cluster of villains.

Going back to LoTR...

First column of U/ row of V^T

Second column of U/ row of V^T

Third column of U/ row of V^T

Fourth column of U/ row of V^T

Fifth column of U/ row of V^T

Sixth column of U/ row of V^T

Seventh column of U/ row of V^T

Eighth column of U/ row of V^T

Ninth column of U/ row of V^T

Tenth column of U/ row of V^T

jwzimmer-zz / tv-tropening

Lord of the Rings - part of POCS Final Project #18