newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text discussion of ww.txt #80

Closed ebeshero closed 11 months ago

ebeshero commented 1 year ago

Post your screenshots and discuss your findings about ww.txt here!

rcc5418 commented 1 year ago

I decided to look at War of the Worlds with our corpus analysis tools. Immediately noticeable in the word-frequency graph is the large size of the word 'martian'. It comes in second among the most-used words, behind the obligatory 'said', which should come as no surprise considering the story. [image] Taking a look at the text with AntConc, we can view commonly used multi-word phrases. Investigating the 4-grams (four-word phrases), some fun sci-fi phrases are apparent: 'the edge of the', 'edge of the pit', and 'of the heat ray' bring to mind the terror of an alien invasion. [image] The 4-grams are also visible in this word cloud I made with an image of a War of the Worlds toy I found online. [image]
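For anyone who wants to reproduce this kind of count outside the GUI tools, here is a minimal sketch in Python. It is not how AntConc works internally; the tokenizer (lowercase runs of letters and apostrophes) and the helper names `tokenize` and `ngram_counts` are assumptions made up for illustration, and the sample sentence stands in for the real text.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    # Slide a window of n words across the token list and count each window.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Stand-in sample; swap in the contents of ww.txt to analyze the real text.
sample = "the edge of the pit and the edge of the pit again"
tokens = tokenize(sample)

word_freq = Counter(tokens)           # single-word frequencies
four_grams = ngram_counts(tokens, 4)  # four-word phrase frequencies

for phrase, count in four_grams.most_common(3):
    print(" ".join(phrase), count)
```

To run it on the mystery text, read the file into a string first (for example with `open("ww.txt", encoding="utf-8").read()`, assuming the file sits in the working directory) and pass that to `tokenize`.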

Cullen-Mort commented 1 year ago

The word 'artilleryman' was the one that made me realize this was "War of the Worlds" by H. G. Wells, and I'm not quite sure why. [Screenshot 2023-03-16 125209] This shows the top 25 most common words, including 'Martian', which makes the text's origin quite apparent. But not for me; I needed 'artilleryman'. [Screenshot 2023-03-16 125756] I turned the n-gram value to the max (25), and when word sequences are pulled out with no punctuation, things got a little funky. [Screenshot 2023-03-16 130335] This is the highest n-gram that still has a frequency higher than 1.
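The search for the highest n-gram size that still repeats can also be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `ngram_counts` and `largest_repeated_n` are hypothetical, the cap of 25 simply mirrors the maximum mentioned above, and whitespace splitting stands in for whatever tokenization the tool actually uses.

```python
from collections import Counter


def ngram_counts(tokens, n):
    # Count every run of n consecutive tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def largest_repeated_n(tokens, max_n=25):
    # Walk down from max_n and return the first size whose most frequent
    # n-gram occurs more than once; return 0 if nothing repeats at all.
    for n in range(max_n, 0, -1):
        counts = ngram_counts(tokens, n)
        if counts and max(counts.values()) > 1:
            return n
    return 0


# Stand-in sample; the four-word run "a b c d" appears twice.
tokens = "a b c d x a b c d y".split()
best_n = largest_repeated_n(tokens)
print(best_n)
```

Walking downward from the cap means the function stops at the first size with a repeat, which is the "highest n-gram that still has a frequency higher than 1" in the sense used above.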