newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion of ww.txt #53

Closed ebeshero closed 1 year ago

ebeshero commented 2 years ago

Post your screenshots and discuss your findings about ww.txt here!

jms9354 commented 2 years ago

I chose this text. Comparing AntConc and Voyant Tools, I found that the word "Martians" appeared 164 times in AntConc but 163 times in Voyant. I copied and pasted the exact same text into both, so I'm not sure why the counts differ. It's also fascinating because I have no idea what the mystery text is, but based on this I assume it has to do with Martians or aliens of some sort. In AntConc the most common phrase was "on the edge of the pit," which appeared 4 times, whereas in Voyant the most common phrase was "amid the," which appeared 11 times. I found this interesting because it shows that each tool interprets the text differently. I like that AntConc and Voyant Tools offer different views: AntConc feels a little easier to maneuver, whereas Voyant is more vibrant and eye-appealing with its graphs and Cirrus. I can't be sure, but my guess is that the mystery text is about some type of alien invasion. It kept talking about a "pit," which appeared 83 times, and I've never heard of anything with a "pit" that turned out to be good. It also talked about "night" a lot (102 times). Mystery and horror books and movies usually make a big play of nighttime.

When I limited the results to phrases that occur more than 5 or 10 times, the list got a lot smaller.

[Screenshots: WW txtNGramSize2, TheComingOfTheMartians.txt, WW txtNGramSize3, WW txtNGramSize4, WW txtNgramSize5, WW txtNGramSize6, WW txtVoyant]
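One possible reason for a 164 versus 163 difference is that the two tools split the text into words differently. The sketch below is only an illustration of that idea: the two tokenization rules and the sample sentence are made up for the example, and neither rule is claimed to be what AntConc or Voyant actually does.

```python
import re

def count_word(text, word, token_pattern):
    """Count tokens equal to `word` (case-insensitive) under a given tokenization rule."""
    tokens = re.findall(token_pattern, text.lower())
    return sum(1 for token in tokens if token == word.lower())

# Hypothetical sample sentence, NOT taken from ww.txt.
sample = "The Martians came. The Martians' cylinder opened. Were they Martians?"

# Assumed rule A: a token is a run of letters, so "Martians'" is counted as "martians".
letters_only = r"[a-z]+"
# Assumed rule B: apostrophes stay attached, so "Martians'" is a separate token.
with_apostrophes = r"[a-z']+"

count_a = count_word(sample, "Martians", letters_only)
count_b = count_word(sample, "Martians", with_apostrophes)
print(count_a, count_b)
```

Under these assumed rules the same text gives two different counts for the same word, which is the kind of small mismatch a possessive or a piece of punctuation could cause.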

JaxAbele commented 2 years ago

There were a lot of interesting trends I noticed when looking at this text. As the n-gram size gets larger, the frequencies of phrases become much smaller. At n-gram size 5, only one phrase had a frequency above 5. The frequencies became higher the smaller the n-gram size got. Frequency counts in the double digits start at n-gram size 4 and continue down to size 1; there are no double-digit frequencies at sizes 5 and 6. The phrases "of the martians", "the heat ray", and "of the pit" are repeated several times throughout the text. I noticed that the word "said" is repeated the most in the passage, which indicates to me that the plot could be driven by dialogue. The phrases "of the martians" and "the heat ray" being repeated so many times definitely indicates to me that this is a work of science fiction.

[Screenshots: voyant, ngram5, ngram4]
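The pattern of frequencies dropping as the n-gram size grows can be reproduced with a short script. This is a minimal sketch, assuming a simple lowercase tokenizer and a made-up sample string; `ngram_counts` and `frequent` are hypothetical helper names, and the results will not match AntConc or Voyant exactly.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

def frequent(counts, min_freq):
    """Keep only the n-grams that occur at least min_freq times."""
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Hypothetical sample, NOT taken from ww.txt.
sample = "the edge of the pit and the edge of the pit and the heat ray"

for n in range(1, 7):
    counts = ngram_counts(sample, n)
    top_gram, top_count = counts.most_common(1)[0]
    print(n, repr(top_gram), top_count, len(frequent(counts, 2)))
```

The top count can never go up as n increases, because every occurrence of a longer phrase also contains an occurrence of the shorter phrase it starts with.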

ghost commented 2 years ago

The most interesting thing I found when analyzing this text is that when the n-gram size was 3, two phrases were at the top of the list: "a multitude of" and "a mass of". These two phrases imply that a noun will follow and that they quantify it. However, when the n-gram size is 4 the phrases don't appear at all, meaning that they are followed by different nouns each time (except "mass of red weed", which appears twice). The word "martians" is also the second most used word in the text. This must mean that the text is about Martians in some way.

[Screenshots: Quatifiy Length of 4, WordCount]
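To check which words actually follow a phrase like "a mass of", one could tally the next word after each occurrence. This is a rough sketch with a made-up sample string; `continuations` is a hypothetical helper, not a feature of AntConc or Voyant.

```python
import re
from collections import Counter

def continuations(text, phrase):
    """Tally the word that comes right after each occurrence of `phrase`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    k = len(target)
    following = Counter()
    for i in range(len(tokens) - k):
        if tokens[i:i + k] == target:
            following[tokens[i + k]] += 1
    return following

# Hypothetical sample, NOT taken from ww.txt.
sample = "a mass of red weed, a mass of smoke, a mass of red weed, a mass of people"

print(continuations(sample, "a mass of").most_common())
```

When the tally is spread across many different words, no single 4-gram built on the phrase repeats often enough to reach the top of the size-4 list.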