newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: ww.txt #90

Closed ebeshero closed 7 months ago

ebeshero commented 10 months ago

Post your screenshots and discuss your findings about ww.txt here!

VocaloidOtaku39 commented 10 months ago

For this text analysis assignment, I chose this science-fiction novel, which is about Martians. When I analyzed the text using Voyant, I was surprised that "martians" was the second-most mentioned term in the entire text; "said" was mentioned the most! Then I looked at the relationships between the terms "said" and "martians" in the TermsBerry, where more related terms showed up when I hovered over "martians" than when I hovered over "said". In AntConc, on the other hand, I found that an NGram size of 3 gives a lot of phrases with a frequency count in the double digits, whereas an NGram size of 4 gives many phrases with a frequency count of more than 5. With an NGram size of 5, "the edge of the pit" is the most frequent phrase, appearing 8 times. Using the KWIC feature in AntConc, I searched for "the martians" and found 137 hits; most of these lines describe what the Martians have just done as the story progresses.

Voyant (WW) AntConc (WW) KWIC

OH-ThatGuy commented 10 months ago

After looking through the various texts we were given, I decided to analyze ww.txt. Sorry if this is cheating, but after reading through it I am fairly certain this is H.G. Wells' War of the Worlds. Even though I knew what the story was, I tried my best to make sure that my knowledge did not interfere with my analysis. I'll start with the simplest thing, the word cloud generated by Voyant.

screenshot3

When creating the word cloud I made sure that it excluded obvious words like "the" and "and", because I didn't think they would show me anything interesting. The most noticeable words in the word cloud are "martian", "came", and "people". What's interesting about this is that it puts man and Martian on equal footing. For a story about aliens visiting Earth, the aliens get just as much attention as the people panicking on the planet. In fact, with the N-Gram size set to 1, the word "martians" was used only 5 more times in total than the word "people".

screenshot2

With the N-Gram size set to something larger, like 3, the most used phrase becomes "out of the" with a whopping 53 uses!

screenshot1

Looking into those uses, the most common use of "out of the" is not in reference to aliens leaving their ships, but to people panicking to get away from said aliens or foreign objects. Overall, what I found most interesting in this reading was the focus on words like "people" and "man". This is very clearly a story about Martians visiting Earth, but to make the story feel more immersive or realistic, the author chose to focus on the people we can more easily relate to, rather than the aliens.
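A side note for anyone who wants to check counts like these outside Voyant: below is a minimal Python sketch of counting words after dropping common ones. The tiny `STOPWORDS` set, the `top_words` helper, and the sample sentence are all made up for illustration. Voyant's own stopword list is much longer and its tokenizer may differ, so the numbers will not necessarily match.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list (an assumption for this sketch;
# it is NOT Voyant's actual list).
STOPWORDS = {"the", "and", "a", "of", "to", "in", "was", "i", "it", "that"}

def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words that are not in the stopword set."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)

# Made-up sample sentence, not a quotation from ww.txt.
sample = "The Martians came and the people ran. The people saw the Martians."
print(top_words(sample, n=2))
```

To try it on the real file, something like `top_words(open("ww.txt", encoding="utf-8").read())` should work, assuming ww.txt sits in the working directory.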

MystKitteh commented 10 months ago

I wanted to do the ghost stories radio text, but I couldn't get it to open in AntConc, so ww.txt is my next choice! Here are the answers to the questions asked on the assignment:

  1. Ngram sizes 1 through 5 give frequency counts of 5 or above.
  2. Ngram sizes 1 through 4 give frequency counts of 10 or more.
  3. These are the top five phrases seen throughout the text: out of the (53), there was a (30), of the martians (28), the heat ray (27), of the pit (26).
  4. Out of the - There are a lot of situations where someone needs to get out of something. This makes me curious about what kind of situation the protagonist is put in. Of the pit - "Of the pit" pops up 26 times, so the pit seems to be a very important location in the story.

When viewing the text in Voyant Tools I noticed the top five most frequent words are said (166), martians (163), people (159), came (151), and saw (129). These words being the top five makes a lot of sense since the story is about Martians. image

Here's a screenshot of "of the pit" being used multiple times throughout the story. image
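The phrase counts in this post can be approximated with a sliding window over the words of the text. Here is a rough Python sketch under some assumptions: `ngram_counts` and `frequent_ngrams` are hypothetical helper names, the sample sentence is invented, and AntConc's tokenization rules may differ, so exact frequencies could come out slightly different.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive words (an n-gram) in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the token list yields each window of n words
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows)

def frequent_ngrams(text, n, min_freq):
    """Return (phrase, frequency) pairs with frequency >= min_freq, highest first."""
    return [(p, f) for p, f in ngram_counts(text, n).most_common() if f >= min_freq]

# Made-up sample sentence, not a quotation from ww.txt.
sample = "he climbed out of the pit and ran out of the house"
print(frequent_ngrams(sample, n=3, min_freq=2))
```

Raising `n` makes repeated phrases rarer, which is why fewer phrases clear a threshold like 5 or 10 as the Ngram size goes up.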

Rainbow7779 commented 10 months ago

I chose this science fiction text since the genre has always fascinated me. I was interested in analyzing the frequency of various phrases throughout the text, and found that "ulla ulla ulla ulla" and "of the heat ray" had frequencies of 7 and 8 respectively. N-Gram

When investigating further, I discovered that the phrase "of the heat ray" appears in 8 different contexts at various points throughout the story! This is interesting because it leads me to believe that heat rays may be a weapon used by the Martians in the novel. KWIC

Finally, when inserting the text into Voyant, I created a word cloud of the text in its entirety. The words "brother," "smoke" and "pit" caught my eye as being frequently used. This text must have themes of family ties and war. One question I have based on this Voyant analysis: what could "pit" mean? voyantww

KaitlynScutella commented 10 months ago

I analyzed the ww.txt option. This text seems to be about Martians and their time on Earth. When I put the text into Voyant Tools, "Martian" was the second most used word throughout the text, and "said" was the most used word. The phrases "of the Martians", "of the pit", and "of the cylinder" are the 3 most used phrases. The KWIC option is really interesting because you can learn much more about your text just from a 3-word Ngram. I like using AntConc to learn more about the story because it gives you more insight into what is happening throughout the text without having to read it. Screenshot (1) Screenshot (3) Screenshot (4)

VocaloidOtaku39 commented 10 months ago

@OH-ThatGuy Wow, I am surprised that "martians" was written only a little more often than "people", especially for a sci-fi story!

creaturepsu commented 10 months ago

Looking at ww.txt, I found that ngram sizes of 2, 3, and 4 mostly returned frequencies above 5. Ngram sizes of 2-3 typically give frequency counts in the double digits. The phrase "out of the" was repeated the most in the text. If I limit phrases to only those that appear 10+ times, I get 65 results. When looking at "there was a", I start to think this text is some sort of thriller that keeps the reader in the dark.

image

image

image

gylertaydos commented 10 months ago

I chose to continue using this text, even though we went over it in class. I started by looking further at the "out of the" n-gram, since I figured that would be a good place to start. I noticed that in the KWIC view the hits were ordered by the word immediately following the n-gram.

image

I thought it was interesting that in a book about Martians attacking (hmmm, I wonder what book that could be?), the top matches for "out of the" were actually people emerging out of water. Granted, I think Martians (and Martian-related things) still emerge out of things more frequently than people do, but still.

image

There are a lot of things coming out of pits, which I personally find interesting, as a lot of these pits seem to come up in reference to fighting or taking cover from the Martians. World War 1 wouldn't occur for quite some time after this text was first written, but I think it's interesting that digging in and hiding in holes in the ground already seemed like such a valid tactic back then that it was employed against alien invaders.

Switching to Voyant, the longest string was "I was walking down the road to clear my brain."

image

In a book describing alien invasions, the longest single sentence is a moment where seemingly nothing is happening, just a stroll down the road.

image Not much of a surprise here...
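The KWIC ordering mentioned in this post (hits sorted by the word right after the search phrase) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `kwic` is a hypothetical helper, the sample text is invented rather than quoted from ww.txt, and AntConc's real sorting options are more flexible than this.

```python
import re

def kwic(text, phrase, width=3):
    """Find each occurrence of phrase and return (left, match, right) context
    tuples, sorted by the words that follow the match."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    target = phrase.lower().split()
    k = len(target)
    hits = []
    for i in range(len(tokens) - k + 1):
        if lowered[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + k])
            right = " ".join(tokens[i + k:i + k + width])
            hits.append((left, match, right))
    # Sort alphabetically on the right-hand context, ignoring case
    hits.sort(key=lambda h: h[2].lower())
    return hits

# Made-up sample text, not a quotation from ww.txt.
sample = ("They crawled out of the pit. I came out of the water. "
          "Smoke rose out of the cylinder.")
for left, match, right in kwic(sample, "out of the"):
    print(f"{left:>20} | {match} | {right}")
```

Sorting on the right-hand context groups hits like "out of the pit" and "out of the water" together, which makes it easier to see what things are coming out of.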

vnichols16 commented 10 months ago

I first chose the ww text because I thought the name would be easy to find once I had some clues. The very first thing I did was slap the text into Voyant.

Screenshot 2023-10-31 162707

I can't help but notice the word "martians" in the very front, reminding me of a certain classic dystopian novel with "war" and "worlds" in the title.

Next I put the text into AntConc.

Screenshot 2023-10-31 162923

I chose to use an N-Gram size of 4 because I wanted a different clue than "martians". "The heat ray" reminds me a lot of a science fiction novel. In War of the Worlds, the Martians' weapon of choice is the heat ray.

Fkhan2027 commented 10 months ago

Screenshot 2023-10-31 204317 Screenshot 2023-10-31 204335

In AntConc, all the phrases with an ngram size of 3 give you a frequency count above 5. The higher you go in ngram size, the fewer phrases have a frequency count above 5. An ngram size of 2 gives all the phrases a frequency count in the double digits, and going up in ngram size gives you fewer phrases with a frequency count in the double digits. AntConc shows that "out of the", "there was a", "of the Martian", "the edge of the", and "for the most part" are all common phrases.

In Voyant, the words said, Martians, people, came, and saw were the most common, with said being mentioned 3 more times than Martians. Voyant takes out a lot more words than AntConc does. A lot of phrases have the same starting word, but the rest of the phrase is different.