newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: fc.text #85

Closed ebeshero closed 7 months ago

ebeshero commented 10 months ago

Post your screenshots and discuss your findings about fc.txt here!

03lizchavez commented 10 months ago

What ngram sizes give you frequency counts above 5? Ngram sizes 4, 3, 2, and 1 give frequency counts above 5.

What ngram sizes give you frequency counts in the double digits? 4, 3, 2, 1.

What phrases get repeated a lot in this text? "be with you on your wedding night" is a phrase repeated 4 times.

I find it cool and a little interesting how drastic the change is when the number of words goes from 2 to 3, and how tremendously the frequency numbers drop. The change is visible in the screenshots: the phrase "of the" is repeated very frequently, but once the size is switched to 3 words, the counts drop by much more. The highest frequency was 8 after the change to 3 words.

Screenshot 2023-10-31 at 12 39 09 PM Screenshot 2023-10-31 at 12 39 19 PM

SonicSpaceFan025 commented 10 months ago
  1. After setting my N-Gram size to 4, the phrases I found with a frequency above five were "but I did not," "in the mean time," "I was unable to," "when I thought of," "as he said this," "at the same time," "for the first time," "I felt as if," "I found that the," and "in my power to."
  2. N-Gram sizes 4 and under gave me frequencies in the double digits.
  3. Many of the phrases about weddings are repeated frequently throughout this document.
  4. How do I exclude certain words such as "a," "the," pronouns, and other extremely common words?
  5. With these two tools, I made some interesting discoveries about this document. Many of the phrases in AntConc and Voyant have to do with weddings, the word clouds of the two tools work differently, and Voyant illustrates the frequencies of the most common words and phrases in the text. These discoveries helped me understand the passage better and see how often certain words appear throughout the document.

Screenshot 2023-10-31 165232 Screenshot 2023-10-31 165434 Screenshot 2023-10-31 165520 Screenshot 2023-10-31 170318 Screenshot 2023-10-31 171725 Screenshot 2023-10-31 171745

SonicSpaceFan025 commented 10 months ago

@03lizchavez Hi, Elizabeth! I agree that the smaller the phrase size, the more often the common words appear. I find it weird that the most commonly used words like "the," "a," "of," "and," and "was," among others, appear the most. I am trying to figure out how to filter out these words so that I can find the most frequently used terms other than these.

ebeshero commented 10 months ago

@SonicSpaceFan025 Voyant screens out those common words. But if you are studying n-grams, you need them to help find the most frequently repeating patterns like "out of the" or "I don't". When you don't screen them out, and check their KWIC (Keyword in Context) view on AntConc, you start to see very interesting distinct patterns. I find 3-grams often the best to work with in AntConc.
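
For anyone curious what this looks like outside AntConc, here is a rough Python sketch of n-gram counting and a KWIC view with the common words left in. It assumes plain-text input and very simple tokenization (lowercased runs of letters and apostrophes). The sample sentence and the helper names `tokenize`, `ngram_counts`, and `kwic` are made up for illustration, and AntConc's own tokenization and counting settings may differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keeping function words like 'the' and 'of'."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

def kwic(tokens, phrase, width=4):
    """Return (left context, phrase, right context) for each occurrence of the phrase."""
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, phrase, right))
    return hits

# Hypothetical sample text, not taken from fc.txt.
sample = "I ran out of the house. She came out of the garden. He was out of the room."
tokens = tokenize(sample)
trigrams = ngram_counts(tokens, 3)
print(trigrams.most_common(1))

# Line up each hit with the words immediately before and after it.
for left, key, right in kwic(tokens, "out of the", width=2):
    print(f"{left:>12} | {key} | {right}")
```

Because "of" and "the" are not screened out, the repeating pattern "out of the" surfaces as the top 3-gram, and the KWIC lines show what tends to come before and after it.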

ebeshero commented 10 months ago

Basically, you want to look at the AntConc and the Voyant views: they show you different kinds of patterns.
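
To make the contrast concrete, here is a small Python sketch that puts the two kinds of word counts side by side: one with common words screened out and one with everything kept. The `STOPWORDS` set is a tiny hand-picked list for illustration only (it is not Voyant's actual list), and the sample sentence and the `word_counts` helper are invented for this example.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real tool's list would be much longer.
STOPWORDS = frozenset({"a", "an", "and", "i", "of", "the", "to", "was"})

def word_counts(text, stopwords=frozenset()):
    """Count lowercased word tokens, skipping any token found in stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Hypothetical sample text, not taken from fc.txt.
sample = "The garden was the pride of the house, and the house was proud of the garden."

raw = word_counts(sample)                  # everything kept
filtered = word_counts(sample, STOPWORDS)  # common words screened out

print(raw.most_common(3))
print(filtered.most_common(3))
```

The unfiltered counts are dominated by "the", while the filtered counts bring content words like "garden" and "house" to the top. Neither view is wrong; they answer different questions.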

josiahr21 commented 10 months ago

Use our corpus analysis tools, Voyant and AntConc, to take a look at frequently used words and ngrams. Try ngrams of varying sizes: 6, 5, 4, 3, 2.

What ngram sizes give you frequency counts above 5? Sizes 4, 3, 2, 1.

What ngram sizes give you frequency counts in the double digits? 3, 4.

What phrases get repeated a lot in this text? “Mr Kirwin”, “but not”.

Choose some ngram clusters of interest and explore them in their KWIC (Keyword in Context) view to scope the words before and after. Think about what you're seeing: what questions or ideas do you have about this text based on what you are seeing? Was the text written intentionally with the selection of these words?

Take screen captures of two or three interesting screens of results.

Screen Shot 2023-10-31 at 9 14 13 PM Screen Shot 2023-10-31 at 9 22 18 PM

Write a paragraph or two about your findings. What have you discovered so far about the text you worked with?

It appears that larger phrases are less noticeable. All texts are like this, but this one more than any other. The term "man," which appears 131 times in total, is quite startling. Also, the frequency plateaus at four when the ngram size is raised beyond five. There is a second plateau in frequency at an ngram size of 9 or above.
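
One way to check an observation like this plateau is to record the highest n-gram frequency at each size and compare the numbers. The Python sketch below does that under simple assumptions: plain-text input, lowercased tokens, and a made-up sample string rather than fc.txt. The function name `top_frequency_by_size` is hypothetical.

```python
import re
from collections import Counter

def top_frequency_by_size(text, sizes):
    """Map each n-gram size to the count of its most frequent n-gram (0 if there are none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    result = {}
    for n in sizes:
        counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        result[n] = max(counts.values(), default=0)
    return result

# Hypothetical sample text built from repeated phrases, not taken from fc.txt.
sample = "at the same time " * 3 + "for the first time " * 2

top = top_frequency_by_size(sample, range(1, 7))
for n, freq in top.items():
    print(f"ngram size {n}: highest frequency {freq}")
```

The highest frequency can only stay level or fall as the size grows, since every repeated long phrase contains repeated shorter ones. A plateau means the same long repeated passage is driving the count at several sizes in a row.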

hsosia1 commented 10 months ago

Post your screenshots and discuss your findings about fc.txt here!

Use our corpus analysis tools, Voyant and AntConc, to take a look at frequently used words and ngrams. Try ngrams of varying sizes: 6, 5, 4, 3, 2.

What ngram sizes give you frequency counts above 5? Sizes 4, 3, 2, 1.

What ngram sizes give you frequency counts in the double digits? 3, 4.

What phrases get repeated a lot in this text? “Mr Kirwin”, “but not”.

Choose some ngram clusters of interest and explore them in their KWIC (Keyword in Context) view to scope the words before and after. Think about what you're seeing: what questions or ideas do you have about this text based on what you are seeing? Was the text written intentionally with the selection of these words?

Take screen captures of two or three interesting screens of results.

Screenshot 2023-10-31 at 11 52 43 PM Screenshot 2023-10-31 at 11 53 01 PM

Write a paragraph or two about your findings. What have you discovered so far about the text you worked with?

It appears that larger phrases are less noticeable. All texts are like this, but this one more than any other. The term "man," which appears 131 times in total, is quite startling. Also, the frequency plateaus at four when the ngram size is raised beyond five. There is a second plateau in frequency at an ngram size of 9 or above.

ebeshero commented 10 months ago

@hsosia1 Your text is identical to @josiahr21's! Please delete it and write your own post.