newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion of jcb.txt #52

Closed: ebeshero closed 11 months ago

ebeshero commented 2 years ago

Post your screenshots and findings about jcb.txt here!

ipm5130 commented 2 years ago

[Screenshots: 2022-03-22, 2022-03-22 (1)]

I would like to start with how incredibly interesting the Voyant tool made a simple article look. In all seriousness, this was fun to look at and explore. Every n-gram size below five gives me frequency counts above 5. What I mean is that when I set the n-gram size to 4, it gives me frequencies like 10 and 12. What's more interesting is that when I set the n-gram size to 3, the highest frequency is 109, for the phrase "I could not." I couldn't get much of the story; while reading, I was trying to figure out when the story takes place. From the way the characters interacted, you could tell they were children. They would often say "I could not," "I can't," "I don't," "I am not," etc., as if they were defending themselves.

TommyMC2 commented 1 year ago

I chose the text jcb.txt:

An n-gram size of 5 has one phrase with more than six uses, and that's "as well as i could," but with a size of 4, the highest is 12 with "in the course of."

"I could not" is repeated 109 times, and the next highest after that is "i did not" with 60.
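For anyone who wants to check counts like these outside of Voyant, here is a minimal sketch of n-gram counting in Python. It is not how Voyant does it internally, and the tokenizer is an assumption (lowercase, letters and apostrophes only), so the numbers on jcb.txt could differ slightly from the ones above. The `sample` string is made up for illustration and is not taken from jcb.txt.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule: letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample text, NOT from jcb.txt.
sample = "I could not go. I could not stay. I did not know what I could not do."
tokens = tokenize(sample)
trigrams = ngram_counts(tokens, 3)

# Print the most frequent three-word phrases with their counts.
for phrase, count in trigrams.most_common(3):
    print(" ".join(phrase), count)
```

To try it on the real file, read the text with `open("jcb.txt", encoding="utf-8").read()` (the path is an assumption about where you saved it) and pass that to `tokenize` instead of `sample`.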

Using KWIC on a few of the n-grams, I think the text is kind of dreary or at least solemn. A lot of the context of "as well as I could" and "I could not" seems to be in that tone. The most common context for "I could not" is the character discussing the problems they have to "bear" or endure.
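A keyword-in-context (KWIC) view like the one used here can be roughly approximated in a few lines of Python. This is only a sketch under assumptions: the function name `kwic`, the `width` parameter, and the simple word tokenizer are all hypothetical choices, not Voyant's own, and the sample sentence is invented rather than quoted from jcb.txt.

```python
import re


def kwic(text, phrase, width=3):
    """Return (left context, match, right context) for each occurrence of phrase."""
    tokens = re.findall(r"[A-Za-z']+", text)      # assumed tokenizer: words only
    lowered = [t.lower() for t in tokens]         # match case-insensitively
    target = phrase.lower().split()
    n = len(target)
    rows = []
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            rows.append((left, " ".join(tokens[i:i + n]), right))
    return rows


# Invented sample text, NOT from jcb.txt.
sample = ("The grief was heavy and I could not bear it alone. "
          "Later I could not sleep at all.")
rows = kwic(sample, "I could not", width=2)

# Print each hit with its surrounding words, keyword in the middle.
for left, match, right in rows:
    print(f"{left:>15} | {match} | {right}")
```

Reading down the right-hand column is what makes a pattern like "bear" after "I could not" easy to spot.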

I didn't find any references to violence, which leads me to believe this is more of a drama than the fantasy adventure I first thought it might be. I also respect the lack of repetition. Once you get past three-word phrases, repetition is very low, with the most repeated four-word phrase being "in the course of" at 12 times. This text has 187,462 total words, and the most repeated one is "Mr," which I find funny because it demonstrates how much talking is done.

[Screenshots: Ant1, Ant2, Ant3, Ant4, Ant5, Screenshot 2023-03-16 171318]