newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: mb.txt #88

Closed by ebeshero 7 months ago

ebeshero commented 10 months ago

Post your screenshots and discuss your findings about mb.txt here!

GKon26 commented 10 months ago

While working with AntConc, I came across many references to whales and eventually the name Ishmael. After that, I knew exactly what text I was working with, though I'd had a sneaking suspicion with all the whale talk. "Whale" is the first "non-filler" word, ranking 22nd in the word list with 1228 hits. Ironically, "Moby Dick" only appears 83 times, making it the 102nd most frequent 2-gram. Granted, most of the 2-grams ranked above it aren't really much of anything. But "the whale" does show up 440 times, so maybe they're not all "nothing" 2-grams.
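For anyone who wants to double-check counts like these outside AntConc, here is a minimal Python sketch of the same idea: count single words, then count 2-grams. The tokenizing rule (lowercase, letters and apostrophes only) and the helper names `tokenize` and `ngram_counts` are assumptions made for illustration, so the numbers will probably not match AntConc's exactly, since its token definition and settings can differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out word tokens.

    Assumed rule: a token is a run of letters and apostrophes.
    AntConc's own token definition may differ.
    """
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample text, just to show the mechanics.
sample = "Call me Ishmael. The whale, the white whale! The whale is Moby Dick."
tokens = tokenize(sample)

words = ngram_counts(tokens, 1)    # single-word frequencies
bigrams = ngram_counts(tokens, 2)  # 2-gram frequencies

print(words.most_common(3))
print(bigrams[("the", "whale")], bigrams[("moby", "dick")])
```

To try it on the mystery text, replace `sample` with the file contents, for example `open("mb.txt", encoding="utf-8").read()`, assuming the file sits in the working directory.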

alissongossage commented 10 months ago

N-gram sizes 1, 2, 3, 4, and 5 all give me frequencies above 5, including counts in the double digits. “the sperm whale” gets repeated 116 times and “of the whale” gets repeated 110 times. Phrases that occur 5-10 times are “out of sight of land” and “the bottom of the sea.” The higher the n-gram size, the lower the frequency. Basic words like “the” and “of” are the most frequent, so they show up larger in Voyant.

[Screenshots: 2023-11-01 at 12:18:36 AM, 12:13:00 AM, and 12:12:49 AM]
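The observation that a higher n-gram size gives a lower frequency can be checked for the top count at each size. Every occurrence of an n-gram also contains an occurrence of its first n-1 words, so the most frequent n-gram can never beat the most frequent (n-1)-gram. A small sketch of that check, using a made-up sample sentence and a hypothetical helper named `top_count`:

```python
from collections import Counter

def top_count(tokens, n):
    """Return the frequency of the most common n-gram (0 if there are none)."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return max(counts.values()) if counts else 0

# Made-up sample, already lowercase and split on whitespace.
tokens = ("the sperm whale and the sperm whale of the sea "
          "and the bottom of the sea").split()

# Top frequency for n-gram sizes 1 through 5.
tops = [top_count(tokens, n) for n in range(1, 6)]
print(tops)
```

The printed list never increases from left to right, which is the same pattern the n-gram tool shows on the full text.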