newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

fc.txt #93

Closed hsosia1 closed 10 months ago

hsosia1 commented 10 months ago

Use our corpus analysis tools, Voyant and AntConc, to take a look at frequently used words and ngrams. Try ngrams of varying sizes: 6, 5, 4, 3, 2.

What ngram sizes give you frequency counts above 5? Sizes 4, 3, 2, 1.

What ngram sizes give you frequency counts in the double digits? Sizes 3, 4.

What phrases get repeated a lot in this text? “Mr Kirwin”, “but not”.

Choose some ngram clusters of interest and explore them in their KWIC (Keyword in Context) view to scope the words before and after. Think about what you're seeing: what questions or ideas do you have about this text based on what you are seeing? Was the text written intentionally with the selection of these words?

Take screen captures of two or three interesting screens of results.
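For anyone who wants to check these counts outside Voyant or AntConc, here is a minimal sketch of ngram counting and a KWIC view in Python. The tokenizer is a simplification, so counts may not match either tool exactly. The function names (`tokenize`, `ngram_counts`, `kwic`) and the sample sentence are made up for illustration and are not taken from the text analyzed.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes. This is a
    # simplified, assumed tokenizer, not the one Voyant or AntConc use.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    # Slide a window of n tokens across the text and count each phrase.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def kwic(tokens, phrase, width=3):
    # Return (left context, phrase, right context) for every match.
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits


# Invented sample text, only to show the shape of the output.
sample = ("Mr Kirwin came in, but not alone. "
          "Mr Kirwin spoke kindly, but not for long.")
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(2))
for left, hit, right in kwic(tokens, "but not"):
    print(f"{left:>25} [{hit}] {right}")
```

Changing the `2` in `ngram_counts(tokens, 2)` to 3, 4, 5, or 6 repeats the exercise for the other ngram sizes.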

Write a paragraph or two about your findings. What have you discovered so far about the text you worked with?

It appears that longer phrases are repeated less often. All texts are like this, but this one more than any other. The term "man," which appears 131 times in total, is quite striking. Also, the frequency plateaus at four once the ngram size is raised beyond five. There is a second plateau in frequency at ngram sizes of 9 or above.
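One way to look for a plateau like this is to record the highest phrase count at each ngram size and compare them side by side. The sketch below assumes a plain whitespace tokenizer. The helper name `top_frequency_by_size` and the sample sentence are invented for illustration, so the numbers are not from the text discussed here.

```python
from collections import Counter


def top_frequency_by_size(tokens, sizes):
    # For each ngram size, report the count of the most frequent ngram.
    result = {}
    for n in sizes:
        counts = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
        result[n] = counts.most_common(1)[0][1] if counts else 0
    return result


# Invented sample text; replace with the tokens of the real corpus file.
tokens = "the old man saw the old man and the old dog".split()
summary = top_frequency_by_size(tokens, range(1, 5))
for size, top in summary.items():
    print(f"ngram size {size}: top frequency {top}")
```

A run of sizes sharing the same top frequency suggests a plateau, which may point to one long passage that is repeated verbatim.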