newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

fc.txt #92

Closed by josiahr21 10 months ago

josiahr21 commented 10 months ago

Use our corpus analysis tools, Voyant and AntConc, to take a look at frequently used words and ngrams. Try ngrams of varying sizes: 6, 5, 4, 3, 2.

What ngram sizes give you frequency counts above 5? Sizes 4, 3, 2, 1.
What ngram sizes give you frequency counts in the double digits? Sizes 3, 4.
What phrases get repeated a lot in this text? "Mr Kirwin", "but not".

Choose some ngram clusters of interest and explore them in their KWIC (Keyword in Context) view to scope the words before and after. Think about what you're seeing: what questions or ideas do you have about this text based on what you are seeing? Was the text written intentionally with the selection of these words? Take screen captures of two or three interesting screens of results.
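The ngram counting and KWIC steps above can be sketched in a few lines of Python. This is a minimal sketch, not how Voyant or AntConc work internally: the regex tokenizer and the function names `tokenize`, `ngram_counts`, `frequent_ngrams`, and `kwic` are hypothetical, and the sample sentence is invented rather than taken from fc.txt. Real tool counts may differ, since each tool tokenizes and handles punctuation in its own way.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer (letters plus an optional
    apostrophe part); Voyant and AntConc apply their own rules.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens, joined with spaces."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def frequent_ngrams(tokens, n, threshold=5):
    """Return (ngram, count) pairs with count above threshold, most frequent first."""
    return [(g, c) for g, c in ngram_counts(tokens, n).most_common() if c > threshold]


def kwic(tokens, phrase, width=5):
    """Keyword in Context: show `width` tokens before and after each hit of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append(f"{left} [{' '.join(target)}] {right}")
    return lines


if __name__ == "__main__":
    # Invented sample text; replace with the contents of the file being studied.
    sample = "But not the man. Mr Kirwin said the man was here, but not Mr Kirwin himself."
    tokens = tokenize(sample)
    for n in (6, 5, 4, 3, 2):
        print(n, frequent_ngrams(tokens, n, threshold=1))
    for line in kwic(tokens, "Mr Kirwin", width=3):
        print(line)
```

The `threshold` parameter mirrors the assignment's questions: `threshold=5` keeps ngrams with counts above 5, and `threshold=9` keeps those in the double digits.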

(Screenshots: Screen Shot 2023-10-31 at 9:14:13 PM; Screen Shot 2023-10-31 at 9:22:18 PM)

Write a paragraph or two about your findings. What have you discovered so far about the text you worked with?

I found that longer phrases show up less often. This is true for any text, but especially for this one. It is a little surprising that the word "man" shows up the most: 131 times, to be exact. Also, once the ngram size goes past five, the frequency plateaus at four. When the ngram size is 9 or more, the frequency plateaus at two.
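One way to check the plateau pattern described above is to record, for each ngram size, how often the single most frequent ngram occurs. The sketch below assumes a simple lowercase tokenizer and uses a hypothetical helper, `top_frequency_by_size`; the toy sentence stands in for the actual text, so its numbers are not the counts reported above.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple lowercase word tokenizer, not Voyant's or AntConc's exact rules.
    return re.findall(r"[a-z]+", text.lower())


def top_frequency_by_size(tokens, sizes):
    """For each ngram size n, return the count of the single most frequent ngram.

    Sizes longer than the text have no ngrams and report 0.
    """
    result = {}
    for n in sizes:
        counts = Counter(zip(*(tokens[i:] for i in range(n))))
        result[n] = max(counts.values(), default=0)
    return result


if __name__ == "__main__":
    # Toy text; replace with the full text to reproduce the real curve.
    sample = "the man and the man and the man"
    print(top_frequency_by_size(tokenize(sample), range(1, 10)))
    # → {1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 0}
```

The top count can only stay the same or fall as n grows, because every occurrence of a longer ngram also contains an occurrence of the shorter ngram at its start. It levels off wherever some longer phrase is repeated verbatim, which is one way to read the plateaus noted above.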