newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery text discussion of cpi.txt #66

Closed ebeshero closed 1 year ago

ebeshero commented 1 year ago

Post your screenshots and discuss your findings about cpi.txt here!

MinWu859 commented 1 year ago

image1: Screenshot (124) image2: Screenshot (125) image3: Screenshot (126)

Through exploring the text in AntConc, I found that the larger the N-Gram size, the lower the frequencies. With N-Gram size 1, the most frequent word is "the": it appears 3060 times, which makes it the highest-frequency word in the whole text. At N-Gram sizes 5 and 6, the frequencies drop below 5, so those phrases are the low-frequency ones in the text. The KWIC view helps me see the left context and right context of a selected word or phrase. This is very helpful for anyone who wants to study a word or quickly find where a specific word or phrase occurs.
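To make the KWIC idea concrete, here is a minimal Python sketch of what a keyword-in-context view does. It is not how AntConc is implemented. The `tokenize` and `kwic` helpers are hypothetical names, and the sample sentence is made up for illustration rather than taken from cpi.txt.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Up to `window` tokens on each side of the hit
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Made-up sample text (an assumption, not a quotation from cpi.txt)
sample = "The prime minister shook his head. I don't know, said the prime minister."

for left, kw, right in kwic(tokenize(sample), "minister"):
    # Right-align the left context so the keyword lines up in a column
    print(f"{left:>25} | {kw} | {right}")
```

Lining the keyword up in one column is what makes the left and right contexts easy to scan, which is roughly what the KWIC tab shows.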

mblankenberg25 commented 1 year ago

Mysterytext2 mysterytext mysterytext3

Frequent phrases:

of the - 330
the prime minister - 42
shook his head - 18
i don't know - 9
the london and scottish bank - 6

In AntConc, I also found that as the n-gram size increases, the frequencies decrease. The words "the" and "of" are used the most in the text. I also noticed that the phrase "the london and scottish bank" comes up 6 times, which is quite often for a five-word phrase, so I'm assuming that the text has something to do with money. The story seems to be set in Europe because of the references to "the prime minister". I am going to guess that the story is about a bank robbery, or about detectives trying to retrace a bank robbery.
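The pattern that frequencies fall as the n-gram size grows can be checked with a short Python sketch. Every occurrence of a longer phrase also contains an occurrence of the shorter phrase it starts with, so the top count can stay the same or drop as n increases, but it cannot rise. The helper names `tokenize` and `ngram_counts` are hypothetical, the sample sentence is invented rather than taken from cpi.txt, and this is only an approximation of what AntConc's N-Gram tool reports.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, joined into one phrase string."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Made-up sample text (an assumption, not a quotation from cpi.txt)
sample = (
    "the prime minister shook his head and the clerk at the bank "
    "said the prime minister shook his head"
)
tokens = tokenize(sample)

# Print the most frequent phrase at each n-gram size
for n in range(1, 7):
    phrase, count = ngram_counts(tokens, n).most_common(1)[0]
    print(f"n={n}: '{phrase}' appears {count} time(s)")
```

On a full text like cpi.txt, the same loop should reproduce the drop from the thousands at size 1 down to single digits at sizes 5 and 6.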