newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: cpi.txt #99

Open ebeshero opened 5 months ago

ebeshero commented 5 months ago

Post your screenshots and discuss your findings about cpi.txt here!

njp5577 commented 5 months ago

In this text document (cpi.txt), n-grams of size 5 and below reach frequencies above 5, and n-grams of size 3 and below reach frequencies in the double digits. The most common phrases are mostly references to locations and characters relevant to the text (Poirot, the Prime Minister, London, and so on).

digittext1

digittext2
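For anyone who wants to reproduce this kind of count outside the analysis tool, here is a minimal Python sketch of n-gram frequency counting. It is not the tool used for the screenshots. The sample sentence is made up for illustration and is not quoted from cpi.txt, and the function names `tokenize` and `ngram_counts` are hypothetical. The tokenizer keeps only runs of letters, so a contraction like "don't" splits into two tokens; other tools may tokenize differently and give different counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters only.
    # Assumption: apostrophes split words, so "don't" becomes "don", "t".
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    # Slide a window of n tokens across the text and count each phrase.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Made-up sample text, standing in for the contents of cpi.txt.
sample = (
    "The Prime Minister left London. Poirot said the Prime Minister "
    "was in London. I do not know, said Poirot."
)

tokens = tokenize(sample)
for n in (1, 2, 3):
    # Show the three most frequent n-grams at each size.
    print(n, ngram_counts(tokens, n).most_common(3))
```

To run it on the real file, replace `sample` with the text read from cpi.txt. Longer n-grams repeat less often than shorter ones, which fits the pattern of frequencies falling as the n-gram size grows.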

Some everyday phrases are also mixed in among the high-frequency results. One thing I found interesting was the common use of "I don't" as well as "I do not", which occur at close to the same frequency at n-gram size 3. Looking at the direct uses of "I do not", I think the uncontracted form is used intentionally to portray the level of formality of the character's speech (whether or not they heavily use contractions while speaking). I am assuming this because the text samples include a lot of "Mr.", "Miss", "Mrs.", and other more formal descriptors. On the other hand, analysis of how "The Prime Minister" is used seems to indicate that he is being watched or maybe even investigated.

digittext3

digittext4
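The phrase comparison and the look at how "The Prime Minister" is used could be sketched roughly as below: count each phrase, then print every match with a little context on each side (a keyword-in-context view). This is an illustrative sketch, not the tool from the screenshots. The helper names `phrase_count` and `kwic` are hypothetical, and the sample text is invented rather than taken from cpi.txt.

```python
import re


def phrase_count(text, phrase):
    # Count case-insensitive occurrences of an exact phrase.
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))


def kwic(text, phrase, width=30):
    # Return each match with up to `width` characters of context on each side.
    lines = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left}[{m.group(0)}]{right}")
    return lines


# Invented sample text, standing in for the contents of cpi.txt.
sample = (
    "I do not know where he went. I don't think the Prime Minister saw us. "
    "They were watching the Prime Minister all evening."
)

for phrase in ("I do not", "I don't"):
    print(phrase, phrase_count(sample, phrase))

for line in kwic(sample, "the Prime Minister", width=20):
    print(line)
```

One caveat: the match is on exact characters, so a file that uses curly apostrophes would need the phrase typed with the same apostrophe character.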

From the few text snippets that I have seen and the text analysis results, I can piece together that this is some sort of detective story. I wonder what the general plot of the story is.