newtfire / introDH-Hub

shared repo for DIGIT 100: Introduction to Digital Humanities class at Penn State Erie, The Behrend College
https://newtfire.github.io/introDH-Hub/
Creative Commons Zero v1.0 Universal

Mystery Text Discussion: pcc.txt #105

ebeshero opened this issue 1 week ago (status: Open)

ebeshero commented 1 week ago

Post your screenshots and discuss your findings about pcc.txt here!

everhagen-23 commented 2 days ago

The most common 3-gram is "one of the." The other frequent phrases have more of an instructional tone, such as "portion of the" and "in regard to."

Once you get to 5-grams, the phrases become more focused on location and time, which did not appear at the shorter lengths. The latitude and longitude phrases point specifically to coordinates.

As the n-gram length gets higher and higher, phrases containing the word "ugh" tend to rise to the top. The word "chapter" also comes up a fair amount, but it probably refers to a table of contents or something similar.
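For anyone curious how these phrase counts are produced, here is a minimal Python sketch of n-gram counting. It assumes simple lowercase word tokenization (Voyant's own tokenizer may differ), and the sample text is a made-up toy string, not pcc.txt. The names `tokenize` and `count_ngrams` are hypothetical helpers for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (a sliding window over the list)."""
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows)


if __name__ == "__main__":
    # Toy sample, NOT pcc.txt: a repeated heading-like line mimics how a
    # recurring phrase (e.g. around "chapter") keeps long n-grams repeating.
    sample = (
        "chapter one of the voyage. one of the men spoke. "
        "chapter one of the voyage. in regard to the map, one of the men agreed."
    )
    tokens = tokenize(sample)
    for n in (3, 5):
        print(n, count_ngrams(tokens, n).most_common(3))
```

In this toy sample, "one of the" is the top 3-gram, but at length 5 only the repeated heading-like phrase "chapter one of the voyage" occurs more than once. That is one way a repeated word like "chapter" can keep long n-grams near the top.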

Based on these clues, I would guess that the text has something to do with maps or geography. It could also be a more general guidebook of some sort.

JSwitkowski commented 1 day ago

Adding on to what Emma mentioned, the text seems to be a story about travel or some sort of journey. Many of the phrases and terms look like common storytelling elements. To me, the latitude and longitude references suggest some form of ocean travel.

[Image: word cloud from Voyant]

nopalm7 commented 1 day ago

Below are the top 25 terms in Voyant for pcc.txt. The top word was "said," which suggests that characters speak frequently throughout the text.

Screenshot 2024-10-30 at 12 36 55 PM
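Term lists like this one are usually computed after filtering out very common function words with a stopword list, which is how a content word like "said" can outrank "the." Below is a minimal Python sketch of stopword-filtered term frequency. The `STOPWORDS` set is a tiny hypothetical list (Voyant ships its own, much longer lists), `top_terms` is a hypothetical helper, and the sample dialogue is made up, not taken from pcc.txt.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; Voyant uses its own, longer lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "he", "she", "it", "was", "that", "i"}


def top_terms(text, k=25, stopwords=STOPWORDS):
    """Return the k most frequent words after dropping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(k)


if __name__ == "__main__":
    # Toy sample, NOT pcc.txt: dialogue tags make "said" the most frequent kept word.
    sample = '"We sail at dawn," said the captain. "And the map?" said I. "Lost," he said.'
    print(top_terms(sample, k=3))
```

Passing `stopwords=set()` turns the filter off, which makes it easy to compare the raw counts with the filtered ones.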

This shows the text with n-grams of size 20. This is the highest I went, and even at sizes well above 5 there are still phrases that repeat. This is because the word "chapter" is repeated a lot for some reason. I'm not sure if this comes from the text itself or is an error.

Screenshot 2024-10-30 at 12 34 09 PM

This shows the n-grams of size 4. Phrases such as "that is to say," "for the purpose of," and "i could not help" suggest that these are the words of characters explaining their reasoning about how or why they want to make certain decisions, and likely trying to get others to agree with them. The characters could be on some sort of journey where they are forced to make difficult decisions and have lengthy discussions about which choice would produce the most favorable result.

Screenshot 2024-10-30 at 1 09 19 PM

I searched for the word "hero" in KWIC to see whether this was some kind of fantasy journey. Although there are some results, the word doesn't seem especially relevant to the story.

Screenshot 2024-10-30 at 1 14 57 PM

I then switched to "her" in KWIC because it has a high frequency, with 1184 occurrences. Based on these results, I would predict that there are more female characters than male characters in this text.

Screenshot 2024-10-30 at 1 16 53 PM
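KWIC ("keyword in context") lists every occurrence of a search term with a few words on either side, and the number of hits is the term's frequency. Here is a minimal Python sketch, assuming word-level tokenization and a hypothetical window of 3 words on each side (Voyant's KWIC tool has its own settings). The `kwic` helper and the sample sentence are made up for illustration, not taken from pcc.txt. Note that a hit count like this tallies every use of the word, not the number of distinct characters it refers to.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for every exact match of keyword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:  # whole-token match, so "there" or "hers" are not hits for "her"
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # Toy sample, NOT pcc.txt.
    sample = "She told her brother that her map was lost. The hero smiled at her."
    hits = kwic(sample, "her")
    for left, kw, right in hits:
        print(f"{left:>25} [{kw}] {right}")
    print("occurrences:", len(hits))
```

Widening `window` shows more of each sentence, which can help judge whether a word like "hero" is actually relevant to the story or just appears in passing.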