Open DavidHaslam opened 7 years ago
I was surprised to see the digit 1 at the top of the list. Not yet located this.
Technical tip: Although the PCRE class [[:punct:]] treats the hyphen-minus (U+002D) as a punctuation mark, it does not match the hyphen (U+2010), at least not without Unicode property support.
To generate the words list, I temporarily replaced each hyphen-minus with the hyphen, so that these were not removed when I used [[:punct:]] in a subsequent search/replace pattern.
A similar method was used to retain the sole apostrophe.
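The swap-and-restore trick described above can be sketched in Python. This is only an illustration, not the TextPipe filter itself: Python's `re` module has no `[[:punct:]]` class, so `string.punctuation` (ASCII punctuation only) stands in for it here, and the apostrophe placeholder (U+02BC) and the function name `count_words` are assumptions.

```python
import re
import string
from collections import Counter

HYPHEN_MINUS = "-"           # U+002D, part of ASCII punctuation
HYPHEN = "\u2010"            # not ASCII, so it survives the strip below
APOSTROPHE = "'"
APOS_PLACEHOLDER = "\u02bc"  # assumed placeholder; any non-ASCII character would do

# Stand-in for PCRE [[:punct:]]: a character class of ASCII punctuation.
ASCII_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def count_words(text):
    """Count whitespace-separated words, keeping hyphens and apostrophes."""
    # 1. Protect the characters that must survive.
    protected = text.replace(HYPHEN_MINUS, HYPHEN)
    protected = protected.replace(APOSTROPHE, APOS_PLACEHOLDER)
    # 2. Replace all remaining ASCII punctuation with spaces.
    stripped = ASCII_PUNCT.sub(" ", protected)
    # 3. Restore the protected characters. Any U+2010 that was already in
    #    the input also becomes a hyphen-minus at this step.
    restored = stripped.replace(HYPHEN, HYPHEN_MINUS)
    restored = restored.replace(APOS_PLACEHOLDER, APOSTROPHE)
    return Counter(restored.split())


counts = count_words("Beer-sheba, the Temple; the temple's gate.")
```

The count is case-sensitive, which is what lets capitalised and lower-case forms be compared later.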
Windows users may like to know that I generated the counted words list by means of a bespoke TextPipe filter.
A further contrast with modern editions. The KJV of 1611 had a greater number of common nouns capitalised. Here's one such example:
00079 temple
00255 Temple
00010 temples
00003 Temples
Contrast this with a modern KJV (albeit without the DC books):
000204 temple
000009 temples
But here's the same for just the DC books:
000131 temple
000003 temples
The noun temple never occurs at the start of a sentence, so it was a useful candidate for the comparison.
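The same comparison could be made for every word at once by grouping the counted list on a case-folded key. The sketch below assumes each line has the form shown above (a zero-padded count, whitespace, then the word); the function name `group_by_case` is hypothetical.

```python
from collections import defaultdict


def group_by_case(lines):
    """Map each lower-cased word to the counts of its case variants."""
    groups = defaultdict(dict)
    for line in lines:
        count, word = line.split(maxsplit=1)
        word = word.strip()
        groups[word.lower()][word] = int(count)
    return dict(groups)


# The 1611 figures quoted above.
sample = ["00079 temple", "00255 Temple", "00010 temples", "00003 Temples"]
variants = group_by_case(sample)
# variants["temple"] is {"temple": 79, "Temple": 255}
```

Entries with more than one variant are the candidates for a capitalisation comparison, although words that can begin a sentence would need to be checked by hand.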
The counted words list can be pasted into Excel™ and filtered on the Count column to browse through [say] all the hapax legomena.
NB. Take care after pasting. The words false and true will have become Booleans.
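For anyone who would rather avoid the spreadsheet coercion, the hapax legomena can also be pulled out with a few lines of Python, where every word stays a string. This sketch assumes the count-then-word line format shown earlier; the sample counts below are made up for illustration.

```python
def hapaxes(lines):
    """Return the words whose count is exactly 1, in input order."""
    result = []
    for line in lines:
        count, word = line.split(maxsplit=1)
        if int(count) == 1:
            result.append(word.strip())
    return result


# Made-up counts, only to show that "true" is kept as text.
sample = ["00001 Jesus", "00002 false", "00001 true", "00079 temple"]
once_only = hapaxes(sample)
```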
It's already become apparent that there are instances where two words were joined together in the HTML.
We can begin to list these here:
beconfounded
himcontinually
intheir
Lordcame
maydeclare
mineenemies
ofsorrows
preciousstones
shallbreake
shalloffer
shallsay
thedeepe
thereofshall
withthe
This is different to merely observing varieties of spellings in Early Modern English. Of course, it's conceivable that this exercise will just uncover printers' mistakes from 1611.
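Joined words like those listed above could be flagged semi-automatically. One possible heuristic, sketched below, treats a word that occurs once as suspect if it splits into two parts that are each frequent words in their own right. The threshold `min_part_count`, the function name `find_joined`, and the sample counts are all assumptions; the output would still need checking against the page images.

```python
def find_joined(counts, min_part_count=5):
    """Return {word: [(left, right), ...]} for hapaxes that split into two frequent words."""
    candidates = {}
    for word, n in counts.items():
        if n != 1:
            continue
        # Try every split point that leaves at least two letters on each side.
        for i in range(2, len(word) - 1):
            left, right = word[:i], word[i:]
            if (counts.get(left, 0) >= min_part_count
                    and counts.get(right, 0) >= min_part_count):
                candidates.setdefault(word, []).append((left, right))
    return candidates


# Made-up counts for illustration only.
sample_counts = {
    "withthe": 1, "with": 100, "the": 500,
    "Lordcame": 1, "Lord": 200, "came": 50,
    "temple": 79,
}
joined = find_joined(sample_counts)
```

A genuine compound spelled as one word in 1611 would be flagged as well, so the result is a list of candidates rather than a list of errors.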
Even in a culture where spellings had not been standardised, it's still possible to observe some obvious printers' mistakes.
Beeer-sheba
breehren
chldren
fifteeene
Ind
looosed
monrning
Is it fair to say that a triple vowel is more likely to be a printer's mistake than a spelling variation?
Extend the list as more come to light.
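To gather evidence on the triple-vowel question, the word list could be scanned for any vowel repeated three times in a row. A minimal sketch using a backreference; the name `TRIPLE_VOWEL` is hypothetical, and the sample words are taken from the list above.

```python
import re

# A vowel followed by two more copies of itself, ignoring case.
TRIPLE_VOWEL = re.compile(r"([aeiou])\1{2}", re.IGNORECASE)


def triple_vowel_words(words):
    """Return the words that contain the same vowel three times in a row."""
    return [w for w in words if TRIPLE_VOWEL.search(w)]


sample = ["Beeer-sheba", "breehren", "chldren", "fifteeene", "looosed", "monrning"]
suspects = triple_vowel_words(sample)
```

This only finds one class of mistake; words such as breehren or chldren would need a different test.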
To modern readers, surely the most surprising hapax legomenon in the whole KJV_1611 must be the word Jesus?
Upon closer inspection, this appears to be due to an HTML transcription error!
[The Prologue of the Wisdome of Jesus the sonne of Sirach]
should read:
[The Prologue of the Wisdome of Iesus the sonne of Sirach]
It appears that the transcribers were unaware that in italics the capital letter I could look like a J. Agreed? @lb42
btw. The prologue titles should be styled in the TEI with italics.
By all means list these obvious typos somewhere. If you provide a reference to show where they occur that would be helpful. Even more so if you can indicate whether it is a printing error (i.e. present in the original source) or a transcription error (i.e. not present in the original source).
For example:
Lordcame Jonah 3.1 fix
beconfounded Psa 69.6 fix
Beeer-sheba Gen 21.32 sic
breehren Acts 15.33 sic
etc.
That's what I intended, and it'll be tab-delimited for readability.
It's a three-stage process. The first stage is complete: I browsed through all the words with Count=1. Now to locate them. After that, to examine the page images. May take a while.
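A tab-delimited list of this kind could be produced with Python's `csv` module. The column names (`word`, `reference`, `status`) are an assumption about the eventual layout; the rows are the four examples given earlier in the thread.

```python
import csv
import io

# (word, reference, status): "fix" for a transcription error, "sic" for a printing error.
rows = [
    ("Lordcame", "Jonah 3.1", "fix"),
    ("beconfounded", "Psa 69.6", "fix"),
    ("Beeer-sheba", "Gen 21.32", "sic"),
    ("breehren", "Acts 15.33", "sic"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(("word", "reference", "status"))  # assumed header row
writer.writerows(rows)

tsv_text = buffer.getvalue()
```

Writing to a real file instead only needs `open(path, "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.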
You can locate them very quickly using "grep" or whatever the Windows equivalent is!
Well, it's easy enough just with having the concatenated [xml] file open in Notepad++ to search for anything. The search/replace UI is very powerful.
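For a long list of words, a small script may be quicker than searching one at a time. The sketch below is a rough grep-like stand-in that reports the line numbers on which each word appears as a whole word; the function name `locate` and the sample lines are made up, and the real input would be the lines of the concatenated file.

```python
import re


def locate(words, lines):
    """Map each word to the 1-based numbers of the lines that contain it as a whole word."""
    hits = {}
    for word in words:
        # Lookarounds rather than \b, so that hyphenated words are handled too.
        pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
        hits[word] = [n for n, line in enumerate(lines, start=1)
                      if pattern.search(line)]
    return hits


# Made-up sample lines for illustration only.
sample_lines = [
    "a line containing Lordcame as one word",
    "a line mentioning the Temple",
    "Beeer-sheba appears here",
]
found = locate(["Lordcame", "temple", "Beeer-sheba"], sample_lines)
```

The match is case-sensitive, so a search for temple does not report lines that contain only Temple.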
In the original 1611 text of the KJV, there were only 12 words that contained the letter J.
All of them had the ij digraph.
Abijah
Abijam
Ahijah
Aijalon
Aijeleth
Baijth
Elijah
Hodijah
Iehouah-ijreh
Irijah
Tobijah
Urijah
The hyphenated one in Genesis 22:14 was the real surprise. Modern editions have "Jehovah-jireh".
See also issue #2
NB. I have excluded the word Jesus that was transcribed in the second Prologue in Ecclesiasticus.
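The check behind this list is easy to repeat on any revision of the word list: collect every word containing j (either case) and flag those in which it is not part of the ij digraph. A minimal sketch; the function name `j_words` is hypothetical, and Jesus is added to the sample only to show how the transcription error would surface.

```python
def j_words(words):
    """Split the words containing 'j' or 'J' into (with the ij digraph, without it)."""
    with_ij, without_ij = [], []
    for word in words:
        if "j" in word.lower():
            (with_ij if "ij" in word else without_ij).append(word)
    return with_ij, without_ij


sample = [
    "Abijah", "Abijam", "Ahijah", "Aijalon", "Aijeleth", "Baijth",
    "Elijah", "Hodijah", "Iehouah-ijreh", "Irijah", "Tobijah", "Urijah",
    "Jesus", "temple",
]
digraph, exceptions = j_words(sample)
```

Any word that lands in `exceptions` is worth checking against the page images.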
FIO: The attached Zip file contains a counted words list for the verse text only, excluding notes.
merged.vpl.words.count.txt.zip
It may be of use for proof reading, etc.
Notes: