lb42 / KJV_1611

A TEI-Conformant version of the 1611 text of the Bible

Character counts (concatenated XML files) FIO #2

Open DavidHaslam opened 6 years ago

DavidHaslam commented 6 years ago

FIO. The attached text file is a character frequency count for the 1361 XML files from the chap folder, concatenated into a single file. (NB. Analysis now includes Romans, and excludes bogus copyright lines.)

merged.xml.character.frequency.txt

The XML entity &amp; occurs 2091 times. There are no other entities.

Of particular interest are the non-ASCII letters and characters:

U+00B6  ¶  2977  PILCROW SIGN
U+00C6  Æ     1  LATIN CAPITAL LETTER AE
U+00E6  æ     7  LATIN SMALL LETTER AE
U+00FE  þ   204  LATIN SMALL LETTER THORN
U+0101  ā     5  LATIN SMALL LETTER A WITH MACRON
U+0113  ē    36  LATIN SMALL LETTER E WITH MACRON
U+014D  ō   153  LATIN SMALL LETTER O WITH MACRON
U+016B  ū     6  LATIN SMALL LETTER U WITH MACRON
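For anyone wanting to reproduce a table like this, here is a minimal sketch in Python. It is not the tool used for the attached file; the function names and the sample string are made up for illustration.

```python
import unicodedata
from collections import Counter


def non_ascii_counts(text):
    """Count every character outside the ASCII range (code point > U+007F)."""
    return Counter(ch for ch in text if ord(ch) > 0x7F)


def format_report(counts):
    """Return tab-separated report lines, sorted by code point."""
    lines = []
    for ch, n in sorted(counts.items()):
        # unicodedata.name() accepts a default for characters without a name
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X}\t{ch}\t{n}\t{name}")
    return lines


# Invented sample text, not taken from the repository
sample = "¶ aþb cþd ō"
for line in format_report(non_ascii_counts(sample)):
    print(line)
```

To run it over the real data, read the concatenated file with `open(path, encoding="utf-8").read()` and pass the result to `non_ascii_counts`.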

Evidently the source web-site made no systematic attempt to use the following letter, which was present in the original KJV of 1611.

U+017F  ſ   LATIN SMALL LETTER LONG S

Reverse engineering a fix for this discrepancy would not be a simple task. Even so, the long s might only have been present in the translators' added words that were styled with Roman typeface; and also the chapter descriptions in head elements and the page titles in fw elements.

cf. The main text of the KJV was in blackletter typeface.

DavidHaslam commented 6 years ago

Aside: Even in modern editions, the last Pilcrow sign in the KJV occurs in Acts 20:36. It's conjectured that the 1611 printers simply ran out of moveable type for this character, and that all subsequent editions simply followed suit.

btw. Modern editions are largely descendants of Benjamin Blayney's 1769 Oxford University Press Edition, albeit with minor textual differences for those published by the Cambridge University Press.

DavidHaslam commented 6 years ago

All 204 instances of the letter thorn are in these two words:

The same English words spelled without the thorn are far more numerous. An intriguing inconsistency.
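A check of this kind can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical and the sample words are invented, not the two words found in the text.

```python
import re
from collections import Counter

THORN = "\u00FE"  # LATIN SMALL LETTER THORN


def thorn_words(text):
    """Count the distinct words that contain a thorn."""
    # In Python 3, \w matches Unicode letters, so thorn counts as a word character
    words = re.findall(r"\w+", text)
    return Counter(w for w in words if THORN in w)


# Invented sample, for illustration only
print(thorn_words("aþb and cþd and aþb again"))
```

Comparing the resulting counts against counts for the same words spelled without the thorn would quantify the inconsistency.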

lb42 commented 6 years ago

As elsewhere, the characters you get in the XML are the characters in the HTML source I used. These are for the most part pretty faithful to the KJV 1611 source, judging by the page images provided, but occasional spots of roman within the black letter (see e.g. Heb 7.20) and the long-s glyph variant don't seem to have been systematically recorded.

DavidHaslam commented 6 years ago

Heb 7.20 is a nice example of roman text with a word containing the long s, namely Prieſt, as is the next verse that has Prieſts. The latter is an interesting example of where the 1611 edition has it as an added word in Roman typeface, yet the modern editions do not have the word Priests styled in italics.

btw. Modern editions do not have the word priest (singular or plural) capitalised here either.

I wonder when these changes were made, and whether they were noted by F H A Scrivener?

DavidHaslam commented 6 years ago

btw. Another surprise was the change of spelling from othe to oath in the space of two consecutive verses!

DavidHaslam commented 6 years ago

Interesting to observe that none of the possessives in 1611 ended with apostrophe & letter s (or vice versa).

In fact the sole apostrophe (\x27) occurs in the word wing'd found in Ezekiel 17:3, thus:

<ab n="3">And say, Thus saith the Lord God, A great eagle with great wings, long wing'd, full of feathers, which had diuers colours, camevnto Lebanon, and tooke the highest branch of the Cedar.<note> Hebr. embroydering.</note></ab>

DavidHaslam commented 6 years ago

Refer to https://en.wikipedia.org/wiki/Apostrophe#Typographic_form

Should we replace the typewriter apostrophe by the single right quotation mark U+2019 ?
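If the answer is yes, the replacement could be sketched like this in Python. It assumes that only apostrophes between two word characters should change, which leaves any single-quoted XML attribute delimiters alone; the function name is hypothetical.

```python
import re

RIGHT_SINGLE_QUOTE = "\u2019"


def curl_apostrophes(text):
    """Replace ' with U+2019 only where it sits between two word characters."""
    # Lookbehind and lookahead leave quotes around attribute values untouched
    return re.sub(r"(?<=\w)'(?=\w)", RIGHT_SINGLE_QUOTE, text)


print(curl_apostrophes('<ab n="3">long wing\'d, full of feathers</ab>'))
```

With a single occurrence in the whole text, a manual edit would do just as well; the sketch only matters if the same rule is wanted for other transcriptions.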

Here's what the verse looks like:

[screenshot 2017-10-31 14 55 34]

@lb42

Aside: At least the transcribers didn't have greateagle or offeathers !