teusbenschop / shona

The text of the Shona Bible for use by the translation team

A counted word list to assist with proof reading the text #19

Open DavidHaslam opened 6 years ago

DavidHaslam commented 6 years ago

The attached ZIP file contains a tab-delimited counted word list for the text of the Shona translation. The analysis was done by means of a bespoke TextPipe filter.

When removing punctuation, special provision was made to preserve hyphenated words. The final filter, which counts duplicate lines, also sorts the words and is case-sensitive.

This may assist with proofreading the text. Browse the file to look for any misspelled words.

The data analysed is from verse and paragraph text, but excludes section headings. Cross-references and all USFM tags were first removed.

merged.words.count.txt.zip
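For readers without TextPipe, the steps described above can be approximated in Python. This is only a rough sketch, not the filter that was actually used: the function names and regular expressions are hypothetical, the USFM handling is deliberately simplified, and the column order (count first, then word) is inferred from the later comment that places the words in column B.

```python
import re
from collections import Counter

# Assumption: simplified USFM patterns, not a full USFM parser.
CROSS_REF = re.compile(r"\\x .*?\\x\*", re.DOTALL)         # \x ... \x* spans
HEADING = re.compile(r"^\\s\d* .*$", re.MULTILINE)         # \s, \s1, \s2 lines
USFM_MARKER = re.compile(r"\\\+?[a-z]+\d*\*?")             # \v, \p, \q1, \add*
# Letters only, with internal hyphens preserved (e.g. "Mwari-wedu").
# Note: words containing an apostrophe would be split by this pattern.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def count_words(usfm_text):
    """Return a case-sensitive Counter of words in verse and paragraph text."""
    text = CROSS_REF.sub(" ", usfm_text)   # drop cross-references first
    text = HEADING.sub(" ", text)          # exclude section headings
    text = USFM_MARKER.sub(" ", text)      # strip remaining USFM tags
    return Counter(WORD.findall(text))


def format_counted_list(counts):
    """Render 'count<TAB>word' lines, sorted case-sensitively by word."""
    return "\n".join(f"{n}\t{w}" for w, n in sorted(counts.items()))
```

Because the sort is case-sensitive, capitalised forms are listed before lowercase ones, so `Pakutanga` and `pakutanga` appear as separate entries.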

DavidHaslam commented 6 years ago

The analysis can be readily repeated in the future, should the need arise after further corrections.

teusbenschop commented 6 years ago

This is helpful indeed, also for checking the spelling of certain names.

DavidHaslam commented 6 years ago

Updated after your recent commits.

merged.words.count.txt.zip

DavidHaslam commented 6 years ago

Here is a derived file in which the third tab field has the words reversed. When the file is opened in Excel, the data can be sorted on column C to get the words in column B in rhyming order, i.e. words with similar endings are grouped together. This technique can sometimes be fruitful in finding further anomalies in spellings.

merged.words.count.rev.txt.zip
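The reversed-word technique can also be tried without Excel. The following is a minimal sketch, assuming input lines of the form count, tab, word; the function names are hypothetical and this is not how the attached file was produced.

```python
def add_reversed_column(lines):
    """Turn 'count<TAB>word' lines into (count, word, reversed_word) rows."""
    rows = []
    for line in lines:
        count, word = line.rstrip("\n").split("\t")
        rows.append((count, word, word[::-1]))  # third field: word reversed
    return rows


def rhyming_order(rows):
    """Sort rows on the reversed word so similar endings sit together."""
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    sample = ["3\tmunhu", "1\tvanhu", "2\tbaba"]
    for count, word, rev in rhyming_order(add_reversed_column(sample)):
        print(f"{count}\t{word}\t{rev}")
```

In the sample, `vanhu` and `munhu` end up next to each other because both end in `-nhu`, which is the grouping that makes inconsistent endings easier to spot.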

DavidHaslam commented 6 years ago

Following the merge that removed all cross-references, I have just rerun the filters to regenerate the counted word list.

merged.words.count.rev.txt.zip