rllc / llc-archives

Archived sermons, backed by the cloud
https://llc-archives.herokuapp.com
MIT License

Auto-Correct bible text #2

Closed smcadams86 closed 8 years ago

smcadams86 commented 9 years ago

Situation

When webcasters export audio, they type the bible text by hand. This is error-prone and leads to misspelled bible text and inconsistent abbreviations on the public-facing archives. The spreadsheet-updater runs downstream of the actual MP3 creation, so it has no control over the source data.

Proposed Solution

When parsing the bible text from the MP3 tag, compare it to a master list of bible texts and pick the entry that is most similar. There are various algorithms for measuring string similarity. See Grails' implementation of CosineSimilarity for an example.
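
As a rough illustration of the similarity idea, here is a minimal Python sketch (not the project's Groovy code) that scores a hand-typed book name against a master list using cosine similarity over character bigrams. The names `MASTER_LIST`, `bigrams`, `cosine_similarity`, and `closest_entry` are hypothetical, and the master list is a stand-in: only `Matt.` and `Ezek.` come from this issue.

```python
import math
from collections import Counter

# Hypothetical master list; only "Matt." and "Ezek." appear in this issue.
MASTER_LIST = ["Matt.", "Ezek.", "Mark", "John"]


def bigrams(text):
    """Count overlapping character pairs, ignoring case and outer whitespace."""
    s = text.strip().lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine_similarity(a, b):
    """Cosine of the angle between the bigram-count vectors of a and b."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # strings shorter than two characters have no bigrams
    return dot / (norm_a * norm_b)


def closest_entry(raw, candidates=MASTER_LIST):
    """Return the candidate with the highest similarity to the raw text."""
    return max(candidates, key=lambda c: cosine_similarity(raw, c))


print(closest_entry("Mathew"))   # Matt.
print(closest_entry("Ezekial"))  # Ezek.
```

A real implementation would probably also want a minimum-score cutoff, so that text resembling nothing in the list is left alone rather than forced onto the nearest entry.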

Bonus Points

It would be extra nice if the master list were maintainable by the LLC.
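
One way to keep the list maintainable by non-developers would be a plain CSV file that can be edited in a spreadsheet. The Python sketch below assumes a hypothetical two-column layout (`variant`, `preferred`); neither the column names nor the `load_mapping` helper come from the project.

```python
import csv
import io


def load_mapping(csv_file):
    """Read variant -> preferred pairs from an open CSV file, skipping incomplete rows."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        variant = (row.get("variant") or "").strip()
        preferred = (row.get("preferred") or "").strip()
        if variant and preferred:
            mapping[variant] = preferred
    return mapping


# Stand-in for a file maintained by the LLC; the layout is an assumption.
SAMPLE = """variant,preferred
Matthew,Matt.
St Matthew,Matt.
Ez,Ezek.
"""

mapping = load_mapping(io.StringIO(SAMPLE))
print(mapping["St Matthew"])  # Matt.
```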

Concerns

This is more complicated than minister names, because abbreviations are in play. The ideal solution would create a mapping from all expected variants of each book to its preferred convention, then use the mapping to resolve the true value.

Example Mapping

[
    'Matthew' : 'Matt.',
    'St Matthew' : 'Matt.',
    'Matt' : 'Matt.',
    'Ezekiel' : 'Ezek.',
    'Ezek' : 'Ezek.',
    'Ez' : 'Ezek.'   
]
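
A minimal Python sketch of how such a mapping could be applied, assuming the tag holds a book name optionally followed by a chapter and verse reference. The `normalize` and `resolve_text` helpers and the regular expression are illustrative assumptions; only the mapping entries come from the example above.

```python
import re

# Variants copied from the example mapping above.
VARIANTS = {
    "Matthew": "Matt.",
    "St Matthew": "Matt.",
    "Matt": "Matt.",
    "Ezekiel": "Ezek.",
    "Ezek": "Ezek.",
    "Ez": "Ezek.",
}


def normalize(book):
    """Lowercase, drop periods, and collapse whitespace ('St.  Matthew' -> 'st matthew')."""
    return " ".join(book.replace(".", " ").lower().split())


LOOKUP = {normalize(variant): preferred for variant, preferred in VARIANTS.items()}

# Book name (optionally led by a digit, as in "1 John"), then an optional
# reference that starts at the first digit after the name.
TEXT_PATTERN = re.compile(
    r"^\s*(?P<book>(?:\d\s*)?[A-Za-z][A-Za-z. ]*?)\s*(?P<ref>\d.*)?$"
)


def resolve_text(raw):
    """Swap the book for its preferred form; return raw unchanged if it is unknown."""
    match = TEXT_PATTERN.match(raw)
    if not match:
        return raw
    preferred = LOOKUP.get(normalize(match.group("book")))
    if preferred is None:
        return raw
    ref = match.group("ref")
    return f"{preferred} {ref.strip()}" if ref else preferred


print(resolve_text("St. Matthew 5:3"))  # Matt. 5:3
print(resolve_text("Ez 37:1-14"))       # Ezek. 37:1-14
print(resolve_text("Luke 2:1"))         # Luke 2:1 (not in the mapping, left as-is)
```

Leaving unknown books untouched keeps the correction conservative: a text is only rewritten when the mapping has an exact entry for its normalized book name.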

wforstie commented 8 years ago

Here are the preferred abbreviations from the LLC. BibleBookAbbreviations.docx

smcadams86 commented 8 years ago

The minister name autocorrection didn't go over too well. I don't expect this would be much easier to implement. This issue does not provide a lot of benefit for the amount of work required.