Closed DavidHaslam closed 7 years ago
If you filter column B or C by colour so that only 99 of the 979 records are displayed, you can total the badly formed glyphs by selecting the count range in column A. The sum comes to 343.
The total number of glyphs is 1,725,674. Thus the error rate is 199 PPM.
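For anyone who wants to check the arithmetic, here is a quick sketch in Python. The two figures are the worksheet totals quoted above; the variable names are just illustrative.

```python
bad_glyphs = 343            # sum of column A counts for the 99 filtered records
total_glyphs = 1_725_674    # total glyph count for the whole text

# Parts per million = (bad / total) * 1,000,000
ppm = bad_glyphs / total_glyphs * 1_000_000
print(round(ppm))  # 199
```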
Only 5 of the records are common to the bad glyphs and those that change due to NFC normalization.
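The small overlap is not surprising, because NFC only reorders or recomposes certain codepoints and leaves most misordered signs alone. A minimal sketch of the "changes under NFC" test, using Python's standard `unicodedata` module (the helper name `changes_under_nfc` is my own, not something from the worksheet):

```python
import unicodedata

def changes_under_nfc(text: str) -> bool:
    """True if NFC normalization would alter the string."""
    return unicodedata.normalize("NFC", text) != text

# U+0A36 GURMUKHI LETTER SHA is excluded from composition,
# so NFC rewrites it as SA (U+0A38) + NUKTA (U+0A3C).
print(changes_under_nfc("\u0a36"))              # True
print(changes_under_nfc("\u0a38\u0a3c"))        # False

# LA + TIPPI + VOWEL SIGN I is badly formed, yet NFC leaves it untouched.
print(changes_under_nfc("\u0a32\u0a70\u0a3f"))  # False
```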
The difficulty proofreaders face in detecting these badly formed glyphs should not be underestimated.
It's quite likely that the editing environment does not clearly display the letter placeholder dotted circle for any vowel or other sign that is either wrongly attached to a valid glyph or completely unattached.
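Since the editor may not show the dotted circle, a script can flag the completely unattached case instead. The following is only a rough sketch under my own assumptions: it treats any combining mark (Unicode general category `M*`) as "attached" if the current cluster began with a letter, and it does not attempt to detect signs that are wrongly attached to a valid glyph. The function name is hypothetical.

```python
import unicodedata

def unattached_marks(text):
    """Return (index, name) for each combining mark that has no base letter before it."""
    hits = []
    in_cluster = False  # True while the current run started with a letter
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat.startswith("M"):
            if not in_cluster:
                hits.append((i, unicodedata.name(ch, "?")))
        else:
            in_cluster = cat.startswith("L")
    return hits

# LA + VOWEL SIGN I is fine; the TIPPI after the space has nothing to attach to.
print(unattached_marks("\u0a32\u0a3f \u0a70"))  # [(3, 'GURMUKHI TIPPI')]
```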
Row 666 of the worksheet is peculiar. I have formatted cell E666 with yellow fill. This badly formed glyph is ਲੰਿ (LETTER LA, TIPPI, VOWEL SIGN I).
The signs are in the reverse order! They should be swapped to become valid, thus: ਲਿੰ (LETTER LA, VOWEL SIGN I, TIPPI).
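As a search and replace operation, the swap could be sketched like this in Python. It handles only this one pair (TIPPI U+0A70 followed by VOWEL SIGN I U+0A3F); the function name is hypothetical, and whether a blind swap is the right fix for every occurrence is an assumption.

```python
import unicodedata

TIPPI = "\u0a70"   # GURMUKHI TIPPI
SIGN_I = "\u0a3f"  # GURMUKHI VOWEL SIGN I

def fix_tippi_before_sign_i(text: str) -> str:
    """Reorder TIPPI + VOWEL SIGN I into VOWEL SIGN I + TIPPI."""
    return text.replace(TIPPI + SIGN_I, SIGN_I + TIPPI)

bad = "\u0a32" + TIPPI + SIGN_I  # LA, TIPPI, VOWEL SIGN I
fixed = fix_tippi_before_sign_i(bad)
print([unicodedata.name(c) for c in fixed])
# ['GURMUKHI LETTER LA', 'GURMUKHI VOWEL SIGN I', 'GURMUKHI TIPPI']
```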
This occurs just once in Zephaniah 3:19 which reads:
\v 19 ਵੇਖੋ, ਮੈਂ ਉਸ ਸਮੇਂ ਤੇਰੇ ਸਭ ਦੁਖ ਦੇਣ ਵਾਲਿਆਂ ਨਾਲ ਨਜਿੱਠਾਂਗਾ, ਮੈਂ ਲੰਿਙਆਂ ਨੂੰ ਬਚਾਵਾਂਗਾ, ਅਤੇ ਹੱਕੇ ਹੋਇਆਂ ਨੂੰ ਇੱਕਠਾ ਕਰਾਂਗਾ, ਅਤੇ ਮੈਂ ਸਾਰੀ ਧਰਤੀ ਵਿੱਚ ਓਹਨਾਂ ਦੀ ਸ਼ਰਮ ਉਸਤਤ ਅਤੇ ਜਸ ਬਣਾਵਾਂਗਾ |
The word with the bad glyph is ਲੰਿਙਆਂ.
Even if this is "corrected" by swapping the two diacritic signs, the word still doesn't get translated properly by Google, so a different fix is probably needed here. A missing letter, maybe?
This example also illustrates what I already observed above, that my systematic analysis has detected something that my earlier manual searches failed to find.
NB. My counted glyphs filter cannot in principle detect the occurrence of a duplicated Gurmukhi vowel letter. Those occurrences reported earlier were found by manual searches.
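A regular expression search can cover this gap, since each duplicated letter is a valid glyph on its own and only the repetition is suspect. This is a sketch, assuming "vowel letter" means the independent vowels in the range U+0A05 to U+0A14; the function name is hypothetical.

```python
import re

# Any independent Gurmukhi vowel immediately followed by the same vowel.
DOUBLED_VOWEL = re.compile(r"([\u0A05-\u0A14])\1")

def find_doubled_vowels(text: str):
    """Return (index, letter) for each doubled independent vowel."""
    return [(m.start(), m.group(1)) for m in DOUBLED_VOWEL.finditer(text)]

# A doubled LETTER A (U+0A05) followed by LA.
print(find_doubled_vowels("\u0a05\u0a05\u0a32"))  # [(0, 'ਅ')]
```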
Note that this analysis can be readily repeated once the related issues have been fixed. This will serve as a confirmation test for closing those issues.
Superseded by more recent analysis.
Though I've already referred to this in two other issues, and even though this covers some of the other particular issues, it's probably useful to give this its own issue as a general topic.
I've just updated my Excel worksheet to include Column E for the Unicode Names of the original Gurmukhi Unicode codepoints in each counted glyph. In addition, I have formatted in red font the names of the invalid parts of the 99 glyphs that break the rules for the Gurmukhi script as an Abugida.
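For reference, the names in Column E can be reproduced outside Excel with Python's standard `unicodedata` module. This is a sketch rather than the method used to build the worksheet; the helper name and the stripping of the `GURMUKHI ` prefix are my own choices.

```python
import unicodedata

PREFIX = "GURMUKHI "

def short_names(glyph: str):
    """Unicode names of each codepoint in a glyph, without the script prefix."""
    names = []
    for ch in glyph:
        # Fall back to the U+XXXX form for codepoints that have no name.
        name = unicodedata.name(ch, f"U+{ord(ch):04X}")
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        names.append(name)
    return names

print(" ".join(short_names("\u0a32\u0a70\u0a3f")))
# LETTER LA TIPPI VOWEL SIGN I
```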
Gurmukhi Glyphs Before & After NFC.xlsx
It's conceivable that some of the 99 badly formed glyph types were not reported in my earlier issues. This report therefore serves as a checklist or reference point for search and replace operations.
NB. The worksheet is protected (with no password) merely to prevent accidental edits. Use of AutoFilter is permitted while it's protected.