tfbf / Bible-Punjabi-Pavitr-Bible-1945


Order of diacritic signs: NUKTA before or after VOWEL ? #89

Open DavidHaslam opened 7 years ago

DavidHaslam commented 7 years ago

In most cases, where a Gurmukhi glyph contains a NUKTA sign, the vowel sign (if present) comes after the NUKTA.

There are two instances where the vowel comes before the NUKTA. image image

Unicode Normalization does not change the order of these diacritics.
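As a quick check of the statement above, here is a minimal sketch using Python's standard `unicodedata` module. The sample sequences and the helper name `normalization_preserves` are my own illustration, not taken from the repository. The reason normalization leaves the order alone is that the Gurmukhi vowel signs have canonical combining class 0, so the NUKTA (class 7) cannot be reordered across them.

```python
import unicodedata

JA = "\u0A1C"        # GURMUKHI LETTER JA
VOWEL_AA = "\u0A3E"  # GURMUKHI VOWEL SIGN AA
NUKTA = "\u0A3C"     # GURMUKHI SIGN NUKTA

# Illustrative sequences (assumed examples, not copied from the Bible text).
vowel_first = JA + VOWEL_AA + NUKTA  # the unusual order
nukta_first = JA + NUKTA + VOWEL_AA  # the usual order


def normalization_preserves(text: str) -> bool:
    """Return True if no Unicode normalization form changes the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


# Canonical combining classes: a class of 0 blocks canonical reordering.
print("ccc(NUKTA)    =", unicodedata.combining(NUKTA))
print("ccc(VOWEL AA) =", unicodedata.combining(VOWEL_AA))
print("vowel-first preserved:", normalization_preserves(vowel_first))
print("nukta-first preserved:", normalization_preserves(nukta_first))
```

So both orders survive normalization unchanged, which is why the two exceptions have to be found and fixed by other means.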

Even so, would it be sensible to change these two exceptions so that the NUKTA comes immediately after the letter? This would ensure that a search for the related words does not miss them because of their unusual ordering. NB: Such a change does alter the appearance of the glyphs. The NUKTA dot gets moved leftwards.

image image

Though there are no instances of the former, there are already instances of the latter elsewhere in the text: image
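The suggested change could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and the list of vowel signs is my assumption of what counts as a "vowel sign" here (the dependent vowel signs of the Gurmukhi block). It is not a script from this repository.

```python
import re

NUKTA = "\u0A3C"  # GURMUKHI SIGN NUKTA

# Gurmukhi dependent vowel signs: AA, I, II, U, UU, EE, AI, OO, AU.
VOWEL_SIGNS = "\u0A3E\u0A3F\u0A40\u0A41\u0A42\u0A47\u0A48\u0A4B\u0A4C"

# A vowel sign immediately followed by a NUKTA (the unusual order).
VOWEL_BEFORE_NUKTA = re.compile(f"([{VOWEL_SIGNS}]){NUKTA}")


def find_vowel_before_nukta(text: str) -> list[int]:
    """Return the offsets of every vowel sign that precedes a NUKTA."""
    return [m.start() for m in VOWEL_BEFORE_NUKTA.finditer(text)]


def move_nukta_before_vowel(text: str) -> str:
    """Swap each vowel+NUKTA pair so the NUKTA follows the letter directly."""
    return VOWEL_BEFORE_NUKTA.sub(lambda m: NUKTA + m.group(1), text)


# Assumed test word: A, JA, VOWEL SIGN AA, NUKTA, HA.
word = "\u0A05\u0A1C\u0A3E\u0A3C\u0A39"
print(find_vowel_before_nukta(word))
print([f"U+{ord(c):04X}" for c in move_nukta_before_vowel(word)])
```

Running the finder over the whole text first, before applying the swap, would give a list of locations to check against the PDF.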

Further observations:

DavidHaslam commented 7 years ago

The two exceptions are located as follows:

Psalm 42:9 which reads:

\v 9 ਪਰਮੇਸ਼ੁਰ ਨੂੰ ਜੋ ਮੇਰੀ ਚਟਾਨ ਹੈ ਮੈ ਆਖਾਂਗਾ , ਤੂੰ ਮੈਨੂੰ ਕਿਉਂ ਭੁੱਲ ਗਿਆ ਹੈਂ ? ਮੈ ਕਿਉਂ ਵੈਰੀ ਦੇ ਅਨੇ਼ਰ ਦੇ ਮਾਰੇ ਵਿਰਲਾਪ ਕਰਦਾ ਫਿਰਦਾ ਹਾਂ ?

Zechariah 9:5 which reads:

\v 5 ਅਸ਼ਕਲੋਨ ਵੇਖੇਗਾ ਅਤੇ ਡਰ ਜਾਵੇਗਾ , ਨਾਲੇ ਅਜਾ਼ਹ ਵੀ ਕਿਉਂ ਜੋ ਉਹ ਨੂੰ ਡਾਢੀ ਪੀੜ ਲੱਗੇਗੀ , ਨਾਲੇ ਅਕਰੋਨ ਵੀ ਕਿਉਂ ਜੋ ਉਹ ਦਾ ਭਰੋਸਾ ਸ਼ਰਮਿੰਦਾ ਹੋ ਜਾਵੇਗਾ ,ਅੱਜਾਹ ਵਿੱਚੋਂ ਰਾਜਾ ਮਿਟ ਜਾਵੇਗਾ, ਅਸ਼ਕਲੋਨ ਬੈ ਆਬਾਦ ਹੋ ਜਾਵੇਗਾ |

Both locations should be checked in the PDF file in order to review my suggested change.

DavidHaslam commented 7 years ago

To put the two exceptions in context of the other glyphs that contain a NUKTA, here is the counted data:

image

These are the only letters found to have a NUKTA in a glyph (without Unicode Normalization): image

LA is shown as grey here because there were none in the Punjabi Bible text.

cf. These are the six composite characters that Normalization converts to a letter and a separate NUKTA: image

One might be tempted to conclude that the Gurmukhi block is short of 4 composite characters. On the other hand, the very low counts in the corresponding cells might suggest that each of these instances should also be reviewed.
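For anyone wanting to reproduce counts of this kind, here is a rough sketch. It is not the tool used to make the table above. The function name `count_nukta_bases` is hypothetical, and it assumes that the base letter is the nearest preceding character that is not a combining mark.

```python
import unicodedata
from collections import Counter

NUKTA = "\u0A3C"  # GURMUKHI SIGN NUKTA

# The six precomposed Gurmukhi letters that decompose to letter + NUKTA.
PRECOMPOSED = "\u0A33\u0A36\u0A59\u0A5A\u0A5B\u0A5E"


def count_nukta_bases(text: str) -> Counter:
    """Count, per base letter, how many times it carries a NUKTA."""
    decomposed = unicodedata.normalize("NFD", text)
    counts = Counter()
    for i, ch in enumerate(decomposed):
        if ch != NUKTA:
            continue
        j = i - 1
        # Step back over any vowel signs or other marks (categories Mn, Mc)
        # so that both orders (nukta-first and vowel-first) are counted.
        while j >= 0 and unicodedata.category(decomposed[j]).startswith("M"):
            j -= 1
        if j >= 0:
            counts[decomposed[j]] += 1
    return counts


# Show the canonical decomposition of each precomposed letter.
for ch in PRECOMPOSED:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", unicodedata.decomposition(ch))
```

Because the text is decomposed first, the precomposed letters and the letter plus NUKTA sequences end up in the same tally.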

DavidHaslam commented 7 years ago

cf. Some other Unicode scripts do have a canonical order for the diacritics.

e.g. Biblical Hebrew, but even so, there is a known issue with Normalization that is described on page 9 of the SBL Hebrew Font Manual.

DavidHaslam commented 7 years ago

See also issue #44

DavidHaslam commented 7 years ago

See also issue #109

DavidHaslam commented 7 years ago

The two suspect glyphs are confirmed as being invalid, in that the algorithmic transliteration performed by ICU in the SWORD filter also barfed at these locations.

Psalms 42:9: paramēśura nū jō mērī caṭāna hai mai ākhāṅgā , tū mainū ki'uṁ bhula gi'ā haiṁ ? mai ki'uṁ vairī dē anēra dē mārē viralāpa karadā phiradā hāṁ ?

Zechariah 9:5: aśakalōna vēkhēgā atē ḍara jāvēgā , nālē ajāha vī ki'uṁ jō uha nū ḍāḍhī pīṛa lagēgī , nālē akarōna vī ki'uṁ jō uha dā bharōsā śaramidā hō jāvēgā , ajāha vicōṁ rājā miṭa jāvēgā, aśakalōna bai ābāda hō jāvēgā .