Punjabi Gurmukhi script page notes/feedback

bgo-eiu commented 2 years ago

Hello, I had a look at the Gurmukhi page and it covers a lot but there are various details I would like to make comments on:

In usage & history: I would not tie these scripts for Punjabi so closely to religion. While it is true that Gurmukhi was refined for Sikh literature and is an important aspect of these works, a very large minority of Punjabis are Hindu and use Gurmukhi as well, and in predominantly Muslim parts of Indian Punjab like Malerkotla, Gurmukhi is still the primary script in use. There are also a number of Christians, Jains, and probably followers of other religions, so it gets used by everybody. The other nitpick is that the majority of Punjabi speakers do not write Punjabi at all; it is a very colloquial language, and most native speakers are also bilingual in other language(s). There is a lot to be said about Shahmukhi that would be good for a full page on the topic, or for comparing the two orthographies.
In basic features, and later in a separate word spacing section: Words are mostly separated by spaces, but some single words are typically spelled with spaces in the middle of them in Gurmukhi. The words ਕਰ ਕੇ and ਕਰਕੇ are always pronounced exactly the same way, but ਕਰ ਕੇ is a verb form and ਕਰਕੇ is a postposition. ਕੇ is an inseparable affix that may have been a separate word some centuries ago, but the fact that you cannot insert anything between it and what precedes it, and that it is invariant/uninflecting, is a sure sign that it is not a separate word where it occurs in contemporary writing. Shahmukhi writers represent both as کرکے.
"Gurmukhi has its own set of native digits, however modern text tends to use European digits." I take issue with this because the "European digits" have been used in Punjabi longer than they have been used in Europe. The 123..., ۱۲۳..., and Gurmukhi numeric digits all originate in the Indian subcontinent and all of these forms are based on the spelling of the Punjabi words for these numbers in Gurmukhi. Arabic speakers call the digits they use in written Arabic "Hindu numerals" for this reason, where they stand out more for breaking the text direction to comply with the Indic system. The idea that there are any digits that come from Europe is sort of a European invention. For example, ਦੋ "do" is two, and if we turn this upside down we have 2 or ੨. That ۲ is also ਦੋ is more clear in some fonts/writing styles than others. Three is ਤਿੰਨ "tinn" and subsequently 3 and ੩ were derived by taking the first letter of this word, and ۳ is a representation of it rotated and placed on a staff. It is no coincidence that Gurmukhi 5 is ੫, nearly identical to the first letter of ਪੰਜ "panj" which is the word for 5 and also the word Punjab/Punjabi is derived from. I have no idea how 5 got derived from ਪ (to me it looks like 4), but the general pattern is apparent. There is a book by G. B. Singh (1950) that is cited in more recent discussions of this as being important for explaining the history in detail that I have not tracked down yet.
Regarding tone, I think the idea that tone in Punjabi is unique or unusual is misleading despite still being a common sentiment expressed in various sources. There are several tonal languages spoken in the immediate area surrounding Punjab which are each distinct. Some of these are close relatives of Punjabi which form a shared dialect continuum with it, while some of them are not particularly close relatives but seem to have this feature incidentally. These would be Dogri, Gujari, Bagri, Hindko, Shina, and Kalkoti at least. Dogri actually has orthographic conventions for writing tone. It seems like it would be more accurate to simply say that of the many South Asian languages tone has been observed in, Punjabi is the most studied.
It would be better to simply call the tones high tone, low tone, and level tone. This is how they are described in most modern Punjabi grammars in both Punjabi and English. (In Punjabi: ਉੱਚੀ, ਮੱਧ ਅਤੇ ਨੀਵੀਂ ਸੁਰ ਵੀ ਕਹਿੰਦੇ ਹਨ, "they are also called high, mid, and low tone"; the English names are just translations of the meanings.) Historically, Punjabi writers on the subject such as Duni Chandra called the high tone ਚੜ੍ਹਦੀ ਸੁਰ, translatable to "rising tone" unqualified, making it confusing to use the term "rising" to describe both high and low tone. The sequential names "high rising falling" and "low rising" also suggest something about pitch contour when compared to the terminology for other languages, but pitch contour is more of a feature of sentence intonation in Punjabi. Pitch/tone as it occurs in words considered independently is more commonly uncontoured.
Skipping down further to the "tone-related consonants" section which elaborates on this subject, "In addition, the consonant ਹ [U+0A39 GURMUKHI LETTER HA] is only pronounced h when it occurs word." This is definitely not true and should be removed. Tone is not written explicitly in Gurmukhi; there are places where it is indicated implicitly, but with ਹ you just have to know the word. The classic Punjabi interjection ਆਹੋ "aho" is pronounced with a hard "h" sound towards the end and is not tonal. (If you have a phone conversation in Punjabi, half of the words you will hear will be ਆਹੋ; this is one of the most used words in general.) ਹ can be pronounced as a "hard" consonant in the middle of some words even where tone exists around it, it disappears in others (including some which are not tonal), and it often occurs simply as a consonant with an inherent vowel or attached to a level vowel like any other. Where it occurs as a conjunct/subscript, like ਨ੍ਹ, it is not pronounced and gives way to tone. This used to represent a real non-tonal sound that has mostly disappeared from "standard" Punjabi, but the sound probably still exists among some Punjabi speakers who likely never write in Gurmukhi. These sounds are preserved in the neighboring language Saraiki, which is kind of like the Portuguese to Punjabi's Spanish in that it shares a majority of vocabulary but has a different phonetic inventory.
In the "Repertoire extension" it says that the nukta characters are used to represent foreign sounds like those of "Urdu and Persian." This should just say Persian specifically, there are no sounds in Urdu which are foreign to Punjabi speakers. The majority of Urdu speakers in Pakistan are native Punjabi speakers anyway; arguably Urdu's phonetic inventory comes from Punjabi more than it does anything else. ਖ਼ specifically represents the combination of characters جِو in Persian for example. Since Punjab used to be under Persian rule, and Perso-Arabic orthography was the predominant writing system across the whole region until the 19th century, a number of Gurmukhi spellings take after Arabic script spellings. This is how you end up with ਫ਼ਿਅਲ / فِعل as Punjabi spellings for the word that is फेल in Hindi (still فِعل in Urdu) for example, with ਅ being used as "ain" despite this sound not existing in Punjabi. You can see patterns like this attested in the writing and subsequent republication of some older poets like Waris Shah. It also seems important to note that despite representing characters from Persian, they do not really mean much for pronunciation. If you pronounced these words in a non-Punjabi way you might be corrected on your pronunciation.
ਲ਼ is different from the other nukta letters in that it represents a native Punjabi sound which does not exist in any of the languages it has loaned heavily from. It exists in common words like ਚੌਲ਼ "chawal" (rice) but is typically unwritten. Dictionaries, however, have taken to using it to clarify pronunciation. You can sometimes detect where this sound exists unwritten based on the following consonant. Since this is a retroflex sound, it cannot be pronounced easily before a letter like ਣ. So ਲਨ where you would expect to see ਲਣ may be a sign of a hidden ਲ਼.
I was not aware that there was a Unicode recommendation not to use the independent letter forms, but this seems silly. The nukta is part of these letters and typing them separately can result in more inconsistent rendering, with problems like the nukta appearing on the wrong side of letters. The precomposed code points are generally preferable.
It might be interesting to note in the bindi/tippi section that there are also some unmarked homorganic nasals in Punjabi.
In "Abbreviation, ellipsis & repetition" and probably "punctuation" too it could be noted that contractions are very common in Punjabi and these are formed with an apostrophe ' similarly to English (both in Gurmukhi and Shahmukhi). One of the most common contractions is where ਵਿੱਚ ( وِچّ ) is written and pronounced as 'ਚ ( چ' ). (The writing samples here https://www.punjabi-kavita.com/PunjabiPoetrySurjitPatar.php#Patar1 and here https://www.punjabi-kavita.com/PunjabiPoetrySurjitPatarShahmukhi.php#Patar1 are good.) You can expect to see or hear this a lot in casual/colloquial contexts, poetry, and songs where a contraction may fit better within the rhythm of a sentence.

bgo-eiu commented 2 years ago

That change in text direction on a bullet point was unexpected; I can't explain that.

bgo-eiu commented 2 years ago

Something I forgot to mention is that under British rule a very large swath of villages were renamed to a series of numbers plus an abbreviation for their water source, and many of these places are still called this. This is part of why separate forms of the same number system persist, as the forms are associated with the context of what they represent. It is possible to write a village name like Chak 241 GB Garha with alternate numeric digits, but people commonly just use 241 regardless of the language of the rest of the writing. The same goes for highway route numbers, postcodes, and so on. Ordinal calendar dates are also typically represented like 24ਵੀ (24th). However, it is quite common to see numbered lists in books that look like ੧. ੨. ੩. So while these can be understood as forms of the same digits, the form used has to do with the context/purpose of the writing.
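
For anyone handling such text in software, the point that these are forms of the same digits can be illustrated with a minimal Python sketch. It uses only the standard library; the helper name `to_ascii_digits` and the sample strings are made up for illustration and do not come from the page under discussion.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit (category Nd) with its ASCII form.

    Hypothetical helper for illustration: non-digit characters pass through unchanged.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.digit(ch)))
        else:
            out.append(ch)
    return "".join(out)


# The number 241 written with three different digit sets.
ascii_form = "241"
gurmukhi_form = "\u0a68\u0a6a\u0a67"   # GURMUKHI DIGIT TWO, FOUR, ONE
shahmukhi_form = "\u06f2\u06f4\u06f1"  # EXTENDED ARABIC-INDIC DIGIT TWO, FOUR, ONE

# Python's int() accepts any Unicode decimal digits, so all three parse alike.
values = [int(ascii_form), int(gurmukhi_form), int(shahmukhi_form)]
print(values)

# Normalising a mixed string to ASCII digits, e.g. before sorting or searching.
print(to_ascii_digits("Chak " + gurmukhi_form + " GB"))
```

Whether to normalise at all depends on the context described above: a numbered list printed with Gurmukhi digits should probably keep them for display, and a conversion like this would only be applied for comparison or lookup.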

r12a commented 1 year ago

@bgo-eiu many thanks for these clarifications and for taking the time to make your points clear. I finally got around to working on the Punjabi page again, and largely rewrote it. Here are some responses to your points above.

In usage & history: I would not tie these scripts for Punjabi so closely to religion...

I rewrote this.

In basic features, and later in a separate word spacing section: Words are mostly separated by spaces, but some single words are typically spelled with spaces in the middle of them in Gurmukhi. The words ਕਰ ਕੇ and ਕਰਕੇ are always pronounced exactly the same way, but ਕਰ ਕੇ is a verb form and ਕਰਕੇ is a postposition. ਕੇ is an inseparable affix that may have been a separate word some centuries ago, but the fact that you cannot insert anything between it and what precedes it, and that it is invariant/uninflecting, is a sure sign that it is not a separate word where it occurs in contemporary writing. Shahmukhi writers represent both as کرکے.

This is particularly interesting. I will raise a separate issue to discuss this further.

"Gurmukhi has its own set of native digits, however modern text tends to use European digits." ...

Yes. I've been replacing 'European' with 'ASCII' in all pages as i revise them. It's hard to put one's finger on the right way of referring to digits, but hopefully 'ASCII' takes out the regional and historical ambiguities (and doesn't suggest that those digits were only introduced with the advent of computing).

Regarding tone, I think the idea that tone in Punjabi is unique or unusual is misleading despite still being a common sentiment expressed in various sources...

I rewrote this.

It would be better to simply call the tones high tone, low tone, and level tone. This is how they are described in most modern Punjabi grammars in both Punjabi and English...

Yep. That was changed during my revision.

Skipping down further to the "tone-related consonants" section which elaborates on this subject...

This was indeed a bit of a mess. I rewrote all the explanations of how tone works, and clarified that HA is sometimes pronounced and other times used for tone marking. See the new section at https://r12a.github.io/scripts/guru/pa.html#letter_HA. Hopefully that's a bit more accurate (and detailed).

In the "Repertoire extension" it says that the nukta characters are used to represent foreign sounds like those of "Urdu and Persian." ...

Fixed.

ਲ਼ is different from the other nukta letters in that it represents a native Punjabi sound ...

Added that clarification.

I was not aware that there was a Unicode recommendation not to use the independent letter forms, but this seems silly. The nukta is part of these letters and typing them separately can result in more inconsistent rendering, with problems like the nukta appearing on the wrong side of letters. The precomposed code points are generally preferable.

I don't disagree, but note that copy-pasting text on a Mac (annoyingly) automatically normalises text, so you'll end up with a decomposed sequence without noticing. Generally, handling of nuktas in Indic scripts in Unicode is a bit higgledy-piggledy.
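
The decomposition mentioned here can be reproduced with a short Python sketch. It does not replicate the macOS copy-paste behaviour itself; it only shows, using the standard `unicodedata` module, that Unicode normalization turns a precomposed Gurmukhi nukta letter into the base letter plus nukta. Even NFC does this, because these precomposed letters are composition exclusions. The helper names `code_points` and `same_text` are hypothetical.

```python
import unicodedata

PRECOMPOSED_KHHA = "\u0a59"       # one code point: GURMUKHI LETTER KHHA
DECOMPOSED_KHHA = "\u0a16\u0a3c"  # GURMUKHI LETTER KHA + GURMUKHI SIGN NUKTA


def code_points(text: str) -> list[str]:
    """List the code points in text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


nfc = unicodedata.normalize("NFC", PRECOMPOSED_KHHA)
nfd = unicodedata.normalize("NFD", PRECOMPOSED_KHHA)

# Both normalization forms yield the two-code-point sequence.
print(code_points(PRECOMPOSED_KHHA), "->", code_points(nfc))

# A raw comparison treats the two spellings as different strings;
# comparing after normalization treats them as the same text.
print(PRECOMPOSED_KHHA == DECOMPOSED_KHHA)
print(same_text(PRECOMPOSED_KHHA, DECOMPOSED_KHHA))
```

In practice this means that search, sorting, or deduplication over Gurmukhi text is safer if both sides are normalised first, whichever form the author originally typed.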

In "Abbreviation, ellipsis & repetition" and probably "punctuation" too it could be noted that contractions are very common in Punjabi and these are formed with an apostrophe ' ...

I can't believe i forgot to add this, since it was one of the things that stood out for me when i first saw some Punjabi text. I added it.

I also made some adjustments to the numbers section along the lines of your comment.

I mention you in the Acknowledgements section as @bgo-eiu. Would you prefer me to use your real name?

r12a / scripts

Punjabi Gurmukhi script page notes/feedback #118