r12a / scripts

Various pages and tools for working with non-Latin scripts
http://r12a.github.io/doclist

Punjabi Gurmukhi script page notes/feedback #118

Open issue, opened by bgo-eiu 2 years ago

bgo-eiu commented 2 years ago

Hello, I had a look at the Gurmukhi page, and it covers a lot, but there are various details I would like to comment on:

bgo-eiu commented 2 years ago

That change in text direction on a bullet point was unexpected; I can't explain it.

bgo-eiu commented 2 years ago

Something I forgot to mention: under British rule a very large swath of villages were renamed with a series of numbers and an abbreviation for their water source, and many of these places are still called this. This is part of why separate forms of the same number system persist: each form is associated with the context of what it represents. It is possible to write a village name like Chak 241 GB Garha with the alternative digit forms, but people commonly just use 241 regardless of the language of the rest of the writing. The same goes for highway route numbers, postcodes and so on. Ordinal calendar dates are also typically written like 24ਵੀ (24th). However, numbered lists like ੧. ੨. ੩. are quite common in books. So while these can be understood as forms of the same digits, the form used depends on the context and purpose of the writing.
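As a small illustration that these really are different forms of the same digits: the Gurmukhi digits ੦ to ੯ are the contiguous code points U+0A66 to U+0A6F, so switching between them and ASCII digits is a plain character mapping. The Python sketch below assumes you want Gurmukhi digits only where the context calls for them (for example the ੧. ੨. ੩. list numbering mentioned above); `to_gurmukhi_digits` and `numbered_list` are hypothetical helper names, not part of any library.

```python
import unicodedata

# Gurmukhi DIGIT ZERO..NINE are contiguous code points, U+0A66..U+0A6F.
GURMUKHI_ZERO = 0x0A66

# Translation table mapping each ASCII digit to the Gurmukhi digit with the same value.
ASCII_TO_GURMUKHI = str.maketrans(
    "0123456789",
    "".join(chr(GURMUKHI_ZERO + d) for d in range(10)),
)


def to_gurmukhi_digits(text: str) -> str:
    """Hypothetical helper: rewrite ASCII digits 0-9 in `text` as Gurmukhi digits."""
    return text.translate(ASCII_TO_GURMUKHI)


def numbered_list(items: list[str]) -> list[str]:
    """Hypothetical helper: prefix items with Gurmukhi list counters such as '੧. '."""
    return [f"{to_gurmukhi_digits(str(n))}. {item}" for n, item in enumerate(items, start=1)]


print(to_gurmukhi_digits("241"))                  # ੨੪੧
print(unicodedata.name(to_gurmukhi_digits("1")))  # GURMUKHI DIGIT ONE
print(int(to_gurmukhi_digits("241")))             # 241
```

Because Python's `int()` accepts any Unicode decimal digits, parsing does not need a reverse mapping: a place name written with either digit form yields the same number.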

r12a commented 1 year ago

@bgo-eiu many thanks for these clarifications and for taking the time to make your points clear. I finally got around to working on the Punjabi page again, and largely rewrote it. Here are some responses to your points above.

In usage & history: I would not tie these scripts for Punjabi so closely to religion...

I rewrote this.

In basic features, and later in a separate word spacing section: Words are mostly separated by spaces, but some single words are typically spelled with spaces in the middle of them in Gurmukhi. The words ਕਰ ਕੇ and ਕਰਕੇ are always pronounced exactly the same way, but ਕਰ ਕੇ is a verb form and ਕਰਕੇ is a postposition. ਕੇ is an inseparable affix that may have been a separate word some centuries ago, but the facts that nothing can be inserted between it and the word it follows, and that it is invariant (uninflecting), are a sure sign that it is not a separate word where it occurs in contemporary writing. Shahmukhi writers represent both as کرکے.
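The spacing point above matters for text processing: a tokenizer that treats every space as a word boundary counts ਕਰ ਕੇ as two words. The Python sketch below shows one naive workaround, assuming, as the comment argues, that ਕੇ is always an inseparable affix in contemporary writing. The `words` function and the `INSEPARABLE_AFFIXES` set are hypothetical; since the comment says this applies to "some single words", a real segmenter would need a proper lexicon rather than a one-item set.

```python
# Hypothetical list of affixes that are written with a preceding space
# but belong to the previous word (assumption based on the comment above).
INSEPARABLE_AFFIXES = {"ਕੇ"}


def words(text: str) -> list[str]:
    """Split on whitespace, then re-attach inseparable affixes to the preceding token."""
    result: list[str] = []
    for token in text.split():
        if token in INSEPARABLE_AFFIXES and result:
            # Re-join with a single space so ਕਰ ਕੇ stays distinct from ਕਰਕੇ.
            result[-1] = f"{result[-1]} {token}"
        else:
            result.append(token)
    return result


print(len("ਕਰ ਕੇ".split()))  # 2: a naive split sees two words
print(len(words("ਕਰ ਕੇ")))   # 1: the verb form
print(len(words("ਕਰਕੇ")))    # 1: the postposition
```

The result keeps ਕਰ ਕੇ and ਕਰਕੇ as distinct one-word strings, preserving the verb/postposition distinction; in Shahmukhi, where both are written کرکے, spacing cannot recover that distinction.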

This is particularly interesting. I will raise a separate issue to discuss this further.

"Gurmukhi has its own set of native digits, however modern text tends to use European digits." ...

Yes. I've been replacing 'European' with 'ASCII' in all pages as I revise them. It's hard to put one's finger on the right way of referring to digits, but hopefully 'ASCII' takes out the regional and historical ambiguities (and doesn't suggest that those digits were only introduced with the advent of computing).

Regarding tone, I think the idea that tone in Punjabi is unique or unusual is misleading despite still being a common sentiment expressed in various sources...

I rewrote this.

It would be better to simply call the tones high tone, low tone, and level tone. This is how they are described in most modern Punjabi grammars in both Punjabi and English...

Yep. That was changed during my revision.

Skipping down further to the "tone-related consonants" section which elaborates on this subject...

This was indeed a bit of a mess. I rewrote all the explanations of how tone works, and clarified that HA is sometimes pronounced and other times used for tone marking. See the new section at https://r12a.github.io/scripts/guru/pa.html#letter_HA. Hopefully that's a bit more accurate (and detailed).

In the "Repertoire extension" it says that the nukta characters are used to represent foreign sounds like those of "Urdu and Persian." ...

Fixed.

ਲ਼ is different from the other nukta letters in that it represents a native Punjabi sound ...

Added that clarification.

I was not aware that there was a Unicode recommendation not to use the independent letter forms, but this seems silly. The nukta is part of these letters, and typing the base letter and the nukta separately can result in more inconsistent rendering, with problems like the nukta appearing on the wrong side of letters. The precomposed code points are generally preferable.

I don't disagree, but note that copy-pasting text on a Mac (annoyingly) automatically normalises text, so you'll end up with a decomposed sequence without noticing. Generally, handling of nuktas in Indic scripts in Unicode is a bit higgledy-piggledy.
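The normalization behaviour described above is easy to reproduce. All six precomposed Gurmukhi nukta letters (LLA, SHA, KHHA, GHHA, ZA, FA) are Unicode composition exclusions, so any normalization, NFC as well as NFD, replaces them with the base letter followed by U+0A3C GURMUKHI SIGN NUKTA. The Python sketch below uses only the standard `unicodedata` module; `same_text` is a hypothetical helper, and it assumes that comparing normalized strings is acceptable for your use case.

```python
import unicodedata

# The six precomposed Gurmukhi nukta letters: LLA, SHA, KHHA, GHHA, ZA, FA.
NUKTA_LETTERS = "\u0A33\u0A36\u0A59\u0A5A\u0A5B\u0A5E"


def code_points(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for ch in NUKTA_LETTERS:
    # Composition exclusions: NFC decomposes these just as NFD does.
    nfc = unicodedata.normalize("NFC", ch)
    print(f"{code_points(ch)} {unicodedata.name(ch)} -> {code_points(nfc)}")
# first line printed: U+0A33 GURMUKHI LETTER LLA -> U+0A32 U+0A3C


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare text regardless of precomposed vs. decomposed nukta."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text("\u0A36", "\u0A38\u0A3C"))  # True: precomposed SHA vs. SA + nukta
```

In practice, text that has passed through any normalizing step will contain the decomposed sequence, so search and comparison code should normalize both sides rather than expect the precomposed code points.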

In "Abbreviation, ellipsis & repetition" and probably "punctuation" too it could be noted that contractions are very common in Punjabi and these are formed with an apostrophe ' ...

I can't believe I forgot to add this, since it was one of the things that stood out for me when I first saw some Punjabi text. I added it.

I also made some adjustments to the numbers section along the lines of your comment.

I mention you in the Acknowledgements section as @bgo-eiu. Would you prefer me to use your real name?