wikimedia / jquery.ime

jQuery based input methods library

Gurmukhi transliteration: addressed overapplication of virama, normalized nukta characters #698

Open bgo-eiu opened 1 year ago

bgo-eiu commented 1 year ago

Addresses https://phabricator.wikimedia.org/T91159

  1. The halant/virama is no longer applied by default; a tilde (~) can simply be typed to insert it instead. Conjunct characters are too rare in Gurmukhi to justify placing a virama everywhere by default.
  2. Although there are only three common conjuncts, as pointed out in the issue, the tilde allows input of both the three common ones and the uncommon ones.
  3. Numerals are left alone, per requests from other users.
  4. A full stop can now be typed with 'Z'; this is what the Hindi transliteration input already does.
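The explicit-virama behaviour in points 1 and 2 can be illustrated with a toy lookup table. This is only a sketch: apart from `~`, the Latin key assignments and the `transliterate` helper are hypothetical and are not taken from the rules file in this pull request, which uses jquery.ime's own pattern format.

```javascript
// Hypothetical key table for illustration only: Latin key -> Gurmukhi output.
// Consonants carry no implicit virama; '~' inserts it explicitly.
const KEYS = new Map([
  ['p', '\u0A2A'], // GURMUKHI LETTER PA
  ['s', '\u0A38'], // GURMUKHI LETTER SA
  ['r', '\u0A30'], // GURMUKHI LETTER RA
  ['h', '\u0A39'], // GURMUKHI LETTER HA
  ['v', '\u0A35'], // GURMUKHI LETTER VA
  ['~', '\u0A4D']  // GURMUKHI SIGN VIRAMA (halant)
]);

// Map each typed character through the table; unknown keys pass through unchanged.
function transliterate(input) {
  return Array.from(input, (ch) => (KEYS.has(ch) ? KEYS.get(ch) : ch)).join('');
}

// 'pr' yields two plain consonants, while 'p~r' yields PA + VIRAMA + RA,
// which a Gurmukhi font renders as PA with a subjoined RA.
const plain = transliterate('pr');
const conjunct = transliterate('p~r');
console.log(plain.length, conjunct.length);
```

Under these assumptions, a writer who never needs conjuncts never sees a virama, while the same single key reaches common and uncommon conjuncts alike.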

Additional: normalized the nukta characters by using the standalone (precomposed) Unicode characters for them wherever possible, rather than a base letter plus a combining character. Made 'q' produce kaka pair bindi because, although it is not that common, it is still more common than udaat, which I have moved to 'Q'. Added ways to type all the common nukta/bindi combinations.
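As a rough illustration of what "wherever possible" means here, the sketch below replaces a base letter followed by the combining nukta (U+0A3C) with the single precomposed code point when Unicode defines one. The `composeNukta` helper is a hypothetical name used for illustration, not code from this pull request; the code point pairs are the canonical decompositions listed in the Unicode character database.

```javascript
// Base letter -> precomposed "letter with nukta" code point.
const PRECOMPOSED = new Map([
  ['\u0A32', '\u0A33'], // LA  + nukta -> LLA
  ['\u0A38', '\u0A36'], // SA  + nukta -> SHA
  ['\u0A16', '\u0A59'], // KHA + nukta -> KHHA
  ['\u0A17', '\u0A5A'], // GA  + nukta -> GHHA
  ['\u0A1C', '\u0A5B'], // JA  + nukta -> ZA
  ['\u0A2B', '\u0A5E']  // PHA + nukta -> FA
]);

// Replace base + U+0A3C with the precomposed character when one exists;
// pairs without a precomposed form (for example KA + nukta) are left as-is.
function composeNukta(text) {
  return text.replace(/([\u0A00-\u0A7F])\u0A3C/g, (match, base) =>
    PRECOMPOSED.has(base) ? PRECOMPOSED.get(base) : match
  );
}

console.log(composeNukta('\u0A2B\u0A3C').length); // 1 (single code point U+0A5E)
console.log(composeNukta('\u0A15\u0A3C').length); // 2 (no precomposed form exists)
```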

kartikm commented 1 year ago

Can you please check for failing tests and update the pull request?

bgo-eiu commented 1 year ago

Yes, thank you for pointing that out. I will update the tests when I get a chance.

bgo-eiu commented 1 year ago

There is actually a blocker here: Wikipedia applies NFC normalization to Gurmukhi text, which changes the nukta characters to their legacy decomposed forms. This breaks a number of URLs in Punjabi external links that contain characters like ਫ਼. It also forces users to press backspace more than once to delete a single letter, and can result in some typographic inconsistency. Wikimedia needs to support these characters without decomposing them: ਫ਼ ਲ਼ ਸ਼ ਖ਼ ਗ਼
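The decomposition can be reproduced with the standard `String.prototype.normalize` method. The snippet below is a small sketch using U+0A5E (GURMUKHI LETTER FA) as the example; the `codePoints` helper is just an illustrative name. The precomposed Gurmukhi nukta letters are composition exclusions in Unicode, so NFC splits them into base letter plus U+0A3C and never recombines them.

```javascript
// Format a string as a list of U+XXXX code points for inspection.
function codePoints(s) {
  return Array.from(s, (c) =>
    'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const fa = '\u0A5E';               // precomposed GURMUKHI LETTER FA
const faNfc = fa.normalize('NFC'); // what NFC normalization turns it into

console.log(codePoints(fa));    // [ 'U+0A5E' ]
console.log(codePoints(faNfc)); // [ 'U+0A2B', 'U+0A3C' ]

// The two forms percent-encode differently, so a URL containing the
// precomposed character no longer matches after normalization.
console.log(encodeURIComponent(fa));    // %E0%A9%9E
console.log(encodeURIComponent(faNfc)); // %E0%A8%AB%E0%A8%BC
```

The two-code-point result is also why a single visible letter can take two backspace presses to delete.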