w3c / font-text-cg


Some latest round of Khmer encoding–shaping discussions #27

Open lianghai opened 4 years ago

lianghai commented 4 years ago

Prompted by @MakaraSok’s recent talk at the Unicode Conference, a small group of people, including Makara, @NorbertLindenberg, and me, has been holding a new round of discussions about the Khmer script’s encoding–shaping issues. The group met on 6 Nov, 20 Nov, 4 Dec, and 18 Dec 2020.

The next meeting is scheduled for Friday 8 January 2021:

Liang Hai is inviting you to a scheduled Zoom meeting.

Topic: Khmer clusters

Time: Fri 8 Jan 2021, 9:00 am Cambodia (UTC+7, see also time zone conversion and calendar file)

Recurring: The same time, every two weeks (Fri 22 Jan, Fri 5 Feb, …)

Join Zoom Meeting https://zoom.us/j/93120577362

Meeting ID: 931 2057 7362

Find your local number for dialing in: https://zoom.us/u/acI3YWbZaU

MakaraSok commented 4 years ago

Ref: Khmer Character Specification/Usages

mcdurdin commented 4 years ago

Ref: https://software.sil.org/downloads/r/mondulkiri/Mondulkiri-7.100-developer.zip (via Didi); see under 'documentation'.

mcdurdin commented 4 years ago

Ref: Makara's paper on 'Spoof-Vulnerable Rendering in Khmer Unicode Implementations' https://www.sil.org/system/files/reapdata/15/34/62/153462537465381054623906304930919921193/Spoof_Vulnerable_Rendering_in_Khmer_Unic.pdf

NorbertLindenberg commented 4 years ago

Defining Khmer clusters

mhosken commented 4 years ago

https://github.com/n8willis/opentype-shaping-documents/blob/master/opentype-shaping-khmer.md

lianghai commented 4 years ago

This is the full slide deck I briefly touched on during the meeting, originally prepared for the Unicode Conference last month: An open knowledge base for Indic text shaping

MakaraSok commented 4 years ago

Discrepancies in Khmer Unicode character ordering rules and a proposed solution by Makara SOK, presented at IUC44.

lianghai commented 3 years ago

An example of how to graphically define the written units of Khmer, independent from how they’re encoded: https://docs.google.com/spreadsheets/d/1YS0OJfw4Fr6wVh-0oyUvZc4Pi-2Dond4pnqFHqTEEi4/edit?usp=sharing

MakaraSok commented 3 years ago

A list of words from the Khmer Official [Chuon Nath] Dictionary with Robat (U+17CC): http://dictionary.tovnah.com/reg-search?qu=%E1%9F%8C

  1. កក៌ដ 2. កប៌ូរ 3. កាណ៌ 4. កាប៌ាស
  5. ឋានសួគ៌ 6. តប​ធម៌ 7. តូយ៌តន្ត្រី 8. ទក្ខិណាព័ត៌
  9. ទិដ្ឋធម៌ 10. ទុគ៌ត 11. ទុគ៌ម 12. ទុជ៌ន
  13. ទុព៌ល 14. ទុយ៌ស 15. ទេយ្យធម៌ 16. ទេសធម៌
  17. ធម៌ 18. នាយក​ធម៌ 19. និយ្យានិក​ធម៌ 20. នីវណរ​ធម៌
  21. គភ៌ 22. បញ្ច​ពណ៌ 23. បរិបូណ៌ 24. បរិបូណ៌
  25. បាបធម៌ 26. បូណ៌មី 27. បូព៌ 28. បូព៌​ទិស
  29. បូព៌​និមិត្ត 30. បោក្ខរព័ស៌ 31. ពណ៌ 32. ពណ៌នា
  33. ពាណ៌នា 34. ពិពណ៌នា 35. ពោធិបក្ខិយធម៌ 36. ព័ត៌មាន
  37. ព័ត៌មាន​កាល 38. ព្យាធិ​ធម៌ 39. មទ្រីបាព៌ 40. មាគ៌
  41. មាគ៌ា 42. យុត្តិធម៌ 43. លម្អក់​ព័ណ៌ 44. លោមព័ណ៌
  45. លំអក់ព័ណ៌ 46. វណ៌ 47. វិបយ៌ាយ 48. វិបយ៌ាស
  49. វិបរិណាមធម៌ 50. វិសគ៌ៈ 51. វិសទព័ណ៌ 52. សកដមាគ៌ា
  53. សគ៌ៈ 54. សង្ខតធម៌ 55. សព៌ជ្ញ 56. សព៌ាង្គ
  57. សព៌េជ្ញ 58. សព៌េជ្ញ​សាស្ដា 59. សម្បូណ៌ 60. សម្បូណ៌
  61. សិទ្ធាថ៌ 62. សុជីវធម៌ 63. សុពណ៌ 64. សួគ៌
  65. សួគ៌ា 66. ស្លាធម៌ 67. ហៃមពណ៌ 68. អឃ៌
  69. អជ៌ុន 70. អថ៌ 71. អធម៌ 72. អនាយ៌
  73. អន្តរាយិកធម៌ 74. អយុត្តិធម៌ 75. អសង្ខតធម៌ 76. អាឃ៌
  77. អាថ៌ 78. អាថ៌កំបាំង 79. អាយ៌េន 80. ឧន្មាគ៌ា
  81. ឆកាមាវចរ​សួគ៌ 82. ជង្ឃមាគ៌ា 83. ជាតិធម៌

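
For anyone who wants to check where Robat sits in the encoded form of words like these, here is a minimal Python sketch. The helper names `describe` and `robat_positions` are made up for illustration, and the sample code points are my reading of ធម៌ from the list above, so they are worth verifying against the actual text.

```python
import unicodedata

ROBAT = "\u17cc"  # KHMER SIGN ROBAT


def describe(word: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in the word."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in word]


def robat_positions(word: str) -> list[int]:
    """Return the zero-based code point indices at which Robat occurs."""
    return [i for i, ch in enumerate(word) if ch == ROBAT]


# Assumed spelling of one word from the list, written with escapes
# so that the code points are explicit.
word = "\u1792\u1798\u17cc"

for entry in describe(word):
    print(entry)
print(robat_positions(word))
```

Running the same two helpers over the whole list would show which base letter each Robat follows in the stored order, which can then be compared with where it is drawn.
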
lianghai commented 3 years ago

Meeting on 20 Nov 2020

Action items

  1. Review the names list and propose changes.

  2. @lianghai: Improve the “Khmer: graphical analysis” spreadsheet with suggestions incorporated.

  3. @lianghai: Share a draft of data files like the Mongolian ones: https://github.com/lianghai/mongolian/tree/utn/utn/data

  4. Investigate collation algorithms.

Links and files shared in Zoom chat

MakaraSok commented 3 years ago

Some of Problems concerning Fonts used in Writings.pdf by KEO Linet, Department of Khmerisation, Lexicography and Translation of the National Language Institute (NLI) of the Royal Academy of Cambodia (RAC).

n8willis commented 3 years ago

Hi everyone,

Unfortunately I wasn't able to be eyes-open for the Nov 20 meeting (will try harder for the next one!) although I really wanted to be there.

At the "meta-question" level, would it be possible for people who are posting downloadable resources (e.g., PDF slide decks) to also mention what the licensing is on those documents? If it's possible, of course. I know it might not always be so. Or, at least, to mention something about the source, if not a legalese-formal 'license' per se....

Certainly that's not a huge impediment to viewing anything at present, but I have a bad tendency to collect such resources locally and save them for future reference, and over time it has kind of become a problem when I can't recall what the origin & circumstances of a PDF are.

Don't mean this to be a burden on anyone; perhaps just consider it a plea for future help. Folks posting their own slide decks is pretty straightforward, but I'm less clear about some of the external links and material in .zip files.

MakaraSok commented 3 years ago

"I'm less clear about some of the external links and material in .zip files"

Please let us know the exact links so that we can try to add the metadata for you.

For the .zip files, see "Other Resources" at: https://software.sil.org/mondulkiri/.

n8willis commented 3 years ago

Well, "Some of Problems concerning Fonts used in Writings.pptx" certainly is one. I don't see a date on it, and I can't tell what organization the author is from (searching doesn't turn anything up).

MakaraSok commented 3 years ago

Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967: https://docs.google.com/document/d/1n64obcr8PyYX9Xgk371xk3i0euOTjgMDRKz2TjjOpN0/edit?usp=sharing

lianghai commented 3 years ago

Additional files discussed today:

Documents to review before the next meeting (two weeks later):

  1. Section 7, Unicode Encoding, and section 8, Text Processing (8.2 and 8.3 are about sorting and fonts), of Makara’s draft specification.

  2. https://web.archive.org/web/20150105024205/http://www.panl10n.net/english/final%20reports/pdf%20files/Cambodia/CAM01.pdf

  3. Makara’s revised document on “Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967”.

lianghai commented 3 years ago

Note that using Richard’s character pickers one can easily construct an arbitrary string, without being restricted by a keyboard layout: https://r12a.github.io/pickers/khmr/ cc @MakaraSok @iwsfutcmd
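
For scripted experiments, the same kind of arbitrary string can also be assembled directly from code points. The following is only a sketch; `from_codepoints` and `to_codepoints` are hypothetical helper names, and the `U+XXXX` token format is an assumption about how one would write the input.

```python
import re


def from_codepoints(notation: str) -> str:
    """Build a string from whitespace-separated 'U+XXXX' tokens."""
    tokens = notation.split()
    for tok in tokens:
        if not re.fullmatch(r"U\+[0-9A-Fa-f]{4,6}", tok):
            raise ValueError(f"not a code point token: {tok!r}")
    return "".join(chr(int(tok[2:], 16)) for tok in tokens)


def to_codepoints(text: str) -> str:
    """Inverse helper: render a string as 'U+XXXX' tokens."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# KA, COENG, KA: a sequence that needs no keyboard layout to produce.
s = from_codepoints("U+1780 U+17D2 U+1780")
print(s, to_codepoints(s))
```

Because the input is just a list of code points, sequences that a keyboard layout would refuse or silently reorder can still be produced and fed to a shaper for testing.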

MakaraSok commented 3 years ago

This zip file contains research documents by Javier Sola of the then Open Forum of Cambodia:

Javier's.zip

MakaraSok commented 3 years ago

Here are some more from Javier. The zip file included here is provided as is. Executable files have been excluded by the owner because they were falsely flagged as viruses.

UNICODE-20210120T060911Z-001.zip

MakaraSok commented 3 years ago

Slide 23 of this document issued by the MoEYS explains explicitly where the Consonant Shifters (aka Register Shifters) should go when typing:

PDF version: How_to_type_Khmer_Unicode.pdf

Source: http://krou.moeys.gov.kh/kh/article/item/download/595_aef67c4f54defb5c2d63718a0e120456.html

MakaraSok commented 3 years ago

The zip file linked below contains, among other things related to Khmer Unicode, the translated version of the "How to type Khmer Unicode" document above. The file name is "How_to_type_Khmer_Unicode.ver1.1km.pdf".

https://www.mef.gov.kh/documents/fonts/khmer-unicode-for-mef.zip

Since this material is on the ministry's website, it has "most likely" been used/adopted by the ministry.

The highlight is that the Character Ordering differs from the one in the Unicode Standard.
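
A rough sketch of why differing ordering conventions matter in practice: two orders of the same marks are different code point sequences, and Unicode normalization does not merge them, because these marks have canonical combining class 0. The base letter, shifter, and vowel below are arbitrary picks for illustration, and "order A" and "order B" are neutral labels; nothing here asserts which order either document prescribes.

```python
import unicodedata

BASE = "\u1794"     # KHMER LETTER BA
SHIFTER = "\u17c9"  # KHMER SIGN MUUSIKATOAN (a consonant/register shifter)
VOWEL = "\u17b8"    # KHMER VOWEL SIGN II

order_a = BASE + SHIFTER + VOWEL  # shifter typed before the vowel
order_b = BASE + VOWEL + SHIFTER  # shifter typed after the vowel


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Check whether two strings become identical under a normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Both marks have combining class 0, so normalization never reorders them.
print(unicodedata.combining(SHIFTER), unicodedata.combining(VOWEL))
print(order_a == order_b)
print(same_after_normalization(order_a, order_b))
```

So text typed under one convention will not match text typed under the other in search or comparison unless something above normalization reconciles the two orders.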

NorbertLindenberg commented 3 years ago

I can’t read Khmer, but it appears that How_to_type_Khmer_Unicode.ver1.1km.pdf differs in some ways, e.g. by adding a discussion of “Nonbreakable Space”, from the English 1.0 version: https://web.archive.org/web/20180712194920/http://khmeros.info/download/KhmerUnicodeTyping.pdf

Is there an English 1.1 version?

MakaraSok commented 3 years ago

For our record, here is the link to the newly drafted Khmer Encoding Research: https://docs.google.com/document/d/18KlDJkea9k57zFQ52V6JFvVNOYm-y-4hJJLmqudmWrE/edit#.

The next group meeting's discussion will center on this document.