Open lianghai opened 4 years ago
Ref https://software.sil.org/downloads/r/mondulkiri/Mondulkiri-7.100-developer.zip (via Didi) Under 'documentation'
Ref: Makara's paper on 'Spoof-Vulnerable Rendering in Khmer Unicode Implementations' https://www.sil.org/system/files/reapdata/15/34/62/153462537465381054623906304930919921193/Spoof_Vulnerable_Rendering_in_Khmer_Unic.pdf
This is the full slide deck I briefly touched during the meeting, originally prepared for the Unicode Conference last month: An open knowledge base for Indic text shaping
Discrepancies in Khmer Unicode character ordering rules and a proposed solution by Makara SOK, presented at IUC44.
An example of how to graphically define the written units of Khmer, independent from how they’re encoded: https://docs.google.com/spreadsheets/d/1YS0OJfw4Fr6wVh-0oyUvZc4Pi-2Dond4pnqFHqTEEi4/edit?usp=sharing
A list of words from the Khmer Official [Chuon Nath] Dictionary with Robat (U+17CC): http://dictionary.tovnah.com/reg-search?qu=%E1%9F%8C
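For readers unsure what the `qu=%E1%9F%8C` parameter in the dictionary link encodes, here is a minimal sketch that decodes it with the Python standard library. It only inspects the query value shown in the URL above; it does not contact the dictionary site, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote
import unicodedata

# Query value copied from the dictionary URL above.
query_value = "%E1%9F%8C"

# Percent-decoding yields the UTF-8 bytes E1 9F 8C, i.e. a single code point.
decoded = unquote(query_value)

# Round trip: percent-encoding U+17CC should give back the same query value.
reencoded = quote("\u17CC")

print(f"U+{ord(decoded):04X}", unicodedata.name(decoded))
print(reencoded)
```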
Review the names list and propose changes.
For better recognizing minority orthographies (eg, U+17DD KHMER SIGN ATTHACAN isn’t “obsolete” in some minority orthographies), refer to https://github.com/sillsdev/khmer-character-specification/blob/master/specification.md
Improve or remove the confusing annotations under U+17A8 KHMER INDEPENDENT VOWEL QUK: “• obsolete ligature for the sequence 17A7 1780 • use of the sequence is now preferred”
@lianghai: Improve the “Khmer: graphical analysis” spreadsheet with suggestions incorporated.
@lianghai: Share a draft of data files like the Mongolian ones: https://github.com/lianghai/mongolian/tree/utn/utn/data
Investigate collation algorithms.
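As background for the U+17A8 action item above, the following sketch checks whether Unicode normalization treats U+17A8 and the sequence 17A7 1780 as equivalent. It uses only Python's `unicodedata` module; the helper names `label` and `is_unified_by_normalization` are hypothetical, and the result reflects whatever Unicode version the local Python build ships with. It is meant to illustrate that the names-list annotation is informative rather than a normative equivalence, not to settle how the annotation should be reworded.

```python
import unicodedata

QUK = "\u17A8"            # KHMER INDEPENDENT VOWEL QUK
SEQUENCE = "\u17A7\u1780"  # the sequence named in the annotation: 17A7 1780
ATTHACAN = "\u17DD"        # KHMER SIGN ATTHACAN, mentioned in the item above

def label(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def is_unified_by_normalization(a, b, form):
    """True if a and b become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for ch in QUK + SEQUENCE + ATTHACAN:
    print(label(ch))

# An empty string here means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(QUK)))

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, is_unified_by_normalization(QUK, SEQUENCE, form))
```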
Some of Problems concerning Fonts used in Writings.pdf by KEO Linet, Department of Khmerisation, Lexicography and Translation of the National Language Institute (NLI) of the Royal Academy of Cambodia (RAC).
Hi everyone,
Unfortunately I wasn't able to be eyes-open for the Nov 20 meeting (will try harder for the next one!) although I really wanted to be there.
At the "meta-question" level, would it be possible for people who are posting downloadable resources (e.g., PDF slide decks) to also mention what the licensing is on those documents? If it's possible, of course. I know it might not always be so. Or, at least, to mention something about the source, if not a legalese-formal 'license' per se.
Certainly that's not a huge impediment to viewing anything at present, but I have a bad tendency to collect such resources locally and save them for future reference, and over time it has kind of become a problem when I can't recall what the origin & circumstances of a PDF are.
Don't mean this to be a burden on anyone; perhaps just consider it a plea for future help. Folks posting their own slide decks is pretty straightforward, but I'm less clear about some of the external links and material in .zip files.
> I'm less clear about some of the external links and material in .zip files
Please let us know the exact links so that we can try to add the metadata for you.
For the .zip files, see "Other Resources" at: https://software.sil.org/mondulkiri/.
Well, "Some of Problems concerning Fonts used in Writings.pptx" certainly is one. I don't see a date on it, and I can't tell what organization the author is from (searching doesn't turn anything up).
Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967: https://docs.google.com/document/d/1n64obcr8PyYX9Xgk371xk3i0euOTjgMDRKz2TjjOpN0/edit?usp=sharing
Additional files discussed today:
Documents to review before the next meeting (two weeks later):
Section 7, Unicode Encoding, and section 8, Text Processing (8.2 and 8.3 are about sorting and font) of Makara’s draft specification.
Makara’s revised document on “Khmer sorting order rules based on the existing method used in the Khmer-Khmer dictionary published in 1967”.
Note that using Richard’s character pickers one can easily construct an arbitrary string, without being restricted by a keyboard layout: https://r12a.github.io/pickers/khmr/ cc @MakaraSok @iwsfutcmd
This zip file contains research documents by Javier Sola of the then Open Forum of Cambodia:
Here are some more from Javier. The zip file included here is given as is. Executable files have been excluded by the owner, as they had been falsely flagged as viruses.
Slide 23 of this document issued by the MoEYS explains explicitly where the Consonant Shifters (aka Register Shifters) should go when typing:
PDF version: How_to_type_Khmer_Unicode.pdf
Source: http://krou.moeys.gov.kh/kh/article/item/download/595_aef67c4f54defb5c2d63718a0e120456.html
The link below contains, among other things related to Khmer Unicode, the translated version of the "How to type Khmer Unicode" document above. The file name is "How_to_type_Khmer_Unicode.ver1.1km.pdf".
https://www.mef.gov.kh/documents/fonts/khmer-unicode-for-mef.zip
Since this material is on the ministry website, it has "most likely" been used/adopted by the ministry.
The highlight is that the Character Ordering is different from the Unicode Standard.
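To make concrete why a differing character order matters, here is a minimal sketch comparing two orderings of a consonant shifter and a dependent vowel. The specific characters (SA, TRIISAP, VOWEL SIGN I) are an illustrative assumption and are not taken from the MoEYS slides; the sketch does not say which order either document prescribes. It only shows that the two orders are distinct code point sequences and checks whether normalization reorders them.

```python
import unicodedata

# Hypothetical example characters, chosen only for illustration.
SA = "\u179F"       # KHMER LETTER SA
TRIISAP = "\u17CA"  # KHMER SIGN TRIISAP (a consonant shifter)
VOWEL_I = "\u17B7"  # KHMER VOWEL SIGN I

shifter_first = SA + TRIISAP + VOWEL_I
vowel_first = SA + VOWEL_I + TRIISAP

def describe(text):
    """List each code point with its name and canonical combining class."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}"
        for ch in text
    ]

def same_after_nfc(a, b):
    """True if NFC normalization maps both strings to the same sequence."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

for line in describe(shifter_first):
    print(line)
print("equal as strings:", shifter_first == vowel_first)
print("equal after NFC: ", same_after_nfc(shifter_first, vowel_first))
```

Because both marks have canonical combining class 0, normalization leaves their relative order untouched, so text typed under one ordering rule will not compare equal to text typed under the other, even where a font renders the two alike.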
I can’t read Khmer, but it appears that How_to_type_Khmer_Unicode.ver1.1km.pdf differs in some ways, e.g. by adding a discussion of “Nonbreakable Space”, from the English 1.0 version: https://web.archive.org/web/20180712194920/http://khmeros.info/download/KhmerUnicodeTyping.pdf
Is there an English 1.1 version?
For our record, here is the link to the newly drafted Khmer Encoding Research: https://docs.google.com/document/d/18KlDJkea9k57zFQ52V6JFvVNOYm-y-4hJJLmqudmWrE/edit#.
The discussion at the next group meeting will center on this document.
Prompted by @MakaraSok’s recent talk at the Unicode Conference, a small group of people, including Makara, @NorbertLindenberg, and me, have been trying to set up some new discussions about the Khmer script’s encoding–shaping issues. The group met on 6 Nov, 20 Nov, 4 Dec, and 18 Dec 2020.
The next meeting is scheduled for Friday 8 January 2021: