Open asmusf opened 6 years ago
IIUC, a normalization table can be used, as shown in Table 12-37, "Atomic Encoding of Malayalam Chillus": https://www.unicode.org/versions/Unicode10.0.0/ch12.pdf#page=65.
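A minimal sketch of what such a normalization could look like, assuming the legacy sequences are consonant + virama (U+0D4D) + ZWJ (U+200D) and that the five mappings below match Table 12-37; please verify them against the chapter before relying on this. The names `CHILLU_MAP` and `to_atomic_chillus` are made up for illustration and are not part of unilex. Chillu K (U+0D7F) is left out on purpose, since I am not certain it appears in that table.

```python
# Sketch only: rewrite legacy chillu sequences (Unicode 5.0 and earlier)
# as the atomic chillu characters introduced in Unicode 5.1.
# The mapping is an assumption based on my reading of Table 12-37.

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# legacy sequence (consonant + virama + ZWJ) -> atomic chillu
CHILLU_MAP = {
    "\u0D23" + VIRAMA + ZWJ: "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28" + VIRAMA + ZWJ: "\u0D7B",  # NA  -> CHILLU N
    "\u0D30" + VIRAMA + ZWJ: "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32" + VIRAMA + ZWJ: "\u0D7D",  # LA  -> CHILLU L
    "\u0D33" + VIRAMA + ZWJ: "\u0D7E",  # LLA -> CHILLU LL
}


def to_atomic_chillus(text: str) -> str:
    """Return text with each legacy chillu sequence replaced by its atomic form."""
    for legacy, atomic in CHILLU_MAP.items():
        text = text.replace(legacy, atomic)
    return text
```

Applied line by line to a frequency file, this would leave the counts untouched and only rewrite the word forms. If both encodings of the same word already occur in the file, their entries would need to be merged afterwards.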
I think the file should be replaced with one that uses the atomic encoding. Another issue is the use of U+0D4C (MALAYALAM VOWEL SIGN AU), which I understand is considered outdated. (Other corpora I've encountered recently do not have the latter issue.)
If you don’t mind, could you send a pull request to fix the problem?
https://github.com/unicode-org/unilex/blob/master/data/frequency/ml.txt
This file uses the Unicode 5.0 and earlier encoding for Chillu characters, i.e. sequences rather than the atomic characters. (See Chapter 12 of Unicode 10.0.)