wikimedia / jquery.ime

jQuery based input methods library

Discussions about IMEs' porting for Asian Languages #60

Closed: springrollconan closed this issue 11 years ago

springrollconan commented 11 years ago

Currently, various IMEs have been ported as part of jquery.ime. These include IMEs for Indic languages, for Russian and for the Mongolian of Outer Mongolia (both of which use the Cyrillic script), and for some Scandinavian languages. As a contributor, I am quite pleased to see how jquery.ime is developing.

My country, China, is one of the BRICS countries, as is India. The main language of China, Chinese, is written in the Chinese script (although various romanisation schemes have been introduced to help foreigners pronounce Chinese words, such as Pinyin, which was introduced in 1958). The Chinese script is the most complicated writing system I have ever seen; it has about 5,000 to 10,000 characters.

Current Chinese IMEs are not perfect. Many Chinese characters share the same pronunciation (for example, 中, which means centre, China, etc., and 終, which means final, are both pronounced zhong in Mandarin Chinese), and some look very similar (for example, 日, which means sun, day, etc., and 曰, an archaic word for the verb "say"), so that shape-based Chinese IMEs give them the same code. Most Chinese IMEs therefore work like this: when you type the pronunciation or the code of a character, a box appears listing the candidate characters (or even phrases), and you pick one. However, as a Wikimedian who is not an expert in computing, I have no idea how to implement this for a Chinese IME in jquery.ime. So I am not going to develop a Chinese IME for jquery.ime until someone implements such a candidate box.
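The candidate box described above can be sketched as a lookup table from an input code to an ordered list of candidates. This is a minimal TypeScript illustration under stated assumptions, not jquery.ime code: the table contents and the names `candidateTable`, `lookupCandidates`, `renderCandidateBox` and `selectCandidate` are hypothetical, and a real Chinese IME would need a far larger table plus frequency ranking.

```typescript
// Hypothetical toy table: a Pinyin syllable (tones ignored) maps to candidate
// characters in display order. Real tables hold thousands of entries.
const candidateTable = new Map<string, string[]>([
  ["zhong", ["中", "終", "種", "重"]],
  ["ri", ["日"]],
]);

// Return the candidates for a typed code, or an empty list if it is unknown.
function lookupCandidates(code: string): string[] {
  return candidateTable.get(code.toLowerCase()) ?? [];
}

// Render the numbered box the user picks from, e.g. "1.中 2.終 ...".
function renderCandidateBox(code: string): string {
  return lookupCandidates(code)
    .map((ch, i) => `${i + 1}.${ch}`)
    .join(" ");
}

// Commit the candidate chosen with a digit key (1-based); undefined if out of range.
function selectCandidate(code: string, digit: number): string | undefined {
  return lookupCandidates(code)[digit - 1];
}

console.log(renderCandidateBox("zhong")); // 1.中 2.終 3.種 4.重
console.log(selectCandidate("zhong", 2)); // 終
```

A shape-based IME could reuse the same structure with shape codes as keys; 日 and 曰 would then simply appear in the same candidate list.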

Some Asian languages use the Latin script (e.g. Malay, Indonesian and Tagalog, as well as Zhuang, which is spoken mainly by an ethnic group in Guangxi, China). I think it will be easy to develop IMEs for these languages.

The Vietnamese script, Quốc ngữ, is derived from the Latin script. Diacritics and additional letters represent the tones and certain vowels and consonants of Vietnamese. Several IMEs are used in Vietnam to type it (reportedly the most popular is Telex). This month, I will work on a Vietnamese Telex IME for jquery.ime.
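Telex forms the extra Vietnamese letters by typing a second key after a base letter (for example "aa" for â and "dd" for đ). A minimal sketch of that idea, assuming only the doubled-letter and `w` rules listed in the code; the names `telexLetterRules`, `telexKey` and `typeTelex` are hypothetical. Tone keys, "uow" for "ươ" and undoing by repeating a key are deliberately left out.

```typescript
// Simplified Telex letter rules: the pair "previous letter + key" is replaced
// by one Vietnamese letter. Tone keys (s, f, r, x, j) are not handled here.
const telexLetterRules: Record<string, string> = {
  aa: "â",
  aw: "ă",
  ee: "ê",
  oo: "ô",
  ow: "ơ",
  uw: "ư",
  dd: "\u0111", // đ
};

// Feed one keystroke into the text typed so far and return the new text.
function telexKey(text: string, key: string): string {
  const prev = text.slice(-1);
  const replacement = telexLetterRules[(prev + key).toLowerCase()];
  if (replacement === undefined) return text + key;
  // Keep the case of the letter being modified, so "Dd" and "DD" both give "Đ".
  const upper = prev !== prev.toLowerCase();
  return text.slice(0, -1) + (upper ? replacement.toUpperCase() : replacement);
}

// Type a whole key sequence, one keystroke at a time.
function typeTelex(keys: string): string {
  let text = "";
  for (const key of keys) text = telexKey(text, key);
  return text;
}

console.log(typeTelex("ddaau")); // đâu
console.log(typeTelex("Ddoong")); // Đông
```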

Besides that, I will also work on a Thai IME (based on the Kedmanee keyboard layout) and a Lao IME for jquery.ime.
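A layout-based IME such as Kedmanee is essentially a fixed map from physical keys to characters. The sketch below covers only the unshifted home row of Kedmanee (a full layout also needs the other rows and the Shift level); `kedmaneeHomeRow` and `typeKedmanee` are illustrative names, not part of jquery.ime.

```typescript
// Unshifted home row of the Thai Kedmanee layout, keyed by the US QWERTY key
// in the same physical position.
const kedmaneeHomeRow: Record<string, string> = {
  a: "ฟ",
  s: "ห",
  d: "ก",
  f: "ด",
  g: "เ",
  h: "\u0E49", // mai tho (tone mark)
  j: "\u0E48", // mai ek (tone mark)
  k: "า",
  l: "ส",
  ";": "ว",
  "'": "ง",
};

// Map each key to its Thai character; unmapped keys pass through unchanged.
function typeKedmanee(keys: string): string {
  return Array.from(keys, (k) => kedmaneeHomeRow[k] ?? k).join("");
}

console.log(typeKedmanee("dk")); // กา
```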

What's your opinion? Comments are welcome, and I will be pleased to read them.

santhoshtr commented 11 years ago

Hi, thanks for sharing your comments, and thanks for your effort on Vietnamese input methods.

I am aware of the complexities of Chinese input methods. Candidate-list-based input methods exist for some Indian languages too. A jquery.ime input method definition can be complex, using custom logic functions instead of regex replacements. For providing candidate lists (lookup tables), we need to find the best solution without compromising speed. It is not planned now, but we definitely want to support it in the future.
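One way to keep candidate lookup fast, offered here only as an assumption and not as a jquery.ime design, is to sort the table's codes once and binary-search for the typed prefix on every keystroke, so each lookup costs O(log n) plus the number of candidates returned. `CandidateIndex` and the sample table are hypothetical.

```typescript
// Hypothetical lookup table entry: an input code and its candidates.
type Entry = { code: string; candidates: string[] };

class CandidateIndex {
  private readonly entries: Entry[];

  // Sort the codes once, so that all codes sharing a prefix are contiguous.
  constructor(table: Record<string, string[]>) {
    this.entries = Object.keys(table)
      .sort()
      .map((code) => ({ code, candidates: table[code] }));
  }

  // Binary search: index of the first entry whose code is >= prefix.
  private lowerBound(prefix: string): number {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid].code < prefix) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Candidates of every code starting with the typed prefix, at most `limit`.
  search(prefix: string, limit = 9): string[] {
    const out: string[] = [];
    for (let i = this.lowerBound(prefix); i < this.entries.length; i++) {
      const { code, candidates } = this.entries[i];
      if (!code.startsWith(prefix)) break; // left the contiguous prefix range
      for (const c of candidates) {
        if (out.length >= limit) return out;
        out.push(c);
      }
    }
    return out;
  }
}

const index = new CandidateIndex({
  zhong: ["中", "終"],
  zhou: ["周", "州"],
  zhu: ["主", "住"],
  ri: ["日"],
});
console.log(index.search("zho").join(" ")); // 中 終 周 州
```

Sorting with the default comparator and comparing with `<` both use UTF-16 code-unit order, which keeps the binary search consistent with the sorted array.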

springrollconan commented 11 years ago

I see. I recently read the Vietnamese article about the Telex IME, and it looks complicated. Take Đảng (which means political party) as an example: whether you type "Ddarng", "DDarng", "Dadrng", "Dardng", "Darndg", "Darngd", "Ddanrg", "DDanrg", "Dadnrg", "Dandrg", "Danrdg" and so on, the Telex IME produces exactly the same result: Đảng. For now, I am trying to build a very simple one.
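The order-independence in the Đảng example can be modelled by treating a typed syllable as letters plus modifiers rather than a strict sequence: remove any tone key that appears after the first vowel, let a second `d` anywhere turn the initial `d` into `đ`, form the vowels with a hat or horn, then attach the tone mark to one vowel. This is a simplified sketch under those assumptions, not the Vietnamese Wikipedia script or any existing Telex implementation; `telexSyllable` and its helpers are hypothetical, and the tone-placement rule ignores cases such as the `gi`/`qu` onsets.

```typescript
// Telex tone keys and the Unicode combining marks they stand for.
const toneMarks: Record<string, string> = {
  s: "\u0301", // sắc (acute)
  f: "\u0300", // huyền (grave)
  r: "\u0309", // hỏi (hook above)
  x: "\u0303", // ngã (tilde)
  j: "\u0323", // nặng (dot below)
};
// Letter pairs that form the Vietnamese vowels with a hat or horn.
const vowelShapes: Record<string, string> = {
  aa: "â", aw: "ă", ee: "ê", oo: "ô", ow: "ơ", uw: "ư",
};
const VOWELS = "aăâeêioôơuưy";
const SHAPED_VOWELS = "ăâêôơư";
const isVowel = (c: string): boolean => VOWELS.includes(c.toLowerCase());

// Replace adjacent pairs such as "ee" or "ow", keeping the first letter's case.
function applyShapes(s: string): string {
  let out = "";
  for (const ch of s) {
    const prev = out.slice(-1);
    const shaped = vowelShapes[(prev + ch).toLowerCase()];
    if (shaped === undefined) {
      out += ch;
    } else {
      const upper = prev !== prev.toLowerCase();
      out = out.slice(0, -1) + (upper ? shaped.toUpperCase() : shaped);
    }
  }
  return out;
}

// Simplified placement: prefer a vowel with a hat/horn, otherwise the last
// vowel when a final consonant follows, otherwise the first vowel.
function placeTone(s: string, mark: string): string {
  const chars = Array.from(s);
  const vowels: number[] = [];
  chars.forEach((c, i) => {
    if (isVowel(c)) vowels.push(i);
  });
  if (vowels.length === 0) return s;
  const shaped = vowels.filter((i) => SHAPED_VOWELS.includes(chars[i].toLowerCase()));
  const last = vowels[vowels.length - 1];
  let target = vowels[0];
  if (shaped.length > 0) target = shaped[shaped.length - 1];
  else if (last < chars.length - 1) target = last; // a final consonant follows
  chars[target] += mark;
  // NFC composes e.g. "a" + U+0309 into the single character "ả".
  return chars.join("").normalize("NFC");
}

function telexSyllable(keys: string): string {
  const letters = Array.from(keys);
  const firstVowel = letters.findIndex(isVowel);
  // 1. A tone key anywhere after the first vowel is removed and remembered.
  let tone = "";
  const kept: string[] = [];
  letters.forEach((c, i) => {
    const mark = toneMarks[c.toLowerCase()];
    if (firstVowel >= 0 && i > firstVowel && mark !== undefined) tone = mark;
    else kept.push(c);
  });
  // 2. A second "d" anywhere turns an initial d/D into đ/Đ.
  if (kept.length > 0 && kept[0].toLowerCase() === "d") {
    const extra = kept.findIndex((c, i) => i > 0 && c.toLowerCase() === "d");
    if (extra > 0) {
      kept.splice(extra, 1);
      kept[0] = kept[0] === "D" ? "\u0110" : "\u0111"; // Đ or đ
    }
  }
  // 3. Form â, ă, ê, ô, ơ, ư; 4. attach the tone mark, if any.
  const shaped = applyShapes(kept.join(""));
  return tone ? placeTone(shaped, tone) : shaped;
}

console.log(telexSyllable("Ddarng")); // Đảng
console.log(telexSyllable("Danrdg")); // Đảng
```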

Alternatively, should we port the script that the Vietnamese Wikipedia uses for its built-in Vietnamese IME (1)?

What about IMEs for Malay, Indonesian, Tagalog and Zhuang?

Footnote:

  1. When you visit the site, there is a column named "Gõ tiếng Việt (?)" ("Type Vietnamese") on the left-hand side of the page. Open it and select the option "Telex(?)"; you will then be able to type Vietnamese on Wikipedia with the Telex IME.

amire80 commented 11 years ago

Malay and Indonesian don't need special input methods. They use the basic Latin alphabet.

Tagalog uses only one special character, ñ. It's also used in Spanish and is fairly common, so I don't think it needs our support. It would be very easy to create such a layout if anybody who speaks the language really thinks it's needed.
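To illustrate how small such a layout would be, here is a regex-replacement rule list with a single pair of rules, applied to the end of the text after each keystroke. The `~n` trigger sequence and the names `tagalogRules`, `applyRules` and `typeWithRules` are assumptions for this sketch; they are not taken from jquery.ime.

```typescript
// Each rule is a pattern anchored at the end of the typed text and its replacement.
type Rule = [pattern: RegExp, replacement: string];

// Hypothetical trigger: "~" followed by n/N produces ñ/Ñ.
const tagalogRules: Rule[] = [
  [/~n$/, "ñ"],
  [/~N$/, "Ñ"],
];

// Apply the first matching rule to the text, if any.
function applyRules(text: string, rules: Rule[]): string {
  for (const [pattern, replacement] of rules) {
    if (pattern.test(text)) return text.replace(pattern, replacement);
  }
  return text;
}

// Type a key sequence, re-checking the rules after every keystroke.
function typeWithRules(keys: string, rules: Rule[]): string {
  let text = "";
  for (const key of keys) text = applyRules(text + key, rules);
  return text;
}

console.log(typeWithRules("Para~naque", tagalogRules)); // Parañaque
```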

It may be useful to create a layout for Zhuang, which is written in the Latin script but with many special characters. However, we need a Zhuang speaker to define how this keyboard should be built. It will probably be hard to find one: the Zhuang Wikipedia is completely inactive, and there are no users in https://en.wikipedia.org/wiki/Category:User_za . If you find anybody, we'll be happy to discuss it.

It's possible to create a layout for Vietnamese, but we'll need help from a Vietnamese speaker, too.

springrollconan commented 11 years ago

Thanks for your comment. It is possible to type Malay and Indonesian in English mode. But over time, Malay and Indonesian Wikipedians may find it strange that their languages' IMEs are not included in jquery.ime. I think Malay and Indonesian IMEs should be introduced sooner or later. However, I have a question: is it possible to have Malay and Indonesian modes without loading any specific IMEs?
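Conceptually this seems possible: a language could be listed with an empty set of input methods, and selecting it would simply pass keystrokes through. The registry shape below (`languages`, `transformFor` and the `vi-telex` id) is a hypothetical sketch, not jquery.ime's actual data structure.

```typescript
// A transform turns the raw typed text into the text shown in the input field.
type Transform = (text: string) => string;

interface LanguageEntry {
  autonym: string;
  inputMethods: string[]; // ids of the input methods offered for this language
}

// Hypothetical registry: Malay and Indonesian are listed but need no IME.
const languages = new Map<string, LanguageEntry>([
  ["ms", { autonym: "Bahasa Melayu", inputMethods: [] }],
  ["id", { autonym: "Bahasa Indonesia", inputMethods: [] }],
  ["vi", { autonym: "Tiếng Việt", inputMethods: ["vi-telex"] }],
]);

const passThrough: Transform = (text) => text;

// Choose the transform for a language. With no input methods listed (or none
// loaded), keystrokes pass through unchanged, so nothing has to be loaded.
function transformFor(lang: string, loaded: Map<string, Transform>): Transform {
  const entry = languages.get(lang);
  if (entry === undefined || entry.inputMethods.length === 0) return passThrough;
  return loaded.get(entry.inputMethods[0]) ?? passThrough;
}

console.log(transformFor("ms", new Map())("apa khabar")); // apa khabar
```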

I should point out that in 1982 the government of the PRC introduced a script for the Zhuang language based on the basic Latin alphabet. Since then, Zhuang can be written with the basic Latin alphabet alone. The back of Renminbi banknotes (the 2005 series) shows the Zhuang text "Cunghgoz Yinzminz Yinzhangz xxx maenz" (which means "xxx dollars, People's Bank of China"). So the current Zhuang script doesn't include any special characters; those were used until the 1980s, when the new script was introduced.

I should also point out that I am not a native speaker of Mongolian. I only know how to say "Mongolian state" in Mongolian, as well as the first line of the national anthem of Mongolia. Even so, I can still create the Mongolian Cyrillic IME for jquery.ime.

siebrand commented 11 years ago

We think the discussion can be closed.