ibus / ibus

Intelligent Input Bus for Linux/Unix
https://github.com/ibus/ibus/wiki
GNU Lesser General Public License v2.1

Making IBus usable in Hong Kong #1489

Closed fujiwarat closed 9 years ago

fujiwarat commented 9 years ago
We talked with Anish during GNOME.Asia, and he asked me to open a bug report so this
issue is tracked, so here goes.

-----

Both Cangjie and Quick can be used to type Simplified and Traditional Chinese, as well
as Japanese.

However, given their design, no key combination conflicts between those languages.
In other words, any given key combination can only lead to results in one of those
languages, never more than one.

Given all that, it would make sense to simply remove the IBus filter altogether for
the Cangjie and Quick input methods in IBus.

Let's take an example.

In Cangjie, the combination "rji" can only return results in Traditional Chinese. That
means if a user types this combination of keys, he/she is expecting results in Traditional
Chinese because that's the language he/she wants to type.

But with the current IBus filter, if the filter is set to only let Simplified Chinese
characters pass, he/she would not get any results.
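
A minimal Python sketch of what such a filter does to a candidate list may help. This is not ibus-table's code: the `Candidate` type and `apply_filter` function are hypothetical, and the numeric modes are assumptions except mode 3, which a later comment describes as "Big charset mode, but traditional Chinese first". The sample character 機 is a Traditional-only form (its Simplified form is 机).

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    char: str
    traditional: bool  # character belongs to Traditional Chinese
    simplified: bool   # character belongs to Simplified Chinese

# Assumed mode numbers; only 3 is described in this thread.
SIMPLIFIED_ONLY = 0
TRADITIONAL_ONLY = 1
BIG_TRADITIONAL_FIRST = 3

def apply_filter(candidates: List[Candidate], mode: int) -> List[Candidate]:
    """Hypothetical candidate filter keyed on a Chinese mode."""
    if mode == SIMPLIFIED_ONLY:
        return [c for c in candidates if c.simplified]
    if mode == TRADITIONAL_ONLY:
        return [c for c in candidates if c.traditional]
    if mode == BIG_TRADITIONAL_FIRST:
        # Keep everything; the stable sort moves Traditional characters to the front
        # while preserving the table order within each group.
        return sorted(candidates, key=lambda c: not c.traditional)
    raise ValueError(f"mode {mode} is not covered by this sketch")

# A code whose only candidate is Traditional-only, like "rji" in the example above.
trad_only = [Candidate("機", traditional=True, simplified=False)]
print(apply_filter(trad_only, SIMPLIFIED_ONLY))  # → []
```

With a Simplified-only filter the user sees an empty candidate list, even though the code itself is unambiguous.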

In the same way, the combination "yri" can only return results in Simplified Chinese,
and the combination "fji" can only return results in Japanese.

This is by design of those two input methods: they were designed to avoid conflicts.

As such, the filter just makes no sense for Cangjie and Quick, and it should simply be
removed for those two input methods.

Now, above I claimed that Cangjie and Quick were designed to have absolutely
zero conflicts, which was a slight exaggeration. :)

In reality, conflicts happen. However, Cangjie and Quick were really designed with
the goal of minimizing conflicts, and they do it so well that the actual rate of conflicts
is 8.04% [1]. This is a small number, and such conflicts occur rarely enough that
they can just be ignored.

It is also important to note that while ≳90% of Hong Kong people [2] use Cangjie and Quick,
and many people (though far fewer than in HK) use them in Taiwan, almost no one uses them
in Mainland China or in Japan (as I have been told).

Of those three, Hong Kong and Taiwan write Traditional Chinese, Mainland China
writes Simplified Chinese, and Japan obviously writes Japanese. So those two input
methods really are used almost exclusively to write Traditional Chinese, which makes
the aforementioned 8.04% figure completely negligible.

As such, it doesn't change the argument at all: the current IBus filter should be removed
for the Cangjie and Quick input methods.

This is an absolute show-stopper for Hong Kong users at the moment (well, not me, I
can't write Chinese ;), and the simple act of removing this filter for those two input
methods would basically fix 90% of the problems for 90% of the Hong Kong people.

GNOME 3.6 will feature a tight integration with IBus, and as such it would be awesome
if this issue could be fixed before it is out.

Of course, I'd be happy to provide a patch if you agree with the solution and if you
can provide some guidance. ;)

[1] I could only find the published numbers on this in Chinese:
    http://zh.wikipedia.org/wiki/倉頡輸入法

[2] Not just Linux users, actual **people**, as this is how everyone learns to type
at school in Hong Kong.

Original issue reported on code.google.com by bochecha@fedoraproject.org on 2012-06-25 02:05:54

fujiwarat commented 9 years ago
See also https://bugzilla.redhat.com/show_bug.cgi?id=834971

Original issue reported on code.google.com by juhpetersen on 2012-06-25 04:16:14

fujiwarat commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by damage3025 on 2012-06-25 06:51:12

fujiwarat commented 9 years ago
I am an IBus user in Hong Kong. I use IBus because I can type Chinese using Cangjie.

Most of the non-typing oriented computer users in Hong Kong use Quick. Nearly every
possible key sequence will result in a candidate list with multiple pages of characters.
People "work around" this in Windows by memorizing the position of some commonly used
characters in the candidate list. But the candidate list in IBus is very different
from the one in Windows, which means every user needs to learn it again. SCIM did a much
better job of positioning the characters in the candidate list.

These are the actual key sequences of some commonly used characters in Quick:
HI9 -> 的
HI7 -> 我
DI[space]4 -> 機
P[space] -> 心
Q[space] -> 手
PP[space][space]2 -> 懚

[space] means next page when there are multiple candidates, otherwise it is used to
commit.
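
The space-key rule above can be sketched as a small state machine. This is a hypothetical Python illustration, not code from IBus or Windows: `CandidateList`, the 9-per-page size (taken from a later comment in this thread) and the wrap-around to the first page are assumptions, and the candidates in the usage example are placeholders rather than real Quick table data.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 9  # assumption: Windows-style 9 candidates per page

@dataclass
class CandidateList:
    """Hypothetical candidate window following the space-key rule quoted above."""
    candidates: List[str]
    page: int = 0  # 0-based index of the page currently shown

    def on_space(self) -> Optional[str]:
        """Commit the only candidate, or flip to the next page if there are several."""
        if not self.candidates:
            return None  # nothing to commit or page through
        if len(self.candidates) == 1:
            return self.candidates[0]  # single candidate: space commits it
        pages = (len(self.candidates) + PAGE_SIZE - 1) // PAGE_SIZE
        self.page = (self.page + 1) % pages  # wrap back to the first page after the last
        return None

    def on_digit(self, digit: int) -> Optional[str]:
        """Select the digit-th candidate (1-based) on the current page, if it exists."""
        index = self.page * PAGE_SIZE + digit - 1
        if 1 <= digit <= PAGE_SIZE and index < len(self.candidates):
            return self.candidates[index]
        return None

# Usage with 12 placeholder candidates (two pages of up to 9).
window = CandidateList([f"c{i}" for i in range(1, 13)])
window.on_space()                          # flip to the second page
print(window.on_digit(2))                  # → c11
print(CandidateList(["only"]).on_space())  # → only
```

One space flips to the second page, so digit 2 then picks the 11th candidate; a code with a single candidate commits on space.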

Traditional Chinese characters always precede Simplified Chinese characters because
only BIG5 characters are available in old versions of the Windows Quick/Cangjie input
methods.

Original issue reported on code.google.com by jhnpwa on 2012-06-25 07:33:50

fujiwarat commented 9 years ago
Dear HK IBus users:

If possible, can you take a closer look at ibus-table and give more specific proposals
for improvements?
For example, can you propose better table data for distribution?

I know some of them have been on the issue list for quite some time.

Original issue reported on code.google.com by damage3025 on 2012-06-25 07:57:57

fujiwarat commented 9 years ago
> But the candidate list in IBus is very different from the one in Windows, which means
> every user needs to learn it again. SCIM did a much better job of positioning the
> characters in the candidate list.

That doesn't seem to be a problem in itself. I mean, lots of things are different when
going from Windows to Linux already.

Unless the ordering of suggestions is actually better in Windows, of course; otherwise,
it seems to me that IBus should try to do **better** than others, not be **identical**
for the sake of being identical.

In any case, couldn't that be simply improved by making the suggestion list "adapt"
its ordering to the words most commonly used by the user? (IIRC some input methods
in IBus can do that already)

That seems like a separate improvement though, doesn't it?

The problem of the filter (which was the purpose of this bug report) still remains.

> If possible, can you take a closer look at ibus-table and give more specific proposals
> for improvements?

I realize there is one thing I omitted in the bug report: the default setting for the
filter (the language it lets pass) depends on the locale of the user session.

For example, if the session is "zh_HK", then the filter will by default only show Traditional
Chinese. But for an English locale, like en_HK which is commonly used here, the filter
will default to letting only Simplified Chinese characters pass.

This happens in /usr/share/ibus-table/engine/table.py, in the get_chinese_mode function.
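
To illustrate why a locale-only default misfires, here is a hypothetical Python sketch of a locale-based default, written to match the behaviour described above rather than copied from get_chinese_mode. The mode numbers and the list of Traditional Chinese locales are assumptions.

```python
# Hypothetical sketch of a locale-derived default filter; NOT ibus-table's get_chinese_mode.
SIMPLIFIED_ONLY = 0   # assumed mode number
TRADITIONAL_ONLY = 1  # assumed mode number

# Assumption: sessions in these locales get Traditional Chinese by default.
TRADITIONAL_LOCALES = ("zh_TW", "zh_HK")

def default_chinese_mode(session_locale: str) -> int:
    """Pick a default filter from the session locale alone, ignoring the input method."""
    base = session_locale.split(".")[0]  # "zh_HK.UTF-8" -> "zh_HK"
    if base.startswith(TRADITIONAL_LOCALES):
        return TRADITIONAL_ONLY
    # Everything else, including en_HK and fr_FR, falls back to Simplified Chinese.
    return SIMPLIFIED_ONLY

for loc in ("zh_HK.UTF-8", "en_HK.UTF-8", "fr_FR.UTF-8"):
    print(loc, default_chinese_mode(loc))
```

Under this sketch, an en_HK or fr_FR session gets the Simplified-only filter even when the user types with Cangjie, because the input method never enters the decision.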

And finally, this is a real problem because there isn't any easily accessible UI to
change the filter setting, or at least there won't be in GNOME >= 3.6:
https://live.gnome.org/GnomeShell/Design/Guidelines/SystemStatus/InputLanguage

Again, given that the Cangjie and Quick input methods were designed to limit conflicts
between different languages (i.e. not to provide candidates in more than one language for
any given key combination), the filter does more harm than good.

Original issue reported on code.google.com by bochecha@fedoraproject.org on 2012-06-25 08:37:35

fujiwarat commented 9 years ago
> Unless the ordering of suggestions is actually better in Windows, of course; otherwise,
> it seems to me that IBus should try to do **better** than others, not be **identical**
> for the sake of being identical.

The ordering of suggestions is not good, since even some commonly used characters are
placed on the third or even fourth page, but people have learned to type using that
order, either on their own or at school. People select a character from the candidate
list without even looking at it. Introducing smart ordering based on how frequently
the user types a character would slow users down and increase the error rate for the
general public.

In fact, the ordering of the Quick Classic table is identical to that of Windows when
only considering Traditional Chinese. Changing line 833 in /usr/share/ibus-table/engine/table.py
(the Ubuntu version; I'm not sure how much it differs elsewhere) from "_page_size = 6" to
"_page_size = 9" makes it apparently identical to Windows. But I think the behavior of
the space key would also need to change, as it is currently used to commit a character
directly rather than to flip the page.
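
The effect of the page size on memorized positions is simple arithmetic: a 0-based candidate index i lands on page i // page_size + 1 under digit key i % page_size + 1. The Python sketch below assumes, from the "HI9 -> 的" sequence quoted earlier, that 的 is the 9th candidate for HI; `key_position` is a hypothetical helper.

```python
from typing import Tuple

def key_position(index: int, page_size: int) -> Tuple[int, int]:
    """Map a 0-based candidate index to (page number, digit key), both 1-based."""
    if index < 0 or page_size < 1:
        raise ValueError("index must be >= 0 and page_size >= 1")
    return index // page_size + 1, index % page_size + 1

# Assumption: "HI9" means the target is the 9th candidate (index 8) on the first page.
print(key_position(8, 9))  # → (1, 9): first page, press 9
print(key_position(8, 6))  # → (2, 3): flip one page, then press 3
```

So a user who has memorized "HI9" would, with 6 candidates per page, have to flip a page and press 3 instead, which is why the page size and the space-key behaviour matter together.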

Quick was not designed to limit conflicts within the same language; only Cangjie was. There
are thousands of commonly used Chinese characters, but after just two key presses in Quick,
the candidate selection list pops up immediately.

I do agree that the filter should be removed.

Original issue reported on code.google.com by jhnpwa on 2012-06-25 09:50:57

fujiwarat commented 9 years ago
We already know about the filtering problem; is there any proposal other than simply removing it?

I guess the ibus-table engine was used to implement phonetic-based input methods like pinyin,
where filtering is necessary, so filtering was implemented.

en_HK gives the same filtering as zh_HK since a not-so-recent commit.

Character ordering is all about the table data, I guess.
Whether to do dynamic adjustment is configurable, though it may not be available in the GUI.

Other enhancements may be a little harder with the current code base.

Original issue reported on code.google.com by damage3025 on 2012-06-25 10:12:36

fujiwarat commented 9 years ago
> We already know about the filtering problem; is there any proposal other than simply removing it?

But why keep it for input methods where it really is unnecessary, and where it causes
more harm than good?

> I guess the ibus-table engine was used to implement phonetic-based input methods like
> pinyin, where filtering is necessary, so filtering was implemented.

FWIW, Cangjie and Quick are not phonetic-based; they are stroke-based.

> en_HK gives the same filtering as zh_HK since a not-so-recent commit.

That's very good news, but I'm afraid it's not sufficient: Hong Kong being such a multi-cultural
city, I don't think filtering on the locale is a good idea at all.

I mean, there are 13000 French people living in Hong Kong at the moment. Most of them
use the fr_FR locale, and I'd bet a lot of them have been in Hong Kong long enough
to have learnt how to type Traditional Chinese. But with the current implementation,
they'd get a Simplified Chinese filter. Repeat for the 18000+ Indians, etc.

(I know they don't all use Linux, but let's not limit ourselves artificially)

Original issue reported on code.google.com by bochecha@fedoraproject.org on 2012-06-25 10:21:10

fujiwarat commented 9 years ago
ibus-table is still used to implement phonetic-based input methods like Cantonese, though
I don't know exactly which Cantonese romanization system it uses.

Several romanization systems for Cantonese actually coexist. My favorite is Jyutping.

Original issue reported on code.google.com by damage3025 on 2012-06-25 10:35:25

fujiwarat commented 9 years ago
In case this wasn't clear: I never argued for the removal of the filter.

I argued in favour of removing the filter **for the Cangjie and Quick input methods
only**.
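
Applied only to Cangjie and Quick, this proposal amounts to a per-engine exemption. A hypothetical Python sketch follows; the engine identifiers in `UNFILTERED_ENGINES` are placeholders and may not match the names ibus-table actually uses.

```python
from typing import Optional

# Placeholder engine identifiers; the real ibus-table table names may differ.
UNFILTERED_ENGINES = frozenset({"cangjie3", "cangjie5", "quick3", "quick5", "quick-classic"})

def effective_filter(engine_name: str, locale_mode: int) -> Optional[int]:
    """Return the filter mode to apply, or None to show every candidate unfiltered."""
    if engine_name.lower() in UNFILTERED_ENGINES:
        return None  # proposal: no filter for Cangjie and Quick
    return locale_mode  # other engines (e.g. pinyin) keep the locale-based default

print(effective_filter("cangjie5", 0))  # → None
print(effective_filter("pinyin", 0))    # → 0
```

Hard-coding such a list is one option; the next comment suggests storing the default per table instead.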

Original issue reported on code.google.com by bochecha@fedoraproject.org on 2012-06-25 10:44:30

fujiwarat commented 9 years ago
Let's think outside the box.
The function

  def get_chinese_mode (self):

in engine/table.py decides the default filter based on the locale.

But the Cangjie5 table is designed for both Traditional and Simplified Chinese.
If we cannot remove the filter, why not make it return 3 by default (3 is Big
charset mode, with Traditional Chinese first)?

From a programming point of view, it is not good to hard-code this, but ibus-table
uses SQLite as the database for its tables, so why can't each table store the recommended
filter for its input method, and ibus-table read that value as the default?
Why does the default filter depend on the locale rather than on the input
method itself?
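
The suggestion of storing a per-input-method default in the table database can be sketched with Python's built-in sqlite3 module. Assumptions: the table metadata lives in a key/value table named `ime` with `attr` and `val` columns, and the `default_chinese_mode` key is hypothetical; the in-memory databases below stand in for real .db files.

```python
import sqlite3
from typing import Optional

# Hypothetical attribute name; not an existing ibus-table key.
DEFAULT_MODE_ATTR = "default_chinese_mode"

def table_default_mode(conn: sqlite3.Connection) -> Optional[int]:
    """Read a per-input-method default filter from the table metadata, if present."""
    row = conn.execute(
        "SELECT val FROM ime WHERE attr = ?", (DEFAULT_MODE_ATTR,)
    ).fetchone()
    return int(row[0]) if row is not None else None

def choose_chinese_mode(conn: sqlite3.Connection, locale_mode: int) -> int:
    """Prefer the table's own default; fall back to the locale-derived one."""
    mode = table_default_mode(conn)
    return mode if mode is not None else locale_mode

def make_table(default_mode: Optional[int]) -> sqlite3.Connection:
    """Build an in-memory stand-in for a table database (assumed ime(attr, val) layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ime (attr TEXT, val TEXT)")
    if default_mode is not None:
        conn.execute("INSERT INTO ime VALUES (?, ?)", (DEFAULT_MODE_ATTR, str(default_mode)))
    return conn

cangjie5 = make_table(3)   # would ship "Big charset, Traditional first"
pinyin = make_table(None)  # no per-table default: keep the locale behaviour
print(choose_chinese_mode(cangjie5, locale_mode=0))  # → 3
print(choose_chinese_mode(pinyin, locale_mode=0))    # → 0
```

Tables without the key keep today's locale-based behaviour, so phonetic-based engines such as pinyin would be unaffected.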

Original issue reported on code.google.com by wanleungwong on 2012-06-26 06:18:37

fujiwarat commented 9 years ago
Thank you very much for all the efforts and information in this issue.
However, the same issue has already been reported in Issue 1188,
so please continue the discussion there.

Original issue reported on code.google.com by damage3025 on 2012-06-27 13:49:04