ogata0916 / mozc

Automatically exported from code.google.com/p/mozc

ibus-mozc may fail to extract surrounding text correctly #226

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Version: Mozc-1.15.1811.102 (r231)
OS: Ubuntu 12.04

What steps will reproduce the problem?
1. Launch gedit.
2. Make sure the conversion history is cleared and the user dictionary is 
empty.
3. Make sure Mozc is turned off.
4. Copy and paste '1' into gedit.
5. Hit the Hankaku/Zenkaku key to turn on Mozc.
6. Enter 'ひき' and hit the space key to convert it.
7. Hit the ESC key.
8. Replace '1' with ' 1' by copy-and-paste.
9. Enter 'ひき' and hit the space key to convert it.

What is the expected output?
At step 6, you will see "匹" as the top candidate.
At step 9, you will see "匹" as the top candidate.

What do you see instead?
At step 6, you will see "匹" as the top candidate.
At step 9, you will see "引き" as the top candidate.

Here is the root cause.

https://code.google.com/p/mozc/source/browse/trunk/src/unix/ibus/mozc_engine.cc?r=163#269
>  const uint32 selection_start = min(cursor_pos, anchor_pos);
>  const uint32 selection_length = abs(info->relative_selected_length);
>  info->preceding_text = surrounding_text.substr(0, selection_start);
>  Util::SubString(surrounding_text,
>                  selection_start,
>                  selection_length,
>                  &info->selection_text);
>  info->following_text = surrounding_text.substr(
>      selection_start + selection_length);

|cursor_pos| and |anchor_pos| are counts of Unicode characters, not byte 
offsets into the UTF-8 string. However, |info->preceding_text| and 
|info->following_text| are extracted with substr() as if |cursor_pos| and 
|anchor_pos| were byte offsets into the UTF-8 string. As a result, whenever 
the surrounding text contains a multi-byte character, these strings can be 
cut at the wrong position and may even contain an invalid UTF-8 sequence.

Note that |info->selection_text| is correctly initialized with the selected 
text, because Util::SubString interprets its arguments as character counts.
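
One way to make the extraction consistent is to translate character positions into byte offsets before calling substr(). The following is only a minimal sketch and not necessarily the fix that landed in Mozc: CharIndexToByteOffset and Utf8Prefix are hypothetical helpers that do not exist in Mozc, and they assume the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Mozc). Returns the byte offset at which
// the char_index-th Unicode character of a UTF-8 string starts, or
// utf8.size() if the string has fewer characters than that.
// Assumption: utf8 holds valid UTF-8.
size_t CharIndexToByteOffset(const std::string &utf8, size_t char_index) {
  size_t offset = 0;
  size_t chars = 0;
  while (offset < utf8.size() && chars < char_index) {
    // Skip the lead byte of the current character.
    ++offset;
    // Skip its continuation bytes, which all have the form 10xxxxxx.
    while (offset < utf8.size() &&
           (static_cast<unsigned char>(utf8[offset]) & 0xC0) == 0x80) {
      ++offset;
    }
    ++chars;
  }
  return offset;
}

// Hypothetical helper (not part of Mozc). Returns the first char_count
// characters of utf8, always cutting on a character boundary.
std::string Utf8Prefix(const std::string &utf8, size_t char_count) {
  return utf8.substr(0, CharIndexToByteOffset(utf8, char_count));
}
```

For a string such as "あい", where each character takes 3 bytes, substr(0, 1) would cut the first character after its first byte, whereas Utf8Prefix(text, 1) returns the whole first character.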

It should be noted that there is another concern: we never verify that 
|selection_start| and |selection_start + selection_length| are within the 
range of |surrounding_text|. This is problematic because it means we are 
using parameters passed from an external program (the IBus server in this 
case) without any range check. 
In fact, the crash reported in Red Hat Bug 1100974 would most likely have 
been avoided if we had verified these parameters.
https://bugzilla.redhat.com/show_bug.cgi?id=1100974
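
The missing range check could look roughly like the sketch below. This is an illustration under assumptions, not Mozc code: CountUtf8Chars and GetValidSelectionRange are hypothetical names, and the selection length is derived from the two positions here rather than from |info->relative_selected_length|. The idea is simply to reject any position that lies beyond the number of characters in the surrounding text before it is used for extraction.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Mozc). Counts Unicode characters in a
// UTF-8 string by counting every byte that is not a continuation byte.
// Assumption: utf8 holds valid UTF-8.
size_t CountUtf8Chars(const std::string &utf8) {
  size_t count = 0;
  for (unsigned char c : utf8) {
    if ((c & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}

// Hypothetical helper (not part of Mozc). Validates character positions
// received from an external process. Returns false and leaves the outputs
// untouched if either position is past the end of surrounding_text.
bool GetValidSelectionRange(const std::string &surrounding_text,
                            uint32_t cursor_pos, uint32_t anchor_pos,
                            size_t *selection_start,
                            size_t *selection_length) {
  const size_t text_length = CountUtf8Chars(surrounding_text);
  if (cursor_pos > text_length || anchor_pos > text_length) {
    return false;
  }
  *selection_start = std::min(cursor_pos, anchor_pos);
  *selection_length = std::max(cursor_pos, anchor_pos) - *selection_start;
  return true;
}
```

A caller would skip the surrounding-text extraction entirely when GetValidSelectionRange returns false, instead of passing unchecked offsets to substr().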

Original issue reported on code.google.com by yukawa@google.com on 21 Jun 2014 at 1:51

GoogleCodeExporter commented 9 years ago
Should be fixed in Mozc-1.15.1813.102 (r233).

Original comment by yukawa@google.com on 21 Jun 2014 at 4:13