himselfv / wakan

Japanese and Chinese learning tool with dictionary

極り shows matches for り #278

Closed himselfv closed 8 years ago

himselfv commented 9 years ago

Original report by Anonymous.

Originally reported on Google Code with ID 278

I came across this when copying 取り決め into the clipboard and searching by it.

example: When searching A+: only on "Any matches" does it show: 取り決め. When selecting
Japanese -> English it ignores the readings of the Kanji and only takes りめ so, when
searching A you get nothing and when searching A+ you get everything with りめ as the
beginning, for ex. リメーク

Reported by supermarkus420 on 2015-03-09 10:55:07

himselfv commented 9 years ago

Original comment by Anonymous.

Forgot to mention the reason for High Priority. If you search A+ even with "Any Matches"
it ignores the Kanji. So it shows you 取り決め and its base verb 取り決める, but that's also
on A. It doesn't show you any other word that happens to have 取り決め as beginning (or
end), instead it shows you words that begin with りめ as stated above.

Reported by supermarkus420 on 2015-03-09 11:07:54

himselfv commented 9 years ago
#1:
Confirmed. This was kinda expected, though I agree it is counter-intuitive and should
be fixed.
The reason is that originally Wakan only supported searching by kana and that was called
"Japanese -> English". When "Any matches" was added, it also started searching by full
expression but that didn't make it into "Japanese -> English" mode.

#2:
> It doesn't show you any other word that happens to have 取り決め as beginning
It should. 取り決め doesn't seem to have such words though. Try 極り - there's a bunch of
expressions in EDICT which show up, like 極り文句.
One thing to note is that if there are too many results, only the first 50 or so will be
displayed automatically. You need to press Enter or the arrow button which appears
to the right of the search field to show the rest. Maybe that's the reason?

Reported by himselfv on 2015-03-11 09:28:04

himselfv commented 9 years ago

Original comment by Anonymous.

Ah, I see. On Any Matches it indeed shows me what you predicted. So it is not broken
and is the best search method yet.

However I cannot shake the feeling that searching by Kanji worked before (in older
versions) even with Japanese -> English. I just do not remember it any other way, which
is why I got stumped here. The J -> E now ignores Kanji completely no matter what you
search.

This also works the other way. When searching 極り with Any Matches it also shows me
everything with only り in it, which leads to cluttering with (potentially) unwanted
results (maybe have this as an option, though?)

Reported by supermarkus420 on 2015-03-11 12:20:38

himselfv commented 9 years ago
It did translate kanji too when the Clipboard button was on. Perhaps you're thinking of that?

> When searching 極り with Any Matches it also shows me everything with only り in it,

Yeah, this has to be fixed.

Reported by himselfv on 2015-03-11 12:36:28

himselfv commented 9 years ago
Maybe fixed in revision #d9c40dcd6006.

1. 極り shows matches for り:

The reason for the behavior was that Wakan tried to deflex 極り both as kanji/kana input
(successfully: 極る 極りる) and as romaji (stRomaji).
It did RomajiToKana(極り) which returned "極り" which was then deflexed to "極る 極りる" again.
But this time, since the search mode is stRomaji, the search routine assumed there should
be only kana in the deflex guesses and converted back and forth between that and romaji
(for instance to produce kunrei signature to match the same expression if it's written
in katakana instead of hiragana in the dictionary).
As a result, 極る 極りる got converted to "?ru ?riru" and then back to "る りる", which matched...
a lot of things.
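
A rough illustration of that round trip, not Wakan's actual code: the tiny tables and the function names kana_to_romaji, romaji_to_kana_lossy and round_trip are made up for this sketch. It assumes the back-conversion silently drops anything it cannot match, which is what the "?ru" -> "る" result above suggests.

```python
# Toy tables for illustration only; the real conversion tables are far larger.
KANA_TO_ROMAJI = {"る": "ru", "り": "ri"}
ROMAJI_TO_KANA = {romaji: kana for kana, romaji in KANA_TO_ROMAJI.items()}


def kana_to_romaji(text):
    # Anything that is not known kana (e.g. a kanji) turns into "?".
    return "".join(KANA_TO_ROMAJI.get(ch, "?") for ch in text)


def romaji_to_kana_lossy(text):
    # Assumed behaviour: unmatched characters are dropped without a trace.
    out = []
    i = 0
    while i < len(text):
        syllable = text[i:i + 2]
        if syllable in ROMAJI_TO_KANA:
            out.append(ROMAJI_TO_KANA[syllable])
            i += 2
        else:
            i += 1  # skip the unmatched character
    return "".join(out)


def round_trip(guess):
    # Kana -> romaji -> kana, as the romaji search path did with each guess.
    return romaji_to_kana_lossy(kana_to_romaji(guess))


print(round_trip("極る"), round_trip("極りる"))
```

With these toy tables the kanji is lost on the way, so the guesses collapse to bare kana endings.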

The fix was to make RomajiToKana optionally return question marks where it couldn't
match romaji. RomajiToKana(極り) now gives "??", so that Wakan knows the conversion was
flawed.
In this case it is also empty after the ?'s are deleted, so no deflexion takes place, but
if it contained valid romaji (e.g. "。。。goto") then "???ごと" would be the result and
Wakan would deflex and look up ごと.
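
The idea of the fix can be sketched like this. It is only an approximation under assumptions: the mark_unmatched parameter, the romaji_search_term helper and the toy table are hypothetical names for this example, not the real RomajiToKana signature.

```python
# Toy table for illustration only.
ROMAJI_TO_KANA = {"go": "ご", "to": "と", "ru": "る", "ri": "り"}


def romaji_to_kana(text, mark_unmatched=False):
    # With mark_unmatched=False, unknown characters pass through unchanged
    # (old behaviour); with True, each one is replaced by "?".
    out = []
    i = 0
    while i < len(text):
        syllable = text[i:i + 2]
        if syllable in ROMAJI_TO_KANA:
            out.append(ROMAJI_TO_KANA[syllable])
            i += 2
        else:
            out.append("?" if mark_unmatched else text[i])
            i += 1
    return "".join(out)


def romaji_search_term(text):
    # Returns the kana to deflex and look up, or None if nothing valid is left.
    converted = romaji_to_kana(text, mark_unmatched=True)
    cleaned = converted.replace("?", "")
    return cleaned or None


print(romaji_to_kana("極り", mark_unmatched=True))
print(romaji_search_term("極り"), romaji_search_term("。。。goto"))
```

Marking failures instead of passing them through lets the caller tell "nothing usable here" apart from "some valid romaji with junk around it".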

2. "Japanese -> English" mode fails to match exact kanji
Probably fixed in the same commit, see Issue 280.

Reported by himselfv on 2015-04-06 16:36:02

himselfv commented 9 years ago

Original comment by Anonymous.

Confirmed, this one is fixed and fully so.

Reported by supermarkus420 on 2015-08-09 09:33:08