Duplicate/incorrect Mandarin readings for characters

himselfv commented 11 years ago

Originally reported on Google Code with ID 213

1. If you hover the mouse over a Hanzi (了), the tooltip shows (le, liao), but the Mandarin
reading in the info panel shows (liao, liao).

The problem occurs with many other characters.

It is very disturbing, since I have to check almost every Hanzi for the correct reading.

Thanks!

Reported by Anonymous.postum on 2013-11-08 20:29:07

himselfv commented 11 years ago

Hmm, what dictionary and what character DB are you using? Have you imported the latest CCEDICT
and/or the latest Unihan, or are you just using the ones you downloaded with Wakan?

Reported by himselfv on 2013-11-13 08:46:42

himselfv commented 11 years ago

Original comment by Anonymous.

I'm using the latest CCEDICT. I did not update Unihan since I did not know that was
possible. I thought there were still problems with Unihan and wakan.chr,
no?

So, if I hover over a Hanzi in "Characters", I see the translations from my CCEDICT,
which are correct: CCEDICT gives me the correct spelling and so on.

But the normal info panel, with the Hanzi picture and the Cantonese reading, shows
me the wrong Mandarin reading.

I attached a screenshot. CCEDICT tells me "le, liao1"; the normal info panel tells me "liao3, liao3".
(By the way, in the screenshot CCEDICT says "liao1", but if I go to the actual dictionary
it says "liao3". There must be some kind of problem there too.)

This Hanzi is "le, liao3", but that is not what is shown.

Reported by Anonymous.postum on 2013-11-13 09:20:52

Attachment: 2013-11-13_101251.jpg

himselfv commented 11 years ago

Partially confirmed. What CCEDICT tells you is "liao3", not "liao1"; it's just hard
to see. So this part is normal. But the info panel really says "liao3, liao3", and
it has been that way since at least Wakan 1.67. That's a bug in Wakan.chr; there's
no way to fix it without re-importing Unihan.

With the new Kanjidic/Unihan it says "liao3, le5, le", which is better, but "le" is
still duplicated. I'll look into it.

Reported by himselfv on 2013-11-15 10:58:34

Status changed: Accepted

himselfv commented 11 years ago

Notes to self:
1. The text is taken from the kMandarin property type, which directly reflects the kMandarin
Unihan field. Unihan has other fields which list additional properties, but
they are not added here, and that's how it should be (different data, different properties;
users can always list all the properties they want).

2. Some Unihan fields are not parsed. This is tolerable, but some day I may need
to understand what they mean and parse them, since they contain additional readings.

3. Duplicate readings arise because kMandarin additionally contains Mandarin readings
from Kanjidic, and those sometimes overlap (which is expected). I need to detect and
ignore duplicate readings.

4. When reading kMandarin-type fields from Unihan, tone-5 syllables such as "pe" are
stored as-is as "pe", not "pe5". They should be "pe5": other entries make it
clear that these fields contain pinyin, not raw text, so "pe" means "pe" with tone 5. This
normalization is required for duplicate detection to work; otherwise I'd get "pe5" from
Kanjidic and "pe" from Unihan.
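A minimal Python sketch of notes 3 and 4 together, for illustration only: it is not Wakan's actual code, and the function names `normalize_tone` and `merge_readings` are hypothetical. It assumes readings from each source (Kanjidic, Unihan) arrive as lists of pinyin strings where a trailing digit marks the tone, so a syllable with no trailing digit is treated as tone 5 before duplicates are dropped. Lower-casing the key is an extra assumption, added so that case differences between sources don't defeat the comparison.

```python
def normalize_tone(syllable: str) -> str:
    """Return a comparison key for a numbered-pinyin syllable (assumed format).

    A syllable without a trailing tone digit (e.g. "pe") is taken to be
    tone 5 and gets "5" appended, so "pe" and "pe5" compare equal.
    """
    s = syllable.strip().lower()  # assumption: case is not significant
    if s and not s[-1].isdigit():
        return s + "5"
    return s


def merge_readings(*sources: list[str]) -> list[str]:
    """Merge reading lists from several sources, dropping duplicates.

    Order of first appearance is preserved; empty entries are skipped.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for source in sources:
        for reading in source:
            key = normalize_tone(reading)
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append(key)
    return merged


# "le" from one source normalizes to "le5" and is dropped as a duplicate.
print(merge_readings(["liao3", "le5"], ["le"]))  # ['liao3', 'le5']
```

With this normalization, the "liao3, le5, le" case from the comment above would collapse to "liao3, le5".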

Reported by himselfv on 2013-11-15 11:49:17

himselfv commented 10 years ago

*renaming

Reported by himselfv on 2013-12-23 08:08:46
