mvysny / aedict

Original Aedict 2 source codes
http://www.aedict.eu
GNU General Public License v3.0

Incorrect pitch accent #819

Open torazem opened 6 years ago

torazem commented 6 years ago

The accent for 辺 (へん) should be heiban.


Sources:

mvysny commented 6 years ago

Thanks! The data is taken from https://github.com/javdejong/nhk-pronunciation ; according to the ACCDB_unicode.csv file:

84846,68444,J68444.wav,1,5405150030,ヘン,辺,辺,辺(数),2,,,ヘンオ,0,K68444.wav,ヘン,1,0,20

The pitch is as shown in Aedict (the trailing 20 is the part that matters). However, please feel free to submit patches and corrections to the above-mentioned project; Aedict will then pick up the changes automatically on the next dictionary index round.
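For readers who want to inspect such a row themselves, here is a minimal sketch of pulling the relevant fields out of it. The column positions (reading at index 5, kanji at index 6, accent code as the last field) are assumptions inferred from the sample row above, not from Aedict's source, and `parse_accdb_row` is a hypothetical helper name. The sketch does not try to decode what the accent digits mean.

```python
import csv
import io

# Sample row quoted above from ACCDB_unicode.csv.
ROW = "84846,68444,J68444.wav,1,5405150030,ヘン,辺,辺,辺(数),2,,,ヘンオ,0,K68444.wav,ヘン,1,0,20"


def parse_accdb_row(line):
    """Split one CSV line and return the fields discussed in this thread.

    Column positions are assumptions based on the sample row only.
    """
    fields = next(csv.reader(io.StringIO(line)))
    return {
        "reading": fields[5],   # ヘン
        "kanji": fields[6],     # 辺
        "accent": fields[-1],   # the trailing field, kept as a raw string
    }


entry = parse_accdb_row(ROW)
print(entry["reading"], entry["kanji"], entry["accent"])
```

The accent code is kept as a string rather than converted to an integer, so any leading digits are preserved exactly as they appear in the file.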

torazem commented 6 years ago

~Thanks! I'll submit a patch this weekend.~

Edit: It looks like ACCDB_unicode.csv has a heiban entry for 辺:

84846,68444,J68444.wav,1,5405150030,ヘン,辺,辺,辺(数),2,,,ヘンオ,0,K68444.wav,ヘン,1,0,20
84847,68445,J68445.wav,1,5405160010,ヘン,辺,辺,辺,2,,,ヘンオ,0,K68445.wav,ヘン,1,0,1

The first entry looks like it is intended as a counter whereas the second entry is in the desired heiban form, so it looks like this file is correct.

I'm not yet familiar with Aedict's codebase; is it possible for Aedict to pick up both entries, or match based on other criteria?

k3zi commented 6 years ago

'Counter' isn't the right word. 数 means math. That へん literally means the side or edge of something like a shape. The other へん is for a general area. That file does come from NHK's 1998 Accent Dictionary. Unless you recognize the 数 and compare it to, say, JMDict's info field to see if it has a 'math-related term' entry, listing both is a good idea. Unfortunately, that project isn't outputting the NHKexpr field, which basically determines when an accent applies in cases where a word's accent may change with meaning and/or placement in a sentence.

torazem commented 6 years ago

Ah, thanks @k3zi, that makes more sense! In that case, I like the idea of listing both, even if it's impossible to tell which accent belongs to which context.
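The "list both" idea could be sketched roughly as follows: group every row that shares a reading and kanji, and keep all of their accent codes instead of only the first. This is an illustration under stated assumptions, not how Aedict's indexer works. The column positions are inferred from the two rows quoted earlier, `accents_by_headword` is a hypothetical name, and treating the ninth column (辺(数) in the first row) as a context label is a guess; it may or may not be the NHKexpr field mentioned above.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted earlier in this thread.
ROWS = """\
84846,68444,J68444.wav,1,5405150030,ヘン,辺,辺,辺(数),2,,,ヘンオ,0,K68444.wav,ヘン,1,0,20
84847,68445,J68445.wav,1,5405160010,ヘン,辺,辺,辺,2,,,ヘンオ,0,K68445.wav,ヘン,1,0,1
"""


def accents_by_headword(text):
    """Map (reading, kanji) to every (context, accent) pair found for it."""
    grouped = defaultdict(list)
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        key = (fields[5], fields[6])       # assumed: reading, kanji
        context = fields[8]                # assumed: ninth column as a context label
        grouped[key].append((context, fields[-1]))
    return dict(grouped)


for (reading, kanji), variants in accents_by_headword(ROWS).items():
    for context, accent in variants:
        print(kanji, reading, context, accent)
```

Keeping the ninth column next to each accent code would let a user interface show both variants with whatever hint the data provides, even if the context cannot be resolved automatically.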