themoeway / yomitan

Japanese pop-up dictionary browser extension. Successor to Yomichan.
https://chromewebstore.google.com/detail/yomitan/likgccmbimhjbgkjambclfkhldnlhbnn
GNU General Public License v3.0

Pitch accent for compound words #78

Open themoeway-bot opened 1 year ago

themoeway-bot commented 1 year ago

archiif opened issue FooSoft/yomichan#1542 on 2021-03-18


Currently, Yomichan can't display the pitch accent for compound words correctly (or maybe the data from Kanjium is lacking?). For example, for 一子相伝, Yomichan displays this: [image]

But the word actually consists of two parts with different pitch accents: atamadaka for the first part of the compound and heiban for the second. For reference, this is what the NHK pitch accent dictionary shows: [image]

Maybe what's happening is that the two pitch accents Yomichan displays above are really the two parts of a single compound pitch, but Yomichan is incorrectly treating them as two alternative accent variants for the whole word. But this is just a wild guess.

themoeway-bot commented 1 year ago

toasted-nutbread commented on 2021-03-18


The issue is that the source data represents it as a single word, and Yomichan doesn't attempt to do lookups of the individual parts of compound words, as there is not a good way to reliably do this.

The source data for the term you listed is the following:

term     reading    accents
一子相伝  いっしそうでん  1,0

And I don't believe that the multiple comma-separated values generally represent the accents of the compounds, although the format of this file isn't really documented.
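To make the ambiguity concrete, here is a minimal Python sketch of reading a row like the one above. It assumes the columns are whitespace-separated and that the accents column is a comma-separated list of integers; since the file format isn't documented, both are assumptions, and `parse_accent_row` is a hypothetical helper, not Yomichan code.

```python
def parse_accent_row(row):
    """Split one source row into term, reading, and a list of accent values.

    Assumption: columns are separated by whitespace and the accents
    column is a comma-separated list of integers.
    """
    term, reading, accents = row.split()
    return {
        "term": term,
        "reading": reading,
        "accents": [int(a) for a in accents.split(",")],
    }


entry = parse_accent_row("一子相伝  いっしそうでん  1,0")
print(entry["accents"])
```

Parsed this way, the row yields two accent values attached to the whole reading, with nothing in the data saying whether they are alternatives for the full word or one value per part of the compound.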

themoeway-bot commented 1 year ago

archiif commented on 2021-03-22


I see, there don't seem to be any great solutions for automatic pitch accent generation of compound words. For now I'll just manually edit the pitch accent data for my cards.

themoeway-bot commented 1 year ago

redacted0 commented on 2022-02-17


@toasted-nutbread Yomichan doesn't seem to support this anyways though. The JSON format assumes that there can only be one pitch accent phrase in a word.

I don't think it would be effective to get Yomichan to do lookups for each part since those lookups could lead to erroneous accents.

It would be best if I could just add multiple phrases like:

[
    "一子相伝",
    "pitch",
    {
        "reading": "いっしそうでん",
        "pitches": [
            [{
                "pronunciation":  "イッシ",
                "position":1,
                "nasal":[],
                "devoice":[]
            }, {
                "pronunciation":  "ソーデン",
                "position":0,
                "nasal":[],
                "devoice":[]
            }]
        ]
    }
]

This would also have the added benefit of allowing a specific pronunciation instead of using the reading (which is currently used to correlate to other dictionary entries). I.e. 通う(カヨウ) vs 火曜(カヨー).
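As a rough illustration of how a consumer could use the structure proposed above, here is a minimal Python sketch that derives a high/low pattern for each phrase. It is not Yomichan code: `split_morae` and `pitch_pattern` are hypothetical helpers, the mora splitting is simplified (small kana such as ャ are merged into the preceding mora, every other character counts as one mora), and `position` is taken to mean the mora after which the pitch drops, with 0 meaning no drop.

```python
import json

# Small kana that combine with the preceding kana into one mora (simplified).
SMALL_KANA = set("ャュョァィゥェォヮゃゅょぁぃぅぇぉゎ")


def split_morae(kana):
    """Split a kana string into morae, merging small kana into the previous one."""
    morae = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def pitch_pattern(kana, position):
    """Return one 'H' or 'L' per mora for the given downstep position."""
    n = len(split_morae(kana))
    if position == 0:  # heiban: low first mora, then high with no drop
        return "L" + "H" * (n - 1)
    if position == 1:  # atamadaka: high first mora, then low
        return "H" + "L" * (n - 1)
    # Otherwise: low first mora, high through the accented mora, then low.
    return "L" + "H" * (position - 1) + "L" * (n - position)


# The entry proposed above; each inner list in "pitches" is a sequence of phrases.
entry = json.loads("""
["一子相伝", "pitch", {
    "reading": "いっしそうでん",
    "pitches": [[
        {"pronunciation": "イッシ", "position": 1, "nasal": [], "devoice": []},
        {"pronunciation": "ソーデン", "position": 0, "nasal": [], "devoice": []}
    ]]
}]
""")

patterns = [
    [pitch_pattern(p["pronunciation"], p["position"]) for p in phrases]
    for phrases in entry[2]["pitches"]
]
print(patterns)
```

With one pattern per phrase, the atamadaka first half and the heiban second half stay separate instead of being shown as two competing accents for the whole word.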

themoeway-bot commented 1 year ago

redacted0 commented on 2022-02-17


Although I do agree that the source data from Kanjium doesn't make use of multiple phrases, I would still like to add that supporting them would be a good idea, so that we can utilise sources that do use multiple phrases.