kakaobrain / g2pm

A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Apache License 2.0

the project can not beat pypinyin! #9

Open shawnthu opened 3 years ago

seanie12 commented 3 years ago

please describe your experimental setup.

In our experimental setup, our model performs better than the other baselines.

JohnHerry commented 3 years ago

The pretrained g2pM model is even worse than pypinyin. There are too many polyphonic characters while the training corpus is too small. Even after we extended the corpus, the results were still not good.
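One way to put numbers on a claim like this is to score both tools against a few hand-labeled polyphone sentences, for example drawn from the CPP test set that ships with this repo. Below is a minimal sketch; the `labeled` list and the per-character exact-match scoring are my own hypothetical stand-ins, not part of g2pM.

```python
# Minimal sketch: compare g2pM and pypinyin against a few hand-labeled sentences.
# The tiny `labeled` list is a hypothetical placeholder for a real test set.
from pypinyin import lazy_pinyin, Style
from g2pM import G2pM

labeled = [
    # (sentence, expected pinyin, one entry per character)
    ("他在长沙长大", ["ta1", "zai4", "chang2", "sha1", "zhang3", "da4"]),
]

model = G2pM()

def accuracy(pred, gold):
    # Fraction of positions where the predicted pinyin matches the reference.
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / len(gold)

for sent, gold in labeled:
    g2pm_out = model(sent, tone=True, char_split=False)
    pypinyin_out = lazy_pinyin(sent, style=Style.TONE3, neutral_tone_with_five=True)
    print(sent)
    print("  g2pM    :", accuracy(g2pm_out, gold))
    print("  pypinyin:", accuracy(pypinyin_out, gold))
```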

dyustc commented 1 year ago

```python
from pypinyin import lazy_pinyin, Style
from g2pM import G2pM

sentence = '然而,他红了20年以后,他在长沙长大,也在长沙退休。'

p1 = lazy_pinyin(sentence, style=Style.TONE3, neutral_tone_with_five=True)
print('pypinyin lazy')
print(p1)

model = G2pM()
p2 = model(sentence, tone=True, char_split=False)
print('g2pM')
print(p2)
```

Here is a case I found where it performs worse than pypinyin. For the sentence 然而,他红了20年以后,他在长沙长大,也在长沙退休。 ("However, after 20 years of fame, he grew up in Changsha and also retired in Changsha."), the two outputs are:

```
pypinyin lazy
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'zhang3', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']

g2pM
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'chang2', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']
```

pypinyin reads 长大 correctly as zhang3 da4, while g2pM outputs chang2 da4.