kakaobrain / g2pm

A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Apache License 2.0

the project can not beat pypinyin! #9

Open shawnthu opened 3 years ago

seanie12 commented 3 years ago

please describe your experimental setup.

In our experimental setup, our model performs better than the other baselines.

JohnHerry commented 3 years ago

The pretrained g2pM model is even worse than pypinyin. There are too many polyphonic characters while the training corpus is too small. Even after we extended the corpus, the results were still not good.
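One way to put numbers on a claim like this is to score both tools against a few hand-labeled polyphone sentences, for example drawn from the CPP test set that ships with this repo. Below is a minimal sketch; the `labeled` list and the per-character exact-match scoring are my own hypothetical stand-ins, not part of g2pM.

```python
# Minimal sketch: compare g2pM and pypinyin against a few hand-labeled sentences.
# The tiny `labeled` list is a hypothetical placeholder for a real test set.
from pypinyin import lazy_pinyin, Style
from g2pM import G2pM

labeled = [
    # (sentence, expected pinyin, one entry per character)
    ("他在长沙长大", ["ta1", "zai4", "chang2", "sha1", "zhang3", "da4"]),
]

model = G2pM()

def accuracy(pred, gold):
    # Fraction of positions where the predicted pinyin matches the reference.
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / len(gold)

for sent, gold in labeled:
    g2pm_out = model(sent, tone=True, char_split=False)
    pypinyin_out = lazy_pinyin(sent, style=Style.TONE3, neutral_tone_with_five=True)
    print(sent)
    print("  g2pM    :", accuracy(g2pm_out, gold))
    print("  pypinyin:", accuracy(pypinyin_out, gold))
```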

dyustc commented 1 year ago

```python
from pypinyin import lazy_pinyin, Style
from g2pM import G2pM

sentence = '然而,他红了20年以后,他在长沙长大,也在长沙退休。'

p1 = lazy_pinyin(sentence, style=Style.TONE3, neutral_tone_with_five=True)
print('pypinyin lazy')
print(p1)

model = G2pM()
p2 = model(sentence, tone=True, char_split=False)
print('g2pM')
print(p2)
```

Here is a case I found where it performs worse than pypinyin. For the sentence 然而,他红了20年以后,他在长沙长大,也在长沙退休。 ("However, after 20 years of fame, he grew up in Changsha and also retired in Changsha."), the two outputs are:

```
pypinyin lazy
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'zhang3', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']

g2pM
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'chang2', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']
```

pypinyin reads 长大 correctly as zhang3 da4, while g2pM outputs chang2 da4.