About HANKAKU and ZENKAKU substitution

reazon-research / ReazonSpeech

Massive open Japanese speech corpus

Apache License 2.0

Thank you very much for your great work. While reviewing ReazonSpeech/reazonspeech/text.py, I noticed that the HANKAKU (half-width) to ZENKAKU (full-width) conversion could be written more concisely.

It is currently defined as follows:

_HAN2ZEN = str.maketrans(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ０１２３４５６７８９")

...
return text.translate(_SPECIALS).translate(_HAN2ZEN)
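For readers unfamiliar with how such a table works: the full-width forms of ASCII letters and digits sit at a fixed offset of 0xFEE0 from their half-width code points in Unicode, so an equivalent table can be generated rather than typed out. The sketch below is only an illustration of that idea, not the project's code; the names `_ASCII_ALNUM`, `_OFFSET` and `han2zen` are hypothetical, and the `_SPECIALS` step is omitted because its definition is not shown above.

```python
import string

# Half-width characters covered by the table quoted above: a-z, A-Z, 0-9.
_ASCII_ALNUM = string.ascii_lowercase + string.ascii_uppercase + string.digits

# Full-width forms (U+FF01..U+FF5E) mirror ASCII (U+0021..U+007E) at this offset.
_OFFSET = 0xFEE0

# Map each half-width character to its full-width counterpart.
_HAN2ZEN = str.maketrans({c: chr(ord(c) + _OFFSET) for c in _ASCII_ALNUM})


def han2zen(text: str) -> str:
    """Convert half-width ASCII letters and digits to full-width forms."""
    return text.translate(_HAN2ZEN)


print(han2zen("abc XYZ 123"))  # ａｂｃ ＸＹＺ １２３
```

Characters outside the table, such as the spaces in the sample, pass through unchanged.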

However, since espnet is required to use this tool, its dependency jaconv should already be installed. I therefore believe the same conversion can be written as follows:

return jaconv.h2z(text, kana=True, digit=True, ascii=True)
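Before swapping one for the other, it may be worth checking that both approaches agree on realistic input. The table above only touches a-z, A-Z and 0-9, whereas `kana=True` asks jaconv to convert half-width katakana as well, and `ascii=True` may cover more than letters. The following is a minimal sketch for such a comparison; `table_h2z` and `compare_h2z` are hypothetical helper names, `_SPECIALS` is left out because it is not shown here, and the jaconv call is skipped when the package is not importable.

```python
# The letter/digit table quoted in this issue.
_HAN2ZEN = str.maketrans(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ０１２３４５６７８９")


def table_h2z(text: str) -> str:
    """Table-based conversion: only letters and digits are changed."""
    return text.translate(_HAN2ZEN)


def compare_h2z(text: str):
    """Return (table result, jaconv result), or None if jaconv is missing."""
    try:
        import jaconv
    except ImportError:
        return None
    return table_h2z(text), jaconv.h2z(text, kana=True, digit=True, ascii=True)


# Sample with half-width katakana, a space and punctuation.
result = compare_h2z("abc ｱｲｳ 1!")
if result is not None:
    table_out, jaconv_out = result
    print(table_out == jaconv_out, table_out, jaconv_out)
```

If the two outputs differ on such samples, the arguments passed to `jaconv.h2z` would need adjusting to preserve the current behavior.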

I hope this is helpful.

About HANKAKU and ZENKAKU substitution #3