wenet-e2e / WeTextProcessing

Text Normalization & Inverse Text Normalization
Apache License 2.0

"三四万" is converted correctly, but "三四十万" is converted incorrectly #232

Closed vbiwuevbaeib closed 5 months ago

vbiwuevbaeib commented 5 months ago

"三四十万" is parsed as "三四 十万" (that is, "3~4" followed by "100000"). How should this be fixed or improved?

invnormalizer.normalize("需要三四万")
char { value: "需" } char { value: "要" } cardinal { value: "3~4万" }
'需要3~4万'

invnormalizer.normalize("需要三四十万")
char { value: "需" } char { value: "要" } cardinal { value: "3~4 100000" }
'需要3~4 100000'

vbiwuevbaeib commented 5 months ago

An integer followed by an approximate range also sometimes goes wrong, for example:

invnormalizer.normalize("需要一万六七")
char { value: "需" } char { value: "要" } cardinal { value: "10000 6~7" }
'需要10000 6~7'
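
For reference, the readings one would expect for the inputs above can be sketched in plain Python. This is only an illustration under assumptions: the helper names `cn2int` and `approx_range` are hypothetical, the code does not use WeTextProcessing or reflect how its grammars work, it only handles units up to 万, and it treats a trailing digit as shorthand for the next lower unit (so 一万六 is read as 16000).

```python
# Hypothetical helpers, not part of WeTextProcessing.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
WAN = 10000


def cn2int(s):
    """Convert a Chinese numeral (units up to 万) to an int."""
    total = 0       # value already closed off by 万
    section = 0     # value of the current group below 万
    digit = 0       # digit waiting for its unit
    last_unit = 10  # last unit seen; 10 means a trailing digit counts as ones
    for ch in s:
        if ch == "零":
            # An explicit zero cancels the trailing-digit shorthand.
            digit, last_unit = 0, 10
        elif ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 * 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit, last_unit = 0, SMALL_UNITS[ch]
        elif ch == "万":
            total += ((section + digit) or 1) * WAN
            section, digit, last_unit = 0, 0, WAN
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    # Trailing digit: 一万六 -> 6 * 1000, 十五 -> 5 * 1.
    return total + section + digit * (last_unit // 10)


def approx_range(s):
    """Return (low, high) for a numeral containing two adjacent consecutive digits."""
    for i in range(len(s) - 1):
        a, b = s[i], s[i + 1]
        if a in DIGITS and b in DIGITS and DIGITS[a] >= 1 and DIGITS[b] == DIGITS[a] + 1:
            # Evaluate the numeral once with each digit of the pair.
            low = cn2int(s[:i] + a + s[i + 2:])
            high = cn2int(s[:i] + b + s[i + 2:])
            return low, high
    raise ValueError("no approximate digit pair found")


for text in ("三四万", "三四十万", "一万六七"):
    print(text, approx_range(text))
```

Under these assumptions, `approx_range("三四十万")` gives `(300000, 400000)` and `approx_range("一万六七")` gives `(16000, 17000)`. How the library should render such ranges as text is a separate question.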

xingchensong commented 5 months ago

Fixed in https://github.com/wenet-e2e/WeTextProcessing/pull/234

vbiwuevbaeib commented 5 months ago

It works now, thank you very much!