Closed: anbo724 closed this issue 2 years ago
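Hello. We use the BPE algorithm in SentencePiece to learn the segmentation granularity automatically. The resulting vocabulary is a mix of single-character and multi-character units.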
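For readers following the thread, here is a rough, library-free sketch of how BPE arrives at such a mixed granularity. It is a toy illustration, not SentencePiece itself and not the maintainers' pipeline: the function names `learn_bpe` and `merge_pair`, the two-word corpus, and the merge count are all assumptions made for the example. Each word starts as a sequence of single Unicode code points, and the most frequent adjacent pair is merged repeatedly.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Toy BPE: start from single code points, merge the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)  # segmented word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Ties are broken by the pair itself so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges, vocab


# Hypothetical corpus: two Tibetan syllables written as escapes.
# U+0F56 ba, U+0F7C vowel sign o, U+0F51 da, U+0F53 na
corpus = ["\u0f56\u0f7c\u0f51"] * 3 + ["\u0f56\u0f7c\u0f53"] * 2
merges, vocab = learn_bpe(corpus, num_merges=2)
for symbols, freq in vocab.items():
    print(freq, " | ".join(symbols))
```

With only two merges, the frequent syllable ends up as one multi-character unit while the rarer one stays split into smaller pieces, which is the kind of mixed granularity described above. Which pieces a real SentencePiece model learns depends on the training corpus and its settings.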
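Thanks, but could you be more specific? For Tibetan, is a "single character" a stack (字丁)? What does a multi-character unit refer to: a Tibetan word or a Tibetan syllable? Alternatively, would it be possible to send me the details of this part? 45098072@qq.com. Thanks.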
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
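Hello. Unlike Chinese, the characters of minority-language scripts are built from several components. What preprocessing was applied to the minority languages? For example, is Tibetan modeled at the level of stacks (字丁, each encoded separately), syllables, or some other text granularity?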
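To make the granularities named in this question concrete, here is a small sketch using only the Python standard library. It is an illustration of the terminology, not a description of the project's preprocessing. The helper names `syllables`, `stacks`, and `code_points` are hypothetical, and grouping combining marks with the preceding base letter is only an approximation of a stack (字丁).

```python
import unicodedata

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg), which separates syllables


def syllables(text):
    """Split text into syllables at the tsheg, dropping empty pieces."""
    return [s for s in text.split(TSHEG) if s]


def stacks(syllable):
    """Approximate stacks: attach each combining mark to the preceding base letter."""
    out = []
    for ch in syllable:
        # Subjoined consonants and vowel signs have a Unicode "Mark" category.
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def code_points(syllable):
    """List the individual Unicode code points of a syllable."""
    return [f"U+{ord(ch):04X}" for ch in syllable]


# Two syllables written as escapes:
# U+0F56 ba, U+0F7C vowel sign o, U+0F51 da, tsheg,
# U+0F66 sa, U+0F90 subjoined ka, U+0F51 da, tsheg
text = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b"
syls = syllables(text)
for syl in syls:
    print(syl, stacks(syl), code_points(syl))
```

In this sketch the second syllable has three code points but two stacks, because the subjoined letter attaches to its base consonant. A tokenizer that works on Unicode characters starts from code points, so its "single characters" are not necessarily stacks.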