ymcui / MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)
https://www.aclweb.org/anthology/2020.findings-emnlp.58/
Apache License 2.0

How are synonyms found for single Chinese characters? #2

Closed AlexYoung757 closed 3 years ago

AlexYoung757 commented 3 years ago

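The paper says the model masks single characters with a probability of 40%. In English it is fairly easy to find a synonym for a single unit, but a single Chinese character most likely has no synonym at all. Does that mean all of the characters without synonyms are handled by random replacement? How is this case dealt with? Also, will the code be open-sourced?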

ymcui commented 3 years ago

The 40% does not apply to single characters. It applies to unigrams, where the unit of granularity is the word.
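To make the distinction concrete, here is a minimal sketch of word-level n-gram selection; it is not the MacBERT training code. Only the 40% unigram figure comes from this thread. The 30/20/10 split for 2- to 4-grams, the fallback to a random word when no synonym is found, and all names (`NGRAM_PROBS`, `pick_span`, `replace_span`, the toy synonym dictionary) are assumptions made for illustration. The input is assumed to be already segmented into words, so a "unigram" is one whole word, which may span several characters.

```python
import random

# Assumed n-gram length distribution over WORDS (not characters).
# Only the 0.4 for unigrams is stated in this thread; the rest is an assumption.
NGRAM_PROBS = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}


def sample_ngram_length(rng):
    """Draw an n-gram length (in words) according to NGRAM_PROBS."""
    lengths = list(NGRAM_PROBS)
    weights = [NGRAM_PROBS[n] for n in lengths]
    return rng.choices(lengths, weights=weights, k=1)[0]


def pick_span(words, rng):
    """Pick a half-open (start, end) span of whole words to corrupt."""
    n = min(sample_ngram_length(rng), len(words))
    start = rng.randrange(len(words) - n + 1)
    return start, start + n


def replace_span(words, start, end, synonyms, vocab, rng):
    """Replace each word in the span with a synonym if one is known.

    Falling back to a random vocabulary word is an assumed behaviour,
    shown here only to illustrate the case raised in the question.
    """
    out = list(words)
    for i in range(start, end):
        candidates = synonyms.get(words[i])
        out[i] = rng.choice(candidates) if candidates else rng.choice(vocab)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical pre-segmented sentence: each list item is one word.
    words = ["我", "喜欢", "自然", "语言", "处理"]
    synonyms = {"喜欢": ["喜爱"]}  # toy synonym dictionary
    vocab = ["今天", "学习", "模型"]  # toy vocabulary for the fallback
    start, end = pick_span(words, rng)
    print(start, end, replace_span(words, start, end, synonyms, vocab, rng))
```

Because the span is chosen over words, a multi-character word such as "喜欢" is looked up as a whole, so the lookup does not depend on individual characters having synonyms.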

AlexYoung757 commented 3 years ago

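"The 40% does not apply to single characters. It applies to unigrams, where the unit of granularity is the word." Understood. Will the specific training details be released?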

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.