ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series of models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

Some questions about fill-mask #184

Closed: yooopan closed this issue 3 years ago

yooopan commented 3 years ago

Predictions for 中国[MASK] (中国 = "China"):

{'sequence': '中 国 ：', 'score': 0.5457051992416382, 'token': 8038, 'token_str': '：'}
{'sequence': '中 国 :', 'score': 0.09207046031951904, 'token': 131, 'token_str': ':'}
{'sequence': '中 国 -', 'score': 0.06536566466093063, 'token': 118, 'token_str': '-'}
{'sequence': '中 国 。', 'score': 0.06007284298539162, 'token': 511, 'token_str': '。'}
{'sequence': '中 国 版', 'score': 0.03868889436125755, 'token': 4276, 'token_str': '版'}
{'sequence': '中 国 ；', 'score': 0.01822206936776638, 'token': 8039, 'token_str': '；'}
{'sequence': '中 国 的', 'score': 0.013966748490929604, 'token': 4638, 'token_str': '的'}
{'sequence': '中 国 ，', 'score': 0.007958734408020973, 'token': 8024, 'token_str': '，'}
{'sequence': '中 国 网', 'score': 0.006388372275978327, 'token': 5381, 'token_str': '网'}
{'sequence': '中 国,', 'score': 0.005788101349025965, 'token': 117, 'token_str': ','}

Predictions for 机器[MASK] (机器 = "machine"):

{'sequence': '机 器 。', 'score': 0.2849466800689697, 'token': 511, 'token_str': '。'}
{'sequence': '机 器 ：', 'score': 0.21833810210227966, 'token': 8038, 'token_str': '：'}
{'sequence': '机 器 ；', 'score': 0.13236992061138153, 'token': 8039, 'token_str': '；'}
{'sequence': '机 器 :', 'score': 0.08217491209506989, 'token': 131, 'token_str': ':'}
{'sequence': '机 器 人', 'score': 0.028695881366729736, 'token': 782, 'token_str': '人'}
{'sequence': '机 器 ）', 'score': 0.02431340701878071, 'token': 8021, 'token_str': '）'}
{'sequence': '机 器 ;', 'score': 0.023457376286387444, 'token': 132, 'token_str': ';'}
{'sequence': '机 器 的', 'score': 0.012613171711564064, 'token': 4638, 'token_str': '的'}
{'sequence': '机 器 、', 'score': 0.010766545310616493, 'token': 510, 'token_str': '、'}
{'sequence': '机 器 （', 'score': 0.010289286263287067, 'token': 8020, 'token_str': '（'}
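In both listings some candidates are near-duplicates, for example the colon at token 8038 and at token 131. They are separate vocabulary entries, presumably the full-width and ASCII forms of the same mark, so the model's probability for "a colon" is split across them. The following is a minimal sketch that folds such variants together with Unicode NFKC normalization and sums their scores. merge_width_variants is a hypothetical helper, the data are the first five 机器[MASK] candidates above, and writing tokens 8038/8039 in full-width form is an assumption based on their ids.

```python
import unicodedata
from collections import defaultdict


def merge_width_variants(candidates):
    """Sum scores of candidates whose token_str is identical after NFKC
    normalization (e.g. full-width '：' and ASCII ':'), best first."""
    merged = defaultdict(float)
    for cand in candidates:
        key = unicodedata.normalize("NFKC", cand["token_str"])
        merged[key] += cand["score"]
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# First five candidates for 机器[MASK] from the output above.
# Tokens 8038/8039 are written as full-width forms (assumed from the token ids).
machine = [
    {"token": 511, "token_str": "。", "score": 0.2849466800689697},
    {"token": 8038, "token_str": "：", "score": 0.21833810210227966},
    {"token": 8039, "token_str": "；", "score": 0.13236992061138153},
    {"token": 131, "token_str": ":", "score": 0.08217491209506989},
    {"token": 782, "token_str": "人", "score": 0.028695881366729736},
]

merged = merge_width_variants(machine)
for char, score in merged:
    print(char, round(score, 4))
```

Merged this way, the colon (about 0.30) overtakes 。 (about 0.28) as the top candidate for 机器[MASK], which makes the punctuation bias even more pronounced than the raw list suggests.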

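From experience, if the prefix is "中国" (China), the next character "人" should have a higher probability. Why does this experiment return so many punctuation marks?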
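A likely factor, offered here as a hypothesis rather than a confirmed answer: the tokenizer wraps the input as [CLS] 中 国 [MASK] [SEP], so the masked position is the last token before the end of the segment. BERT is bidirectional and sees that nothing follows the mask, and in running text the position just before a segment boundary is often punctuation. The output above is also only the top 10, so 人 may simply rank below 10th for 中国. The sketch below (helper names are hypothetical) separates punctuation from other candidates using Unicode general categories (P*) and reports how much of the shown probability mass is punctuation.

```python
import unicodedata


def is_punctuation(text):
    """True if every character of text is in a Unicode punctuation category (P*)."""
    return all(unicodedata.category(ch).startswith("P") for ch in text)


def split_candidates(candidates):
    """Return (non-punctuation candidates in original order, total punctuation score)."""
    words = [c for c in candidates if not is_punctuation(c["token_str"])]
    punct_mass = sum(c["score"] for c in candidates if is_punctuation(c["token_str"]))
    return words, punct_mass


# The ten candidates for 中国[MASK] from the output above
# (full-width forms for tokens 8038, 8039 and 8024 are assumed from the ids).
china = [
    {"token": 8038, "token_str": "：", "score": 0.5457051992416382},
    {"token": 131, "token_str": ":", "score": 0.09207046031951904},
    {"token": 118, "token_str": "-", "score": 0.06536566466093063},
    {"token": 511, "token_str": "。", "score": 0.06007284298539162},
    {"token": 4276, "token_str": "版", "score": 0.03868889436125755},
    {"token": 8039, "token_str": "；", "score": 0.01822206936776638},
    {"token": 4638, "token_str": "的", "score": 0.013966748490929604},
    {"token": 8024, "token_str": "，", "score": 0.007958734408020973},
    {"token": 5381, "token_str": "网", "score": 0.006388372275978327},
    {"token": 117, "token_str": ",", "score": 0.005788101349025965},
]

words, punct_mass = split_candidates(china)
print([c["token_str"] for c in words])
print(round(punct_mass, 3))
```

For 中国[MASK], punctuation accounts for roughly 0.795 of the probability mass, against about 0.059 for the three character candidates 版, 的 and 网.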

yooopan commented 3 years ago

I want to do next-word prediction. What would be the right approach?
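BERT is a masked language model trained to fill gaps using context on both sides, not a left-to-right model, so a causal (autoregressive) language model is the more usual tool for next-word prediction. If one still wants to use a BERT-wwm checkpoint, a minimal sketch is to append one [MASK] to the prefix, keep the best non-punctuation candidate, and repeat; because the Chinese BERT tokenizer splits CJK text into single characters, each step adds one character. The function name greedy_next_chars and the model id in the usage comment are assumptions, and the fill_mask argument can be any callable returning candidates shaped like the fill-mask output above (dicts with token_str and score, best first).

```python
import unicodedata
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]


def is_punctuation(text: str) -> bool:
    """True if every character of text is in a Unicode punctuation category (P*)."""
    return all(unicodedata.category(ch).startswith("P") for ch in text)


def greedy_next_chars(prefix: str,
                      fill_mask: Callable[[str], List[Candidate]],
                      steps: int = 1,
                      mask_token: str = "[MASK]") -> str:
    """Greedily extend prefix one character per step.

    Each step appends a single mask token, asks fill_mask for candidates
    (assumed sorted best first), and keeps the first non-punctuation one.
    Stops early if every returned candidate is punctuation.
    """
    text = prefix
    for _ in range(steps):
        candidates = fill_mask(text + mask_token)
        best = next((c for c in candidates
                     if not is_punctuation(c["token_str"])), None)
        if best is None:
            break
        text += best["token_str"]
    return text


# Hypothetical usage with the Hugging Face fill-mask pipeline
# (the model id is an assumption; it needs network access to download):
# from transformers import pipeline
# fill_mask = pipeline("fill-mask", model="hfl/chinese-bert-wwm-ext")
# print(greedy_next_chars("中国", fill_mask, steps=2))
```

Greedy, one-character-at-a-time decoding like this ignores how later characters would change earlier choices, and the punctuation filter only hides the boundary effect rather than removing it; treat it as a quick experiment, not a substitute for a model trained for left-to-right generation.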

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.