ymcui / MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)
https://www.aclweb.org/anthology/2020.findings-emnlp.58/
Apache License 2.0

Some questions about the details of MacBERT's masking #5

Closed wlhgtc closed 3 years ago

wlhgtc commented 3 years ago

I'd like to implement MacBERT's masking myself, and I have two questions I hope to ask @ymcui:

  1. The paper says "We use a percentage of 15% input words for masking". Can this be understood as masking 15% of the words, rather than 15% of the tokens?
  2. When randomly sampling word-level 1,2,3,4-gram spans, does the sampling avoid selecting duplicate (already-masked) words, as Google's original implementation does?
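The two questions above can be made concrete with a small sketch. This is not the repository's actual code, just an illustrative implementation assuming the interpretation in question 1 (15% of *words*) and a "no duplicate words" rule for question 2; the 40/30/20/10 length weights follow the n-gram ratios stated in the MacBERT paper, while the retry logic is my own assumption:

```python
import random

def sample_ngram_masks(words, mask_rate=0.15, seed=0):
    """Illustrative word-level n-gram mask sampling (NOT the official code).

    Picks word-level n-grams (n = 1..4, shorter spans favored with the
    paper's 40%/30%/20%/10% ratios) until roughly ``mask_rate`` of the
    *words* are covered, never masking the same word twice.
    Returns the sorted indices of masked words.
    """
    rng = random.Random(seed)
    budget = max(1, int(round(len(words) * mask_rate)))
    masked = set()
    attempts = 0
    # Cap attempts so the loop terminates even on tiny inputs.
    while len(masked) < budget and attempts < 100 * max(1, len(words)):
        attempts += 1
        n = rng.choices([1, 2, 3, 4], weights=[0.4, 0.3, 0.2, 0.1])[0]
        n = min(n, budget - len(masked), len(words))  # never overshoot 15%
        start = rng.randrange(0, len(words) - n + 1)
        span = range(start, start + n)
        if any(i in masked for i in span):
            continue  # skip spans overlapping already-masked words
        masked.update(span)
    return sorted(masked)
```

Under this reading, a 20-word sentence yields a budget of 3 masked words, each index appearing at most once, which is exactly the behavior the two questions ask about.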
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.