Closed: dwang-sflscientific closed this issue 3 years ago
For pretraining & fine-tuning, I don't understand why the ground-truth labels are used as the [CLS] token as well.
Hi, you don't need to base the CLS token on the actual label. It's an artifact from another project; the static token is generated in the embed_data_mask function (L321).
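For illustration, here is a minimal sketch of what swapping in a static CLS token could look like. The names `prepend_static_cls` and `CLS_TOKEN_ID` are hypothetical (not from this repository), and it assumes the first categorical column is where the label-based CLS token currently sits before embed_data_mask is called.

```python
import torch

CLS_TOKEN_ID = 0  # assumed reserved index for a static [CLS] token


def prepend_static_cls(x_categ: torch.Tensor) -> torch.Tensor:
    """Replace the first categorical column with a constant CLS index,
    so the embedding never sees the ground-truth label."""
    cls_col = torch.full(
        (x_categ.shape[0], 1), CLS_TOKEN_ID,
        dtype=x_categ.dtype, device=x_categ.device,
    )
    # Assumes column 0 currently holds the label-derived CLS token.
    return torch.cat([cls_col, x_categ[:, 1:]], dim=1)
```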
Ah, I see. Thanks for the explanation.
Hi,
For inference, the CLS token (L157 and L160 in train.py) is still based on the ground-truth label. Should it be a static CLS token instead?