Closed — bultiful closed this issue 2 years ago
Hi,
This is normal. The BERT checkpoint contains the parameters of the original BERT model, trained with the masked language modeling and next sentence prediction objectives. For downstream tasks we only use the BERT encoder and do not need the layers for masked word prediction or next sentence prediction, so some checkpoint weights are simply not used.
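To make this concrete, here is a minimal sketch (with hypothetical, simplified key names) of why the loader reports unused weights: the pretraining checkpoint includes the MLM/NSP head parameters, while a downstream model like `WCBertCRFForTokenClassification` declares only encoder weights plus its own freshly initialized task head.

```python
# Hypothetical illustration of the mismatch between checkpoint keys
# and the parameters a downstream (token-classification) model declares.

# Keys stored in the pretraining checkpoint: encoder weights plus the
# masked-LM head ("cls.predictions.*") and NSP head ("cls.seq_relationship.*").
checkpoint_keys = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "cls.predictions.transform.dense.weight",   # masked-LM head
    "cls.seq_relationship.weight",              # next-sentence-prediction head
}

# Parameters declared by the downstream model: the encoder plus a new
# classifier head that does not exist in the checkpoint.
model_keys = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "classifier.weight",  # newly initialized for the downstream task
}

# Checkpoint weights with no matching parameter -> "not used" warning.
unused = checkpoint_keys - model_keys
# Model parameters absent from the checkpoint -> newly initialized.
newly_initialized = model_keys - checkpoint_keys

print(sorted(unused))
print(sorted(newly_initialized))
```

This is the same bookkeeping PyTorch performs when loading a state dict non-strictly: the MLM/NSP keys end up in the "unused" set, and the new task head is initialized from scratch, which is exactly what the warning describes.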
Best
Thank you!
Hi, some weights were not used when initializing the model for training. See the message:

```
Some weights of the model checkpoint at ../berts/bert/pytorch_model.bin were not used when initializing WCBertCRFForTokenClassification
```