lonePatient / NeZha_Chinese_PyTorch

NEZHA: Neural Contextualized Representation for Chinese Language Understanding
MIT License

Loading this project's nezha-wwm weights with Huawei's original torch code produces warnings #6

Closed bestpredicts closed 3 years ago

bestpredicts commented 3 years ago
  Weights from pretrained model not used in BertForPreTraining: 
['bert.encoder.layer.0.attention.self.relative_positions_encoding.positions_encoding', 
'bert.encoder.layer.1.attention.self.relative_positions_encoding.positions_encoding', 
'bert.encoder.layer.2.attention.self.relative_positions_encoding.positions_encoding',  
'bert.encoder.layer.3.attention.self.relative_positions_encoding.positions_encoding', 
'bert.encoder.layer.4.attention.self.relative_positions_encoding.positions_encoding', 
'bert.encoder.layer.5.attention.self.relative_positions_encoding.positions_encoding',   
........
 'cls.predictions.decoder.bias']

Could you explain the cause of this issue?

lonePatient commented 3 years ago

The relative position encoding is simply converted into a layer; it doesn't affect the model.
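
For context, a minimal sketch (PyTorch, with illustrative class and argument names, not the project's exact code) of what "converting the relative position encoding into a layer" can look like: the sinusoidal table depends only on the maximum length and per-head depth, so it is rebuilt deterministically at construction time and registered as a buffer. That is why the `positions_encoding` entries in the checkpoint can be reported as unused without changing the model's behavior.

```python
import math
import torch
import torch.nn as nn

class RelativePositionsEncoding(nn.Module):
    """Illustrative sketch: a fixed (non-trained) relative-position table.

    The table is fully determined by max_len / depth, so it is rebuilt at
    init time; the values stored in a checkpoint are therefore redundant,
    which is why loaders may list them under "weights not used".
    """
    def __init__(self, max_len: int, depth: int, max_relative_position: int = 64):
        super().__init__()
        # Clipped relative distance between every pair of positions (depth assumed even).
        rel = torch.arange(max_len)[None, :] - torch.arange(max_len)[:, None]
        rel = rel.clamp(-max_relative_position, max_relative_position) + max_relative_position

        # Sinusoidal embedding table over all possible relative distances.
        vocab = 2 * max_relative_position + 1
        emb = torch.zeros(vocab, depth)
        pos = torch.arange(vocab, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, depth, 2, dtype=torch.float)
                        * -(math.log(10000.0) / depth))
        emb[:, 0::2] = torch.sin(pos * div)
        emb[:, 1::2] = torch.cos(pos * div)

        # Registered as a buffer: saved in state_dict but never trained.
        self.register_buffer("positions_encoding", emb[rel])

    def forward(self, seq_len: int) -> torch.Tensor:
        # (seq_len, seq_len, depth) slice of the precomputed table.
        return self.positions_encoding[:seq_len, :seq_len, :]
```

Since the buffer is identical whether it is loaded from the checkpoint or recomputed at construction, the "weights not used" message for these keys is harmless.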

suolyer commented 3 years ago

I used this project's code to load the provided nezha-large-wwm pretrained weights and got the same message saying the relative position encoding layer's pretrained weights could not be used. Could you explain why this happens?

Some weights of the model checkpoint at ./pretrained_model/nezha-large-www/ were not used when initializing NeZhaForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.0.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.1.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.2.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.3.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.4.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.5.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.6.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.7.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.8.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.9.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.10.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.11.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.12.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.13.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.14.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.15.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.16.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.17.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.18.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.19.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.20.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.21.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.22.attention.self.relative_positions_encoding.positions_encoding', 'bert.encoder.layer.23.attention.self.relative_positions_encoding.positions_encoding']
- This IS expected if you are initializing NeZhaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NeZhaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
lonePatient commented 3 years ago

The code has been updated; the main issue was with the relative position encoding.

bestpredicts commented 3 years ago

> The code has been updated; the main issue was with the relative position encoding.

However, this warning still seems to appear, and I'm not sure of the cause:

Some weights of the model checkpoint at /home/root1/DY/nezha-base-www were not used when initializing NeZhaForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing NeZhaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NeZhaForMaskedLM from the checkpoint 
lonePatient commented 3 years ago

@bestpredicts Since you are using an MLM-only model, why would it have next-sentence-prediction (seq_relationship) weights?
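
If you want to verify that only the next-sentence-prediction head is left over, a quick check like the following (the checkpoint path is a placeholder) lists which `cls.*` keys the pretraining checkpoint carries. A masked-LM model only instantiates the `cls.predictions.*` head, so `cls.seq_relationship.weight` and `cls.seq_relationship.bias` are expected to be reported as unused.

```python
import torch

# Placeholder path; substitute your own nezha-base-wwm checkpoint directory.
checkpoint = torch.load("./nezha-base-wwm/pytorch_model.bin", map_location="cpu")

# The pretraining checkpoint carries both heads; NeZhaForMaskedLM only uses
# the MLM head, so the NSP head's weights are reported as "not used".
nsp_keys = [k for k in checkpoint if k.startswith("cls.seq_relationship")]
mlm_keys = [k for k in checkpoint if k.startswith("cls.predictions")]
print("NSP head (unused by NeZhaForMaskedLM):", nsp_keys)
print("MLM head (loaded by NeZhaForMaskedLM):", mlm_keys)
```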