guotong1988 / NL2SQL-RULE

Content Enhanced BERT-based Text-to-SQL Generation https://arxiv.org/abs/1910.07179
188 stars 48 forks source link

ERROR when running train (test method only) #10

Closed mellahysf closed 4 years ago

mellahysf commented 4 years ago

Hi,

Can you help please ??

when i try to run train.py precisaly the test method, it gives me the error below:

BERT-type: uncased_L-24_H-1024_A-16 Batch_size = 32 BERT parameters: learning rate: 1e-05 Fine-tune BERT: False vocab size: 30522 hidden_size: 1024 num_hidden_layer: 24 num_attention_heads: 16 hidden_act: gelu intermediate_size: 4096 hidden_dropout_prob: 0.1 attention_probs_dropout_prob: 0.1 max_position_embeddings: 512 type_vocab_size: 2 initializer_range: 0.02 Load pre-trained parameters. Seq-to-SQL: the number of final BERT layers to be used: 2 Seq-to-SQL: the size of hidden dimension = 100 Seq-to-SQL: LSTM encoding layer size = 2 Seq-to-SQL: dropout rate = 0.3 Seq-to-SQL: learning rate = 0.001 Traceback (most recent call last): File "train.py", line 709, in path_model_bert=path_model_bert, path_model=path_model) File "train.py", line 187, in get_models model_bert.load_state_dict(res['model_bert']) File "/home/ysfmell/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict self.class.name, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for BertModel: Missing key(s) in state_dict: "encoder.layer.12.attention.self.query.weight", "encoder.layer.12.attention.self.query.bias", "encoder.layer.12.attention.self.key.weight", "encoder.layer.12.attention.self.key.bias", "encoder.layer.12.attention.self.value.weight", "encoder.layer.12.attention.self.value.bias", "encoder.layer.12.attention.output.dense.weight", "encoder.layer.12.attention.output.dense.bias", "encoder.layer.12.attention.output.LayerNorm.gamma", "encoder.layer.12.attention.output.LayerNorm.beta", "encoder.layer.12.intermediate.dense.weight", "encoder.layer.12.intermediate.dense.bias", "encoder.layer.12.output.dense.weight", "encoder.layer.12.output.dense.bias", "encoder.layer.12.output.LayerNorm.gamma", "encoder.layer.12.output.LayerNorm.beta", "encoder.layer.13.attention.self.query.weight", "encoder.layer.13.attention.self.query.bias", "encoder.layer.13.attention.self.key.weight", "encoder.layer.13.attention.self.key.bias", "encoder.layer.13.attention.self.value.weight", "encoder.layer.13.attention.self.value.bias", "encoder.layer.13.attention.output.dense.weight", "encoder.layer.13.attention.output.dense.bias", "encoder.layer.13.attention.output.LayerNorm.gamma", "encoder.layer.13.attention.output.LayerNorm.beta", "encoder.layer.13.intermediate.dense.weight", "encoder.layer.13.intermediate.dense.bias", "encoder.layer.13.output.dense.weight", "encoder.layer.13.output.dense.bias", "encoder.layer.13.output.LayerNorm.gamma", "encoder.layer.13.output.LayerNorm.beta", "encoder.layer.14.attention.self.query.weight", "encoder.layer.14.attention.self.query.bias", "encoder.layer.14.attention.self.key.weight", "encoder.layer.14.attention.self.key.bias", "encoder.layer.14.attention.self.value.weight", "encoder.layer.14.attention.self.value.bias", "encoder.layer.14.attention.output.dense.weight", "encoder.layer.14.attention.output.dense.bias", "encoder.layer.14.attention.output.LayerNorm.gamma", "encoder.layer.14.attention.output.LayerNorm.beta", "encoder.layer.14.intermediate.dense.weight", "encoder.layer.14.intermediate.dense.bias", "encoder.layer.14.output.dense.weight", "encoder.layer.14.output.dense.bias", "encoder.layer.14.output.LayerNorm.gamma", "encoder.layer.14.output.LayerNorm.beta", "encoder.layer.15.attention.self.query.weight", "encoder.layer.15.attention.self.query.bias", "encoder.layer.15.attention.self.key.weight", "encoder.layer.15.attention.self.key.bias", "encoder.layer.15.attention.self.value.weight", "encoder.layer.15.attention.self.value.bias", "encoder.layer.15.attention.output.dense.weight", "encoder.layer.15.attention.output.dense.bias", "encoder.layer.15.attention.output.LayerNorm.gamma", "encoder.layer.15.attention.output.LayerNorm.beta", "encoder.layer.15.intermediate.dense.weight", "encoder.layer.15.intermediate.dense.bias", "encoder.layer.15.output.dense.weight", "encoder.layer.15.output.dense.bias", "encoder.layer.15.output.LayerNorm.gamma", "encoder.layer.15.output.LayerNorm.beta", "encoder.layer.16.attention.self.query.weight", "encoder.layer.16.attention.self.query.bias", "encoder.layer.16.attention.self.key.weight", "encoder.layer.16.attention.self.key.bias", "encoder.layer.16.attention.self.value.weight", "encoder.layer.16.attention.self.value.bias", "encoder.layer.16.attention.output.dense.weight", "encoder.layer.16.attention.output.dense.bias", "encoder.layer.16.attention.output.LayerNorm.gamma", "encoder.layer.16.attention.output.LayerNorm.beta", "encoder.layer.16.intermediate.dense.weight", "encoder.layer.16.intermediate.dense.bias", "encoder.layer.16.output.dense.weight", "encoder.layer.16.output.dense.bias", "encoder.layer.16.output.LayerNorm.gamma", "encoder.layer.16.output.LayerNorm.beta", "encoder.layer.17.attention.self.query.weight", "encoder.layer.17.attention.self.query.bias", "encoder.layer.17.attention.self.key.weight", "encoder.layer.17.attention.self.key.bias", "encoder.layer.17.attention.self.value.weight", "encoder.layer.17.attention.self.value.bias", "encoder.layer.17.attention.output.dense.weight", "encoder.layer.17.attention.output.dense.bias", "encoder.layer.17.attention.output.LayerNorm.gamma", "encoder.layer.17.attention.output.LayerNorm.beta", "encoder.layer.17.intermediate.dense.weight", "encoder.layer.17.intermediate.dense.bias", "encoder.layer.17.output.dense.weight", "encoder.layer.17.output.dense.bias", "encoder.layer.17.output.LayerNorm.gamma", "encoder.layer.17.output.LayerNorm.beta", "encoder.layer.18.attention.self.query.weight", "encoder.layer.18.attention.self.query.bias", "encoder.layer.18.attention.self.key.weight", "encoder.layer.18.attention.self.key.bias", "encoder.layer.18.attention.self.value.weight", "encoder.layer.18.attention.self.value.bias", "encoder.layer.18.attention.output.dense.weight", "encoder.layer.18.attention.output.dense.bias", "encoder.layer.18.attention.output.LayerNorm.gamma", "encoder.layer.18.attention.output.LayerNorm.beta", "encoder.layer.18.intermediate.dense.weight", "encoder.layer.18.intermediate.dense.bias", "encoder.layer.18.output.dense.weight", "encoder.layer.18.output.dense.bias", "encoder.layer.18.output.LayerNorm.gamma", "encoder.layer.18.output.LayerNorm.beta", "encoder.layer.19.attention.self.query.weight", "encoder.layer.19.attention.self.query.bias", "encoder.layer.19.attention.self.key.weight", "encoder.layer.19.attention.self.key.bias", "encoder.layer.19.attention.self.value.weight", "encoder.layer.19.attention.self.value.bias", "encoder.layer.19.attention.output.dense.weight", "encoder.layer.19.attention.output.dense.bias", "encoder.layer.19.attention.output.LayerNorm.gamma", "encoder.layer.19.attention.output.LayerNorm.beta", "encoder.layer.19.intermediate.dense.weight", "encoder.layer.19.intermediate.dense.bias", "encoder.layer.19.output.dense.weight", "encoder.layer.19.output.dense.bias", "encoder.layer.19.output.LayerNorm.gamma", "encoder.layer.19.output.LayerNorm.beta", "encoder.layer.20.attention.self.query.weight", "encoder.layer.20.attention.self.query.bias", "encoder.layer.20.attention.self.key.weight", "encoder.layer.20.attention.self.key.bias", "encoder.layer.20.attention.self.value.weight", "encoder.layer.20.attention.self.value.bias", "encoder.layer.20.attention.output.dense.weight", "encoder.layer.20.attention.output.dense.bias", "encoder.layer.20.attention.output.LayerNorm.gamma", "encoder.layer.20.attention.output.LayerNorm.beta", "encoder.layer.20.intermediate.dense.weight", "encoder.layer.20.intermediate.dense.bias", "encoder.layer.20.output.dense.weight", "encoder.layer.20.output.dense.bias", "encoder.layer.20.output.LayerNorm.gamma", "encoder.layer.20.output.LayerNorm.beta", "encoder.layer.21.attention.self.query.weight", "encoder.layer.21.attention.self.query.bias", "encoder.layer.21.attention.self.key.weight", "encoder.layer.21.attention.self.key.bias", "encoder.layer.21.attention.self.value.weight", "encoder.layer.21.attention.self.value.bias", "encoder.layer.21.attention.output.dense.weight", "encoder.layer.21.attention.output.dense.bias", "encoder.layer.21.attention.output.LayerNorm.gamma", "encoder.layer.21.attention.output.LayerNorm.beta", "encoder.layer.21.intermediate.dense.weight", "encoder.layer.21.intermediate.dense.bias", "encoder.layer.21.output.dense.weight", "encoder.layer.21.output.dense.bias", "encoder.layer.21.output.LayerNorm.gamma", "encoder.layer.21.output.LayerNorm.beta", "encoder.layer.22.attention.self.query.weight", "encoder.layer.22.attention.self.query.bias", "encoder.layer.22.attention.self.key.weight", "encoder.layer.22.attention.self.key.bias", "encoder.layer.22.attention.self.value.weight", "encoder.layer.22.attention.self.value.bias", "encoder.layer.22.attention.output.dense.weight", "encoder.layer.22.attention.output.dense.bias", "encoder.layer.22.attention.output.LayerNorm.gamma", "encoder.layer.22.attention.output.LayerNorm.beta", "encoder.layer.22.intermediate.dense.weight", "encoder.layer.22.intermediate.dense.bias", "encoder.layer.22.output.dense.weight", "encoder.layer.22.output.dense.bias", "encoder.layer.22.output.LayerNorm.gamma", "encoder.layer.22.output.LayerNorm.beta", "encoder.layer.23.attention.self.query.weight", "encoder.layer.23.attention.self.query.bias", "encoder.layer.23.attention.self.key.weight", "encoder.layer.23.attention.self.key.bias", "encoder.layer.23.attention.self.value.weight", "encoder.layer.23.attention.self.value.bias", "encoder.layer.23.attention.output.dense.weight", "encoder.layer.23.attention.output.dense.bias", "encoder.layer.23.attention.output.LayerNorm.gamma", "encoder.layer.23.attention.output.LayerNorm.beta", "encoder.layer.23.intermediate.dense.weight", "encoder.layer.23.intermediate.dense.bias", "encoder.layer.23.output.dense.weight", "encoder.layer.23.output.dense.bias", "encoder.layer.23.output.LayerNorm.gamma", "encoder.layer.23.output.LayerNorm.beta". size mismatch for embeddings.word_embeddings.weight: copying a param with shape torch.Size([30522, 768]) from checkpoint, the shape in current model is torch.Size([30522, 1024]). size mismatch for embeddings.position_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([2, 768]) from checkpoint, the shape in current model is torch.Size([2, 1024]). size mismatch for embeddings.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for embeddings.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.0.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.0.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.0.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.0.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.0.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.0.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.0.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.0.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.1.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.1.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.1.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.1.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.1.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.1.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.1.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.1.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.2.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.2.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.2.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.2.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.2.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.2.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.2.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.2.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.3.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.3.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.3.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.3.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.3.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.3.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.3.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.3.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.4.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.4.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.4.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.4.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.4.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.4.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.4.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.4.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.5.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.5.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.5.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.5.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.5.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.5.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.5.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.5.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.6.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.6.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.6.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.6.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.6.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.6.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.6.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.6.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.7.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.7.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.7.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.7.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.7.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.7.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.7.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.7.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.8.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.8.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.8.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.8.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.8.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.8.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.8.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.8.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.9.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.9.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.9.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.9.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.9.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.9.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.9.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.9.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.10.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.10.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.10.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.10.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.10.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.10.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.10.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.10.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.self.query.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.11.attention.self.query.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.self.key.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.11.attention.self.key.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.self.value.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.11.attention.self.value.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.output.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for encoder.layer.11.attention.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.attention.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.intermediate.dense.weight: copying a param with shape torch.Size([3072, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1024]). size mismatch for encoder.layer.11.intermediate.dense.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([4096]). size mismatch for encoder.layer.11.output.dense.weight: copying a param with shape torch.Size([768, 3072]) from checkpoint, the shape in current model is torch.Size([1024, 4096]). size mismatch for encoder.layer.11.output.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.output.LayerNorm.gamma: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for encoder.layer.11.output.LayerNorm.beta: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for pooler.dense.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for pooler.dense.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).

kumarvivek9097 commented 4 years ago

Hi Mellahysf, I'm facing the same problem. Did you solve it? If yes, How?