xlxwalex / FCGEC

The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型
https://aclanthology.org/2022.findings-emnlp.137
Apache License 2.0

Running the Reporter app fails with an error: tagger._hidden2tag.linear.weight: param with shape torch.Size from checkpoint does not match the shape torch.Size in the current model #35

Closed toutoutout closed 6 months ago

toutoutout commented 6 months ago

Hello,

I trained the model with run_stg_joint.sh. When I then run demo_pipeline.py, I receive an error. I paste the full error here:

```
[jupyter@jupyter-d134f2d8-ead8-4a47-92a0-8dcb5293de93-54b4cfd585-jh5dl STG-correction]$ python demo_pipeline.py
jieba are not installed, use default mode.
Some weights of the model checkpoint at ../pretrained-models/hflchinese-roberta-wwm-ext/ were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
```

Please give me some hints on how to solve it. Thanks.

xlxwalex commented 6 months ago

Hi,

If you are training from scratch under the Joint paradigm, the configuration file used for training is joint_config.py. Please check whether your settings (tagger_classes, line 40) are consistent with those in the repository.
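
If it is unclear which values the checkpoint was actually trained with, one way to check is to read the head shapes straight out of the saved weights. A minimal sketch, assuming PyTorch and a placeholder checkpoint path (adjust the path, and the unwrapping step, to however your training run saved its weights):

```python
# Sketch: print the sizes of the tagger heads stored in a trained checkpoint.
# "checkpoints/joint_checkpoint.pt" is a placeholder path, not the repo's actual file name.
import torch

state = torch.load("checkpoints/joint_checkpoint.pt", map_location="cpu")
# Some training scripts nest the weights under a key such as 'model'; unwrap if so.
if isinstance(state, dict) and "model" in state:
    state = state["model"]

for name, tensor in state.items():
    if "_hidden2tag" in name or "_hidden2t." in name:
        print(name, tuple(tensor.shape))

# The first dimension of tagger._hidden2tag.linear.weight should be the
# tagger_classes value used at training time; for tagger._hidden2t it appears
# (from the shapes reported in this thread) to be max_token = max_generate + 1.
```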

toutoutout commented 6 months ago

--> MAX_GENERATE is set to 6 in run_stg_joint.sh
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_joint_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in joint_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in generator_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_indep_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in tagger_config.py
--> self.max_token = args.max_generate + 1 --- line 16 in tagger_model

Otherwise, if self.max_token = args.max_generate, the error becomes:

```
size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).
```
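
For reference, here is a minimal sketch of the shape arithmetic implied by the error message; this is not the repo's actual TaggerModel, just two stand-in linear heads built from the values above:

```python
# Stand-in heads reproducing the shapes from the error; not the real TaggerModel.
import torch.nn as nn

hidden_size = 768
tagger_classes = 6               # joint_config.py, line 40 (value assumed here)
max_generate = 6                 # MAX_GENERATE in run_stg_joint.sh and the *_config.py files
max_token = max_generate + 1     # tagger_model, line 16

hidden2tag = nn.Linear(hidden_size, tagger_classes)
hidden2t = nn.Linear(hidden_size, max_token)

print(hidden2tag.weight.shape)   # torch.Size([6, 768]); the checkpoint has [7, 768]
print(hidden2t.weight.shape)     # torch.Size([7, 768]); the error reports [8, 768],
                                 # which would only happen if max_generate were 7 at load time
```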

xlxwalex commented 6 months ago

For error message:

size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).

You need to check the information in the conversation above:

> Hi,
>
> If you are training from scratch under the Joint paradigm, the configuration file used for training is joint_config.py. Please check whether your settings (tagger_classes, line 40) are consistent with those in the repository.

The tagger_classes parameter also seems to have been set to 7 during training. You may need to check it or retrain.
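
If the checkpoint really was trained with 7 tag classes and retraining is not convenient, the other direction is to make the inference-side config match the checkpoint. A hedged sketch of what that line would look like inside the config file, mirroring the add_arg pattern quoted above (the value 7 is only what the checkpoint shapes suggest, and the description string is illustrative):

```python
# Inside joint_config.py / evaluate_joint_config.py (model_args already exists there);
# set tagger_classes to whatever the checkpoint inspection reports.
model_args.add_arg('tagger_classes', int, 7, 'Number of Tagger Classes')
```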

xlxwalex commented 6 months ago

> --> MAX_GENERATE is set to 6 in run_stg_joint.sh
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_joint_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in joint_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in generator_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_indep_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in tagger_config.py
> --> self.max_token = args.max_generate + 1 --- line 16 in tagger_model
>
> Otherwise, if self.max_token = args.max_generate, the error becomes:
>
> size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
> size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
> size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
> size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).

It's a bit odd; if max_generate is set to 6 in evaluate_joint_config.py, the current model's shape should not be 8, it should be 7. You may need to track the value of max_token in TaggerModel.
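
A minimal, self-contained illustration of that tracking; TaggerModelStub below is a stand-in rather than the repo's class, and in practice the print would go next to line 16 of tagger_model:

```python
# Stand-in for adding a trace next to `self.max_token = args.max_generate + 1`.
from types import SimpleNamespace

class TaggerModelStub:
    def __init__(self, args):
        self.max_token = args.max_generate + 1  # mirrors tagger_model, line 16
        print(f"[TaggerModel] max_generate={args.max_generate} -> max_token={self.max_token}")

TaggerModelStub(SimpleNamespace(max_generate=6))  # expected: max_token=7
```

If the same kind of trace inside demo_pipeline.py reports max_token=8, then the args object the model receives is not carrying the max_generate=6 set in evaluate_joint_config.py.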

toutoutout commented 6 months ago

If the max_generate parameter in evaluate_joint_config.py is set to 7, the error becomes:

```
size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).
```

I'm sure the max_generate was set to 6 in joint_config.py (tagger_classes, line 40), since I set the same 6 in run_stg_joint.sh.

Yes, I need to track the value.

xlxwalex commented 6 months ago

Try setting the parameters I just sent you in the email, and also print out the max_token of the TaggerModel for tracking.

toutoutout commented 6 months ago

Thanks a lot