xlxwalex / FCGEC

The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型
https://aclanthology.org/2022.findings-emnlp.137
Apache License 2.0

Running the Reporter app fails with an error: tagger._hidden2tag.linear.weight: param with shape torch.Size from checkpoint does not match the shape torch.Size in the current model #35

Closed toutoutout closed 6 months ago

toutoutout commented 6 months ago

Hello,

I trained the model with run_stg_joint.sh. When I then run demo_pipeline.py, I receive an error. I paste the full error here:

```
[jupyter@jupyter-d134f2d8-ead8-4a47-92a0-8dcb5293de93-54b4cfd585-jh5dl STG-correction]$ python demo_pipeline.py
jieba are not installed, use default mode.
Some weights of the model checkpoint at ../pretrained-models/hflchinese-roberta-wwm-ext/ were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
```

Please give me some hints on how to solve it. Thanks.

xlxwalex commented 6 months ago

Hi,

If you are training from scratch under the Joint paradigm, the configuration file used for training is joint_config.py. Please check whether your settings (tagger_classes, line 40) are consistent with those in the repository.
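
If it is unclear which values the checkpoint was actually trained with, one way to check is to read the head shapes straight out of the saved weights. A minimal sketch, assuming PyTorch and a placeholder checkpoint path (adjust the path, and the unwrapping step, to however your training run saved its weights):

```python
# Sketch: print the sizes of the tagger heads stored in a trained checkpoint.
# "checkpoints/joint_checkpoint.pt" is a placeholder path, not the repo's actual file name.
import torch

state = torch.load("checkpoints/joint_checkpoint.pt", map_location="cpu")
# Some training scripts nest the weights under a key such as 'model'; unwrap if so.
if isinstance(state, dict) and "model" in state:
    state = state["model"]

for name, tensor in state.items():
    if "_hidden2tag" in name or "_hidden2t." in name:
        print(name, tuple(tensor.shape))

# The first dimension of tagger._hidden2tag.linear.weight should be the
# tagger_classes value used at training time; for tagger._hidden2t it appears
# (from the shapes reported in this thread) to be max_token = max_generate + 1.
```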

toutoutout commented 6 months ago

--> MAX_GENERATE is set to 6 in run_stg_joint.sh
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_joint_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in joint_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in generator_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_indep_config.py
--> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in tagger_config.py
--> self.max_token = args.max_generate + 1 --- line 16 in tagger_model

Otherwise, if self.max_token = args.max_generate, the error becomes:

```
size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).
```
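
For reference, here is a minimal sketch of the shape arithmetic implied by the error message; this is not the repo's actual TaggerModel, just two stand-in linear heads built from the values above:

```python
# Stand-in heads reproducing the shapes from the error; not the real TaggerModel.
import torch.nn as nn

hidden_size = 768
tagger_classes = 6               # joint_config.py, line 40 (value assumed here)
max_generate = 6                 # MAX_GENERATE in run_stg_joint.sh and the *_config.py files
max_token = max_generate + 1     # tagger_model, line 16

hidden2tag = nn.Linear(hidden_size, tagger_classes)
hidden2t = nn.Linear(hidden_size, max_token)

print(hidden2tag.weight.shape)   # torch.Size([6, 768]); the checkpoint has [7, 768]
print(hidden2t.weight.shape)     # torch.Size([7, 768]); the error reports [8, 768],
                                 # which would only happen if max_generate were 7 at load time
```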

xlxwalex commented 6 months ago

For error message:

size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).

You need to check the information in the conversation above:

> Hi,
>
> If you are training from scratch under the Joint paradigm, the configuration file used for training is joint_config.py. Please check whether your settings (tagger_classes, line 40) are consistent with those in the repository.

The tagger_classes parameter also seems to have been set to 7 during training. You may need to check it or retrain.
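
If the checkpoint really was trained with 7 tag classes and retraining is not convenient, the other direction is to make the inference-side config match the checkpoint. A hedged sketch of what that line would look like inside the config file, mirroring the add_arg pattern quoted above (the value 7 is only what the checkpoint shapes suggest, and the description string is illustrative):

```python
# Inside joint_config.py / evaluate_joint_config.py (model_args already exists there);
# set tagger_classes to whatever the checkpoint inspection reports.
model_args.add_arg('tagger_classes', int, 7, 'Number of Tagger Classes')
```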

xlxwalex commented 6 months ago

> --> MAX_GENERATE is set to 6 in run_stg_joint.sh
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_joint_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in joint_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in generator_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in evaluate_indep_config.py
> --> model_args.add_arg('max_generate', int, 6, 'Number of Max Token Generation') in tagger_config.py
> --> self.max_token = args.max_generate + 1 --- line 16 in tagger_model
>
> Otherwise, if self.max_token = args.max_generate, the error becomes:
>
> size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
> size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
> size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
> size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).

It's a bit odd; if max_generate is set to 6 in evaluate_joint_config.py, the current model's shape should not be 8, it should be 7. You may need to track the value of max_token in TaggerModel.
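
A minimal, self-contained illustration of that tracking; TaggerModelStub below is a stand-in rather than the repo's class, and in practice the print would go next to line 16 of tagger_model:

```python
# Stand-in for adding a trace next to `self.max_token = args.max_generate + 1`.
from types import SimpleNamespace

class TaggerModelStub:
    def __init__(self, args):
        self.max_token = args.max_generate + 1  # mirrors tagger_model, line 16
        print(f"[TaggerModel] max_generate={args.max_generate} -> max_token={self.max_token}")

TaggerModelStub(SimpleNamespace(max_generate=6))  # expected: max_token=7
```

If the same kind of trace inside demo_pipeline.py reports max_token=8, then the args object the model receives is not carrying the max_generate=6 set in evaluate_joint_config.py.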

toutoutout commented 6 months ago

If the max_generate parameter in evaluate_joint_config.py is set to 7, the error becomes:

```
size mismatch for tagger._hidden2tag.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([6, 768]).
size mismatch for tagger._hidden2tag.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([6]).
size mismatch for tagger._hidden2t.linear.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([8, 768]).
size mismatch for tagger._hidden2t.linear.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([8]).
```

I'm sure the max_generate was set to 6 in joint_config.py (tagger_classes, line 40), since I set the same 6 in run_stg_joint.sh.

Yes, I need to track the value.

xlxwalex commented 6 months ago

Try setting the parameters I just sent you in the email, and also print out the max_token of the TaggerModel for tracking.

toutoutout commented 6 months ago

Thanks a lot