Few-shot NER model raises an error during predict

panhustar commented 3 months ago

Describe the bug

The trained model was saved as ner\few-shot\model\mit-movie_3_5e-05\best_model.pth. Running predict fails with the following error:

Traceback (most recent call last):
  File "predict.py", line 102, in main
    trainer.predict()
  File "C:\ProgramData\anaconda3\envs\deepke\lib\site-packages\deepke-2.2.7-py3.8.egg\deepke\name_entity_re\few_shot\module\train.py", line 135, in predict
    self.model.load_state_dict(torch.load(self.load_path))
  File "C:\ProgramData\anaconda3\envs\deepke\lib\site-packages\torch\nn\modules\module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PromptGeneratorModel:
    Unexpected key(s) in state_dict: "prompt_model.prompt_decoder.averge_weights.0", "decoder.averge_weights.0".
    size mismatch for prompt_model.prompt_decoder.mapping: copying a param with shape torch.Size([14]) from checkpoint, the shape in current model is torch.Size([6]).
    size mismatch for decoder.mapping: copying a param with shape torch.Size([14]) from checkpoint, the shape in current model is torch.Size([6]).

Environment (please complete the following information):

OS: Windows
Python Version: 3.8

Any advice would be appreciated.

flow3rdown commented 3 months ago

Hello, please try setting learn_weights to False.

zxlzr commented 2 months ago

Has your problem been resolved?

panhustar commented 2 months ago

Hello! It is not resolved yet. I still get the following error:

Traceback (most recent call last):
  File "predict.py", line 102, in main
    trainer.predict()
  File "C:\ProgramData\anaconda3\envs\deepke\lib\site-packages\deepke-2.2.7-py3.8.egg\deepke\name_entity_re\few_shot\module\train.py", line 135, in predict
    self.model.load_state_dict(torch.load(self.load_path))
  File "C:\ProgramData\anaconda3\envs\deepke\lib\site-packages\torch\nn\modules\module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PromptGeneratorModel:
    Unexpected key(s) in state_dict: "prompt_model.prompt_decoder.averge_weights.0", "decoder.averge_weights.0".
    size mismatch for prompt_model.prompt_encoder.bart_encoder.embed_tokens.weight: copying a param with shape torch.Size([50270, 1024]) from checkpoint, the shape in current model is torch.Size([50274, 1024]).
    size mismatch for prompt_model.prompt_decoder.mapping: copying a param with shape torch.Size([14]) from checkpoint, the shape in current model is torch.Size([6]).
    size mismatch for prompt_model.prompt_decoder.bart_decoder.embed_tokens.weight: copying a param with shape torch.Size([50270, 1024]) from checkpoint, the shape in current model is torch.Size([50274, 1024]).
    size mismatch for decoder.mapping: copying a param with shape torch.Size([14]) from checkpoint, the shape in current model is torch.Size([6]).
    size mismatch for decoder.bart_decoder.embed_tokens.weight: copying a param with shape torch.Size([50270, 1024]) from checkpoint, the shape in current model is torch.Size([50274, 1024]).

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

The configuration I modified is as follows.

\DeepKE\example\ner\few-shot\conf\predict.yaml:

cwd: ???

seed: 1

bart_name: "C://Support//work//model//bart-large"
dataset_name: conll2003
device: cpu

num_epochs: 30
batch_size: 16
learning_rate: 2e-5
warmup_ratio: 0.01
eval_begin_epoch: 16
src_seq_ratio: 0.6
tgt_max_len: 10
num_beams: 1
length_penalty: 1

use_prompt: True
prompt_len: 10
prompt_dim: 800

freeze_plm: True
learn_weights: False
notes: ''
save_path: null # model save path
load_path: "C://Support//code//DeepKE//example//ner//few-shot//model//mit-movie_3_5e-05//best_model.pth" # model load path, must not be empty
write_path: "C://Support//code//DeepKE//example//ner//few-shot//data//conll2003//predict.txt"

The other configuration file is as follows.

C:\Support\code\DeepKE\example\ner\few-shot\conf\train\conll.yaml:

seed: 1

bart_name: "C://Support//work//model//bart-large"
dataset_name: conll2003
device: cpu

num_epochs: 30
batch_size: 16
learning_rate: 2e-5
warmup_ratio: 0.01
eval_begin_epoch: 16
src_seq_ratio: 0.6
tgt_max_len: 10
num_beams: 1
length_penalty: 1

use_prompt: True
prompt_len: 10
prompt_dim: 800

freeze_plm: True
learn_weights: False
save_path: save path # model save path
load_path: null
notes: ''

panhustar commented 2 months ago

Could you please help me find out what is causing this, and what else I could change? Thank you.

flow3rdown commented 2 months ago

Please check whether dataset_name is the same for prediction and training.
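One way to follow up on this kind of check is to compare the parameter shapes stored in the checkpoint with those of the freshly built model before calling load_state_dict. The sketch below is not part of DeepKE: shapes_of and diff_shapes are hypothetical helper names, and the sample shapes are copied from the traceback in this thread. The shape of averge_weights.0 is not shown there, so a placeholder is used.

```python
def shapes_of(state_dict):
    """Map each parameter name to its shape as a plain tuple.

    Works with any mapping whose values expose a `.shape` attribute,
    such as the result of torch.load(...) or model.state_dict().
    """
    return {name: tuple(tensor.shape) for name, tensor in state_dict.items()}


def diff_shapes(ckpt_shapes, model_shapes):
    """Compare two name -> shape mappings.

    Returns (unexpected, missing, mismatched):
      unexpected: keys only in the checkpoint
      missing:    keys only in the model
      mismatched: shared keys whose shapes differ, as (ckpt, model) pairs
    """
    unexpected = sorted(set(ckpt_shapes) - set(model_shapes))
    missing = sorted(set(model_shapes) - set(ckpt_shapes))
    mismatched = {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in sorted(set(ckpt_shapes) & set(model_shapes))
        if ckpt_shapes[name] != model_shapes[name]
    }
    return unexpected, missing, mismatched


# Sample shapes taken from the traceback in this thread.
ckpt = {
    "decoder.averge_weights.0": (),  # placeholder: real shape not shown
    "decoder.mapping": (14,),
    "decoder.bart_decoder.embed_tokens.weight": (50270, 1024),
}
model = {
    "decoder.mapping": (6,),
    "decoder.bart_decoder.embed_tokens.weight": (50274, 1024),
}

unexpected, missing, mismatched = diff_shapes(ckpt, model)
print("unexpected keys:", unexpected)
print("missing keys:", missing)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, the two mappings could be built as shapes_of(torch.load(load_path, map_location="cpu")) and shapes_of(model.state_dict()), where load_path and model stand for the checkpoint path and the model instance. A mapping of size 14 in the checkpoint versus 6 in the model would be consistent with the checkpoint (mit-movie_3_5e-05) and the prediction config (conll2003) using different label sets, though that is an inference from the names rather than something confirmed in this thread.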

zxlzr commented 2 months ago

Has your problem been resolved?

zjunlp / DeepKE

Few-shot NER model raises an error during predict #547