xuezhizeng opened 9 months ago
Disabling checkpoint resuming will work.
Thanks Heng for your reply.
Yep, I know that disabling checkpoint resuming works, but I'm wondering how we can resume from the checkpoint (it is a very important feature in some situations). Thanks in advance if you could help with this bug!
Clone the repo and revise line 451 in instructor_template.py, i.e., model.load_state_dict(..., strict=False)
Setting "strict=False", it works well now. Thank you so much!
Version
PyABSA: 2.4.0
Torch: 2.0.1
transformers: 4.31.0
Describe the bug
In the sample code below, under the folder examples-v2/aspect_term_extraction:
```python
trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",  # resume training from the pretrained "english" checkpoint
    auto_device=DeviceTypeOption.AUTO,  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # save state dict only instead of the whole model
    load_aug=False,  # set load_aug=True to use augmentation data for integrated datasets and improve performance
)
```
When running the above code, it raises the following error:
```
File /databricks/conda/lib/python3.9/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)
   2036     error_msgs.insert(
   2037         0, 'Missing key(s) in state_dict: {}. '.format(
   2038             ', '.join('"{}"'.format(k) for k in missing_keys)))
   2040 if len(error_msgs) > 0:
-> 2041     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2042         self.__class__.__name__, "\n\t".join(error_msgs)))
   2043 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
	Unexpected key(s) in state_dict: "bert4global.embeddings.position_ids".
```
However, if I comment out the line `from_checkpoint="english",`, then it works well.
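A possible alternative to the strict=False workaround mentioned earlier would be to drop the offending key from the checkpoint before loading it. This is only an illustrative sketch: the checkpoint file name is hypothetical, and `model` is assumed to be an already constructed FAST_LCF_ATEPC model.

```python
import torch

# Sketch only: remove the key reported as unexpected, then load strictly.
# "fast_lcf_atepc.state_dict" is a hypothetical file name; `model` is assumed
# to already be instantiated with the matching architecture.
state_dict = torch.load("fast_lcf_atepc.state_dict", map_location="cpu")
state_dict.pop("bert4global.embeddings.position_ids", None)
model.load_state_dict(state_dict, strict=True)
```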
Code To Reproduce

```python
trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",  # resume training from the pretrained "english" checkpoint
    auto_device=DeviceTypeOption.AUTO,  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # save state dict only instead of the whole model
    load_aug=False,  # set load_aug=True to use augmentation data for integrated datasets and improve performance
)
```
Expected behavior
Could you please troubleshoot this bug? I think resuming training from your pretrained checkpoints such as "english" is very important; otherwise, training from scratch will produce an underperforming model.
Thank you very much in advance!
The above example code is available at: https://github.com/yangheng95/PyABSA/blob/v2/examples-v2/aspect_term_extraction/Aspect_Term_Extraction.ipynb