Ibrokhimsadikov opened 1 year ago
Please provide useful information according to the report: https://github.com/yangheng95/PyABSA/issues/new?assignees=&labels=&template=bug_report.md&title=
Version: pyabsa 2.4.1, torch 1.13.1, transformers 4.27.2
Describe the bug
Hello, I have the same issue. I am trying to fine-tune your latest multilingual model on my own Arabic dataset, starting from the multilingual checkpoint. I am sure the problem is not the dataset; I will paste the error log below. I get an error when I use any of the following options for config.pretrained_bert, and I also get an error (see below) when I leave config.pretrained_bert unset. In every case the error is about the state_dict:
Sample data:
بصراحة O -100
أنا O -100
ما O -100
أحب O -100
الكاتب O -100
اللي O -100
يدخل O -100
اللغة B-ASP negative
العامية I-ASP negative
في O -100
كتاباته O -100
مع O -100
اني O -100
أمارس O -100
هذا O -100
الخطأ O -100
روايه B-ASP negative
حزينه O -100
قد O -100
لاتستحق O -100
عناء O -100
القراءه O -100
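For context, the sample above follows the token-per-line ATEPC format (token, BIO tag, polarity, with -100 for non-aspect tokens). A small validator like the following can rule out formatting problems before training; this is a hedged sketch of the format as shown above, not the exact parser PyABSA uses:

```python
# Minimal validator for the ATEPC token-per-line format shown above:
# each non-blank line is "token tag polarity"; blank lines separate sentences.
# This mirrors the sample data, not PyABSA's internal loader.
VALID_TAGS = {"O", "B-ASP", "I-ASP"}

def parse_atepc_lines(lines):
    """Yield (token, tag, polarity) triples, raising on malformed rows."""
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # blank line = sentence boundary
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {n}: expected 3 fields, got {len(parts)}")
        token, tag, polarity = parts
        if tag not in VALID_TAGS:
            raise ValueError(f"line {n}: unknown tag {tag!r}")
        if tag == "O" and polarity != "-100":
            raise ValueError(f"line {n}: O tokens should carry polarity -100")
        yield token, tag, polarity

sample = ["اللغة B-ASP negative", "في O -100"]
rows = list(parse_atepc_lines(sample))
```

Running this over the full custom.train.txt.atepc file would surface any row with a missing field or a mislabeled polarity.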
Code To Reproduce
import warnings
warnings.filterwarnings("ignore")
import json
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4"
from pyabsa import ModelSaveOption, DeviceTypeOption, DatasetItem
import findfile
from pyabsa import AspectTermExtraction as ATEPC
my_dataset = DatasetItem("my_dataset", ["/app/path/CustomDatasetArabic/custom.train.txt.atepc", "/app/path/100.CustomDatasetArabic/custom.test.txt.atepc"])
config = (ATEPC.ATEPCConfigManager.get_atepc_config_multilingual())
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 4
config.max_seq_len = 500
config.num_epoch = 5
config.batch_size = 16
config.patience = 2
config.log_step = -1
config.seed = [1]
config.show_metric = True
config.verbose = False # If verbose == True, PyABSA will print the model structure and several processed data examples
config.notice = (
"This is a finetuned aspect term extraction model, based on ATEPC_MULTILINGUAL_CHECKPOINT, using Arabic data HAAD." # for memos usage
)
# # config.pretrained_bert = "yangheng/deberta-v3-base-absa-v1.1"
# # config.pretrained_bert = "yangheng/deberta-v3-large-absa-v1.1"
# # config.pretrained_bert = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
# # config.pretrained_bert = "microsoft/mdeberta-v3-base"
# # config.pretrained_bert = "bert-base-multilingual-uncased"
trainer = ATEPC.ATEPCTrainer(
config=config,
dataset=my_dataset,
from_checkpoint="multilingual", # if you want to resume training from our pretrained checkpoints, you can pass the checkpoint name here
auto_device=DeviceTypeOption.AUTO, # use cuda if available
checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT, # save state dict only instead of the whole model
load_aug=False, # there are augmentation datasets for the integrated datasets; set load_aug=True to use them and improve performance
path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT"
)
Expected behavior I expected the model to be trained and then saved. What should I do?
Screenshots
---------------------------------------------------------------------------
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
Missing key(s) in state_dict: "bert4global.embeddings.position_embeddings.weight", "bert4global.embeddings.token_type_embeddings.weight", "bert4global.encoder.layer.0.attention.self.query.weight", "bert4global.encoder.layer.0.attention.self.query.bias", "bert4global.encoder.layer.0.attention.self.key.weight", "bert4global.encoder.layer.0.attention.self.key.bias", "bert4global.encoder.layer.0.attention.self.value.weight", "bert4global.encoder.layer.0.attention.self.value.bias", "bert4global.encoder.layer.1.attention.self.query.weight", "bert4global.encoder.layer.1.attention.self.query.bias", "bert4global.encoder.layer.1.attention.self.key.weight", "bert4global.encoder.layer.1.attention.self.key.bias", "bert4global.encoder.layer.1.attention.self.value.weight", "bert4global.encoder.layer.1.attention.self.value.bias", "bert4global.encoder.layer.2.attention.self.query.weight", "bert4global.encoder.layer.2.attention.self.query.bias", "bert4global.encoder.layer.2.attention.self.key.weight", "bert4global.encoder.layer.2.attention.self.key.bias", "bert4global.encoder.layer.2.attention.self.value.weight", "bert4global.encoder.layer.2.attention.self.value.bias", "bert4global.encoder.layer.3.attention.self.query.weight", "bert4global.encoder.layer.3.attention.self.query.bias", "bert4global.encoder.layer.3.attention.self.key.weight", "bert4global.encoder.layer.3.attention.self.key.bias", "bert4global.encoder.layer.3.attention.self.value.weight", "bert4global.encoder.layer.3.attention.self.value.bias", "bert4global.encoder.layer.4.attention.self.query.weight", "bert4global.encoder.layer.4.attention.self.query.bias", "bert4global.encoder.layer.4.attention.self.key.weight", "bert4global.encoder.layer.4.attention.self.key.bias", "bert4global.encoder.layer.4.attention.self.value.weight", "bert4global.encoder.layer.4.attention.self.value.bias", "bert4global.encoder.layer.5.attention.self.query.weight", "bert4global.encoder.layer.5.attention.self.query.bias", 
"bert4global.encoder.layer.5.attention.self.key.weight", "bert4global.encoder.layer.5.attention.self.key.bias", "bert4global.encoder.layer.5.attention.self.value.weight", "bert4global.encoder.layer.5.attention.self.value.bias", "bert4global.encoder.layer.6.attention.self.query.weight", "bert4global.encoder.layer.6.attention.self.query.bias", "bert4global.encoder.layer.6.attention.self.key.weight", "bert4global.encoder.layer.6.attention.self.key.bias", "bert4global.encoder.layer.6.attention.self.value.weight", "bert4global.encoder.layer.6.attention.self.value.bias", "bert4global.encoder.layer.7.attention.self.query.weight", "bert4global.encoder.layer.7.attention.self.query.bias", "bert4global.encoder.layer.7.attention.self.key.weight", "bert4global.encoder.layer.7.attention.self.key.bias", "bert4global.encoder.layer.7.attention.self.value.weight", "bert4global.encoder.layer.7.attention.self.value.bias", "bert4global.encoder.layer.8.attention.self.query.weight", "bert4global.encoder.layer.8.attention.self.query.bias", "bert4global.encoder.layer.8.attention.self.key.weight", "bert4global.encoder.layer.8.attention.self.key.bias", "bert4global.encoder.layer.8.attention.self.value.weight", "bert4global.encoder.layer.8.attention.self.value.bias", "bert4global.encoder.layer.9.attention.self.query.weight", "bert4global.encoder.layer.9.attention.self.query.bias", "bert4global.encoder.layer.9.attention.self.key.weight", "bert4global.encoder.layer.9.attention.self.key.bias", "bert4global.encoder.layer.9.attention.self.value.weight", "bert4global.encoder.layer.9.attention.self.value.bias", "bert4global.encoder.layer.10.attention.self.query.weight", "bert4global.encoder.layer.10.attention.self.query.bias", "bert4global.encoder.layer.10.attention.self.key.weight", "bert4global.encoder.layer.10.attention.self.key.bias", "bert4global.encoder.layer.10.attention.self.value.weight", "bert4global.encoder.layer.10.attention.self.value.bias", 
"bert4global.encoder.layer.11.attention.self.query.weight", "bert4global.encoder.layer.11.attention.self.query.bias", "bert4global.encoder.layer.11.attention.self.key.weight", "bert4global.encoder.layer.11.attention.self.key.bias", "bert4global.encoder.layer.11.attention.self.value.weight", "bert4global.encoder.layer.11.attention.self.value.bias", "bert4global.pooler.dense.weight", "bert4global.pooler.dense.bias".
Unexpected key(s) in state_dict: "bert4global.encoder.rel_embeddings.weight", "bert4global.encoder.LayerNorm.weight", "bert4global.encoder.LayerNorm.bias", "bert4global.encoder.layer.0.attention.self.query_proj.weight", "bert4global.encoder.layer.0.attention.self.query_proj.bias", "bert4global.encoder.layer.0.attention.self.key_proj.weight", "bert4global.encoder.layer.0.attention.self.key_proj.bias", "bert4global.encoder.layer.0.attention.self.value_proj.weight", "bert4global.encoder.layer.0.attention.self.value_proj.bias", "bert4global.encoder.layer.1.attention.self.query_proj.weight", "bert4global.encoder.layer.1.attention.self.query_proj.bias", "bert4global.encoder.layer.1.attention.self.key_proj.weight", "bert4global.encoder.layer.1.attention.self.key_proj.bias", "bert4global.encoder.layer.1.attention.self.value_proj.weight", "bert4global.encoder.layer.1.attention.self.value_proj.bias", "bert4global.encoder.layer.2.attention.self.query_proj.weight", "bert4global.encoder.layer.2.attention.self.query_proj.bias", "bert4global.encoder.layer.2.attention.self.key_proj.weight", "bert4global.encoder.layer.2.attention.self.key_proj.bias", "bert4global.encoder.layer.2.attention.self.value_proj.weight", "bert4global.encoder.layer.2.attention.self.value_proj.bias", "bert4global.encoder.layer.3.attention.self.query_proj.weight", "bert4global.encoder.layer.3.attention.self.query_proj.bias", "bert4global.encoder.layer.3.attention.self.key_proj.weight", "bert4global.encoder.layer.3.attention.self.key_proj.bias", "bert4global.encoder.layer.3.attention.self.value_proj.weight", "bert4global.encoder.layer.3.attention.self.value_proj.bias", "bert4global.encoder.layer.4.attention.self.query_proj.weight", "bert4global.encoder.layer.4.attention.self.query_proj.bias", "bert4global.encoder.layer.4.attention.self.key_proj.weight", "bert4global.encoder.layer.4.attention.self.key_proj.bias", "bert4global.encoder.layer.4.attention.self.value_proj.weight", 
"bert4global.encoder.layer.4.attention.self.value_proj.bias", "bert4global.encoder.layer.5.attention.self.query_proj.weight", "bert4global.encoder.layer.5.attention.self.query_proj.bias", "bert4global.encoder.layer.5.attention.self.key_proj.weight", "bert4global.encoder.layer.5.attention.self.key_proj.bias", "bert4global.encoder.layer.5.attention.self.value_proj.weight", "bert4global.encoder.layer.5.attention.self.value_proj.bias", "bert4global.encoder.layer.6.attention.self.query_proj.weight", "bert4global.encoder.layer.6.attention.self.query_proj.bias", "bert4global.encoder.layer.6.attention.self.key_proj.weight", "bert4global.encoder.layer.6.attention.self.key_proj.bias", "bert4global.encoder.layer.6.attention.self.value_proj.weight", "bert4global.encoder.layer.6.attention.self.value_proj.bias", "bert4global.encoder.layer.7.attention.self.query_proj.weight", "bert4global.encoder.layer.7.attention.self.query_proj.bias", "bert4global.encoder.layer.7.attention.self.key_proj.weight", "bert4global.encoder.layer.7.attention.self.key_proj.bias", "bert4global.encoder.layer.7.attention.self.value_proj.weight", "bert4global.encoder.layer.7.attention.self.value_proj.bias", "bert4global.encoder.layer.8.attention.self.query_proj.weight", "bert4global.encoder.layer.8.attention.self.query_proj.bias", "bert4global.encoder.layer.8.attention.self.key_proj.weight", "bert4global.encoder.layer.8.attention.self.key_proj.bias", "bert4global.encoder.layer.8.attention.self.value_proj.weight", "bert4global.encoder.layer.8.attention.self.value_proj.bias", "bert4global.encoder.layer.9.attention.self.query_proj.weight", "bert4global.encoder.layer.9.attention.self.query_proj.bias", "bert4global.encoder.layer.9.attention.self.key_proj.weight", "bert4global.encoder.layer.9.attention.self.key_proj.bias", "bert4global.encoder.layer.9.attention.self.value_proj.weight", "bert4global.encoder.layer.9.attention.self.value_proj.bias", "bert4global.encoder.layer.10.attention.self.query_proj.weight", 
"bert4global.encoder.layer.10.attention.self.query_proj.bias", "bert4global.encoder.layer.10.attention.self.key_proj.weight", "bert4global.encoder.layer.10.attention.self.key_proj.bias", "bert4global.encoder.layer.10.attention.self.value_proj.weight", "bert4global.encoder.layer.10.attention.self.value_proj.bias", "bert4global.encoder.layer.11.attention.self.query_proj.weight", "bert4global.encoder.layer.11.attention.self.query_proj.bias", "bert4global.encoder.layer.11.attention.self.key_proj.weight", "bert4global.encoder.layer.11.attention.self.key_proj.bias", "bert4global.encoder.layer.11.attention.self.value_proj.weight", "bert4global.encoder.layer.11.attention.self.value_proj.bias".
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
Please pip install pyabsa -U and see if that fixes it.
I have reinstalled as you said and the result has not changed; I still get the error.
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
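The "size mismatch" lines indicate that the checkpoint's embedding table has 251000 rows while the model built from the current config expects 105879, so load_state_dict refuses to copy the tensors. The failure mode can be reproduced in isolation with plain PyTorch; the shapes below are illustrative, and the vocabulary sizes are taken from the error log above:

```python
import torch.nn as nn

# Minimal reproduction of the "size mismatch" failure: a state_dict saved
# from a backbone with one vocabulary size (251000 rows in the checkpoint)
# cannot be loaded into a model built for a different vocabulary
# (105879 rows, matching bert-base-multilingual-uncased). The embedding
# dim is shrunk to 8 here just to keep the example small.
ckpt_model = nn.Embedding(251000, 8)  # stands in for the checkpoint's backbone
new_model = nn.Embedding(105879, 8)   # stands in for the model the config builds

state = ckpt_model.state_dict()
try:
    new_model.load_state_dict(state)  # shape of every copied param must match
    msg = ""
except RuntimeError as e:
    msg = str(e)
```

In other words, no setting of strictness repairs this; the backbone the config instantiates has to match the backbone the checkpoint was trained with.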
Currently config.model is FAST_LCF_ATEPC. Should I change config.model to something else, such as FAST_LCFS_ATEPC or LCFS_ATEPC_LARGE?
Also, you wrote in the documentation here:
There are three types of APC models for aspect term extraction, which are based on the local context focus mechanism. Notice: when you select a model, please make sure to carefully manage the configurations; e.g., for GloVe-based models, you need to set hidden_dim and embed_dim manually. We already provide some pre-defined configurations.
Should I change hidden_dim and embed_dim manually if that would solve the problem, and if so, how?
_atepc_config_multilingual = {
"model": LCF_ATEPC,
"optimizer": "adamw",
"learning_rate": 0.00002,
"pretrained_bert": "bert-base-multilingual-uncased",
"use_bert_spc": True,
"cache_dataset": True,
"warmup_step": -1,
"show_metric": False,
"max_seq_len": 80,
"SRD": 3,
"use_syntax_based_SRD": False,
"lcf": "cdw",
"window": "lr",
"dropout": 0.5,
"l2reg": 0.00001,
"num_epoch": 10,
"batch_size": 16,
"initializer": "xavier_uniform_",
"seed": 52,
"output_dim": 2,
"log_step": 50,
"patience": 99999,
"gradient_accumulation_steps": 1,
"dynamic_truncate": True,
"srd_alignment": True, # for srd_alignment
"evaluate_begin": 0,
}
Note:
The code works if I train a new model from scratch (using no checkpoint, which requires more time and data), so there must be some mismatch between the multilingual checkpoint and the config.pretrained_bert and/or config.model options.
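One observation, offered as an assumption rather than a confirmed fix: the 251000-row embedding in the checkpoint matches mDeBERTa-v3's vocabulary, while the default multilingual config shown above sets pretrained_bert to bert-base-multilingual-uncased (whose vocabulary matches the 105879 rows in the error). A plausible repair is to keep config.pretrained_bert consistent with the backbone the checkpoint was presumably trained with:

```python
from pyabsa import AspectTermExtraction as ATEPC

config = ATEPC.ATEPCConfigManager.get_atepc_config_multilingual()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
# Assumption: the multilingual checkpoint was trained on an mDeBERTa-v3
# backbone (its embedding table has 251000 rows, the mDeBERTa-v3 vocab size),
# so the config must instantiate the same backbone for the state_dict to load.
config.pretrained_bert = "microsoft/mdeberta-v3-base"
```

If the maintainers confirm a different backbone for the checkpoint, that identifier should be used instead.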
This is a known issue caused by a breaking change in transformers. Which version of pyabsa do you use?
When training the ATEPC model with both my custom and the predefined datasets, I get the error below.
I followed this notebook: https://github.com/yangheng95/PyABSA/blob/v2/examples-v2/aspect_term_extraction/Aspect_Term_Extraction.ipynb
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC: size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([128100, 768]).