Ibrokhimsadikov opened 1 year ago
Please provide useful information according to the report: https://github.com/yangheng95/PyABSA/issues/new?assignees=&labels=&template=bug_report.md&title=
Version: pyabsa 2.4.1, torch 1.13.1, transformers 4.27.2
Describe the bug
Hello, I have the same issue. I am trying to fine-tune your latest multilingual model on my own Arabic dataset, starting from the multilingual checkpoint. I am sure the problem is not the dataset; I will paste the error log below. I get an error when I use any of the following options for config.pretrained_bert, and I also get an error (see below) when I leave config.pretrained_bert unset. In every case the error is about the state_dict:
Sample data:
بصراحة O -100
أنا O -100
ما O -100
أحب O -100
الكاتب O -100
اللي O -100
يدخل O -100
اللغة B-ASP negative
العامية I-ASP negative
في O -100
كتاباته O -100
مع O -100
اني O -100
أمارس O -100
هذا O -100
الخطأ O -100
روايه B-ASP negative
حزينه O -100
قد O -100
لاتستحق O -100
عناء O -100
القراءه O -100
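For context, the sample above follows the token-per-line ATEPC format (token, BIO tag, polarity, with -100 for non-aspect tokens). A small validator like the following can rule out formatting problems before training; this is a hedged sketch of the format as shown above, not the exact parser PyABSA uses:

```python
# Minimal validator for the ATEPC token-per-line format shown above:
# each non-blank line is "token tag polarity"; blank lines separate sentences.
# This mirrors the sample data, not PyABSA's internal loader.
VALID_TAGS = {"O", "B-ASP", "I-ASP"}

def parse_atepc_lines(lines):
    """Yield (token, tag, polarity) triples, raising on malformed rows."""
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # blank line = sentence boundary
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {n}: expected 3 fields, got {len(parts)}")
        token, tag, polarity = parts
        if tag not in VALID_TAGS:
            raise ValueError(f"line {n}: unknown tag {tag!r}")
        if tag == "O" and polarity != "-100":
            raise ValueError(f"line {n}: O tokens should carry polarity -100")
        yield token, tag, polarity

sample = ["اللغة B-ASP negative", "في O -100"]
rows = list(parse_atepc_lines(sample))
```

Running this over the full custom.train.txt.atepc file would surface any row with a missing field or a mislabeled polarity.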
Code To Reproduce
import warnings
warnings.filterwarnings("ignore")
import json
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4"
from pyabsa import ModelSaveOption, DeviceTypeOption, DatasetItem
import findfile
from pyabsa import AspectTermExtraction as ATEPC
my_dataset = DatasetItem("my_dataset", ["/app/path/CustomDatasetArabic/custom.train.txt.atepc", "/app/path/100.CustomDatasetArabic/custom.test.txt.atepc"])
config = (ATEPC.ATEPCConfigManager.get_atepc_config_multilingual())
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 4
config.max_seq_len = 500
config.num_epoch = 5
config.batch_size = 16
config.patience = 2
config.log_step = -1
config.seed = [1]
config.show_metric = True
config.verbose = False # If verbose == True, PyABSA will print the model structure and several processed data examples
config.notice = (
"This is a finetuned aspect term extraction model, based on ATEPC_MULTILINGUAL_CHECKPOINT, using Arabic data HAAD." # for memos usage
)
# # config.pretrained_bert = "yangheng/deberta-v3-base-absa-v1.1"
# # config.pretrained_bert = "yangheng/deberta-v3-large-absa-v1.1"
# # config.pretrained_bert = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
# # config.pretrained_bert = "microsoft/mdeberta-v3-base"
# # config.pretrained_bert = "bert-base-multilingual-uncased"
trainer = ATEPC.ATEPCTrainer(
config=config,
dataset=my_dataset,
from_checkpoint="multilingual", # if you want to resume training from our pretrained checkpoints, you can pass the checkpoint name here
auto_device=DeviceTypeOption.AUTO, # use cuda if available
checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT, # save state dict only instead of the whole model
load_aug=False, # there are augmentation datasets for the integrated datasets; set load_aug=True to use them and improve performance
path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT"
)
Expected behavior I expected the model to be trained and then saved. What should I do?
Screenshots
---------------------------------------------------------------------------
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
Missing key(s) in state_dict: "bert4global.embeddings.position_embeddings.weight", "bert4global.embeddings.token_type_embeddings.weight", "bert4global.encoder.layer.0.attention.self.query.weight", "bert4global.encoder.layer.0.attention.self.query.bias", "bert4global.encoder.layer.0.attention.self.key.weight", "bert4global.encoder.layer.0.attention.self.key.bias", "bert4global.encoder.layer.0.attention.self.value.weight", "bert4global.encoder.layer.0.attention.self.value.bias", "bert4global.encoder.layer.1.attention.self.query.weight", "bert4global.encoder.layer.1.attention.self.query.bias", "bert4global.encoder.layer.1.attention.self.key.weight", "bert4global.encoder.layer.1.attention.self.key.bias", "bert4global.encoder.layer.1.attention.self.value.weight", "bert4global.encoder.layer.1.attention.self.value.bias", "bert4global.encoder.layer.2.attention.self.query.weight", "bert4global.encoder.layer.2.attention.self.query.bias", "bert4global.encoder.layer.2.attention.self.key.weight", "bert4global.encoder.layer.2.attention.self.key.bias", "bert4global.encoder.layer.2.attention.self.value.weight", "bert4global.encoder.layer.2.attention.self.value.bias", "bert4global.encoder.layer.3.attention.self.query.weight", "bert4global.encoder.layer.3.attention.self.query.bias", "bert4global.encoder.layer.3.attention.self.key.weight", "bert4global.encoder.layer.3.attention.self.key.bias", "bert4global.encoder.layer.3.attention.self.value.weight", "bert4global.encoder.layer.3.attention.self.value.bias", "bert4global.encoder.layer.4.attention.self.query.weight", "bert4global.encoder.layer.4.attention.self.query.bias", "bert4global.encoder.layer.4.attention.self.key.weight", "bert4global.encoder.layer.4.attention.self.key.bias", "bert4global.encoder.layer.4.attention.self.value.weight", "bert4global.encoder.layer.4.attention.self.value.bias", "bert4global.encoder.layer.5.attention.self.query.weight", "bert4global.encoder.layer.5.attention.self.query.bias", 
"bert4global.encoder.layer.5.attention.self.key.weight", "bert4global.encoder.layer.5.attention.self.key.bias", "bert4global.encoder.layer.5.attention.self.value.weight", "bert4global.encoder.layer.5.attention.self.value.bias", "bert4global.encoder.layer.6.attention.self.query.weight", "bert4global.encoder.layer.6.attention.self.query.bias", "bert4global.encoder.layer.6.attention.self.key.weight", "bert4global.encoder.layer.6.attention.self.key.bias", "bert4global.encoder.layer.6.attention.self.value.weight", "bert4global.encoder.layer.6.attention.self.value.bias", "bert4global.encoder.layer.7.attention.self.query.weight", "bert4global.encoder.layer.7.attention.self.query.bias", "bert4global.encoder.layer.7.attention.self.key.weight", "bert4global.encoder.layer.7.attention.self.key.bias", "bert4global.encoder.layer.7.attention.self.value.weight", "bert4global.encoder.layer.7.attention.self.value.bias", "bert4global.encoder.layer.8.attention.self.query.weight", "bert4global.encoder.layer.8.attention.self.query.bias", "bert4global.encoder.layer.8.attention.self.key.weight", "bert4global.encoder.layer.8.attention.self.key.bias", "bert4global.encoder.layer.8.attention.self.value.weight", "bert4global.encoder.layer.8.attention.self.value.bias", "bert4global.encoder.layer.9.attention.self.query.weight", "bert4global.encoder.layer.9.attention.self.query.bias", "bert4global.encoder.layer.9.attention.self.key.weight", "bert4global.encoder.layer.9.attention.self.key.bias", "bert4global.encoder.layer.9.attention.self.value.weight", "bert4global.encoder.layer.9.attention.self.value.bias", "bert4global.encoder.layer.10.attention.self.query.weight", "bert4global.encoder.layer.10.attention.self.query.bias", "bert4global.encoder.layer.10.attention.self.key.weight", "bert4global.encoder.layer.10.attention.self.key.bias", "bert4global.encoder.layer.10.attention.self.value.weight", "bert4global.encoder.layer.10.attention.self.value.bias", 
"bert4global.encoder.layer.11.attention.self.query.weight", "bert4global.encoder.layer.11.attention.self.query.bias", "bert4global.encoder.layer.11.attention.self.key.weight", "bert4global.encoder.layer.11.attention.self.key.bias", "bert4global.encoder.layer.11.attention.self.value.weight", "bert4global.encoder.layer.11.attention.self.value.bias", "bert4global.pooler.dense.weight", "bert4global.pooler.dense.bias".
Unexpected key(s) in state_dict: "bert4global.encoder.rel_embeddings.weight", "bert4global.encoder.LayerNorm.weight", "bert4global.encoder.LayerNorm.bias", "bert4global.encoder.layer.0.attention.self.query_proj.weight", "bert4global.encoder.layer.0.attention.self.query_proj.bias", "bert4global.encoder.layer.0.attention.self.key_proj.weight", "bert4global.encoder.layer.0.attention.self.key_proj.bias", "bert4global.encoder.layer.0.attention.self.value_proj.weight", "bert4global.encoder.layer.0.attention.self.value_proj.bias", "bert4global.encoder.layer.1.attention.self.query_proj.weight", "bert4global.encoder.layer.1.attention.self.query_proj.bias", "bert4global.encoder.layer.1.attention.self.key_proj.weight", "bert4global.encoder.layer.1.attention.self.key_proj.bias", "bert4global.encoder.layer.1.attention.self.value_proj.weight", "bert4global.encoder.layer.1.attention.self.value_proj.bias", "bert4global.encoder.layer.2.attention.self.query_proj.weight", "bert4global.encoder.layer.2.attention.self.query_proj.bias", "bert4global.encoder.layer.2.attention.self.key_proj.weight", "bert4global.encoder.layer.2.attention.self.key_proj.bias", "bert4global.encoder.layer.2.attention.self.value_proj.weight", "bert4global.encoder.layer.2.attention.self.value_proj.bias", "bert4global.encoder.layer.3.attention.self.query_proj.weight", "bert4global.encoder.layer.3.attention.self.query_proj.bias", "bert4global.encoder.layer.3.attention.self.key_proj.weight", "bert4global.encoder.layer.3.attention.self.key_proj.bias", "bert4global.encoder.layer.3.attention.self.value_proj.weight", "bert4global.encoder.layer.3.attention.self.value_proj.bias", "bert4global.encoder.layer.4.attention.self.query_proj.weight", "bert4global.encoder.layer.4.attention.self.query_proj.bias", "bert4global.encoder.layer.4.attention.self.key_proj.weight", "bert4global.encoder.layer.4.attention.self.key_proj.bias", "bert4global.encoder.layer.4.attention.self.value_proj.weight", 
"bert4global.encoder.layer.4.attention.self.value_proj.bias", "bert4global.encoder.layer.5.attention.self.query_proj.weight", "bert4global.encoder.layer.5.attention.self.query_proj.bias", "bert4global.encoder.layer.5.attention.self.key_proj.weight", "bert4global.encoder.layer.5.attention.self.key_proj.bias", "bert4global.encoder.layer.5.attention.self.value_proj.weight", "bert4global.encoder.layer.5.attention.self.value_proj.bias", "bert4global.encoder.layer.6.attention.self.query_proj.weight", "bert4global.encoder.layer.6.attention.self.query_proj.bias", "bert4global.encoder.layer.6.attention.self.key_proj.weight", "bert4global.encoder.layer.6.attention.self.key_proj.bias", "bert4global.encoder.layer.6.attention.self.value_proj.weight", "bert4global.encoder.layer.6.attention.self.value_proj.bias", "bert4global.encoder.layer.7.attention.self.query_proj.weight", "bert4global.encoder.layer.7.attention.self.query_proj.bias", "bert4global.encoder.layer.7.attention.self.key_proj.weight", "bert4global.encoder.layer.7.attention.self.key_proj.bias", "bert4global.encoder.layer.7.attention.self.value_proj.weight", "bert4global.encoder.layer.7.attention.self.value_proj.bias", "bert4global.encoder.layer.8.attention.self.query_proj.weight", "bert4global.encoder.layer.8.attention.self.query_proj.bias", "bert4global.encoder.layer.8.attention.self.key_proj.weight", "bert4global.encoder.layer.8.attention.self.key_proj.bias", "bert4global.encoder.layer.8.attention.self.value_proj.weight", "bert4global.encoder.layer.8.attention.self.value_proj.bias", "bert4global.encoder.layer.9.attention.self.query_proj.weight", "bert4global.encoder.layer.9.attention.self.query_proj.bias", "bert4global.encoder.layer.9.attention.self.key_proj.weight", "bert4global.encoder.layer.9.attention.self.key_proj.bias", "bert4global.encoder.layer.9.attention.self.value_proj.weight", "bert4global.encoder.layer.9.attention.self.value_proj.bias", "bert4global.encoder.layer.10.attention.self.query_proj.weight", 
"bert4global.encoder.layer.10.attention.self.query_proj.bias", "bert4global.encoder.layer.10.attention.self.key_proj.weight", "bert4global.encoder.layer.10.attention.self.key_proj.bias", "bert4global.encoder.layer.10.attention.self.value_proj.weight", "bert4global.encoder.layer.10.attention.self.value_proj.bias", "bert4global.encoder.layer.11.attention.self.query_proj.weight", "bert4global.encoder.layer.11.attention.self.query_proj.bias", "bert4global.encoder.layer.11.attention.self.key_proj.weight", "bert4global.encoder.layer.11.attention.self.key_proj.bias", "bert4global.encoder.layer.11.attention.self.value_proj.weight", "bert4global.encoder.layer.11.attention.self.value_proj.bias".
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
Please pip install pyabsa -U and see if that fixes it.
I have reinstalled as you said and the result has not changed; I still get the error.
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
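The "size mismatch" lines indicate that the checkpoint's embedding table has 251000 rows while the model built from the current config expects 105879, so load_state_dict refuses to copy the tensors. The failure mode can be reproduced in isolation with plain PyTorch; the shapes below are illustrative, and the vocabulary sizes are taken from the error log above:

```python
import torch.nn as nn

# Minimal reproduction of the "size mismatch" failure: a state_dict saved
# from a backbone with one vocabulary size (251000 rows in the checkpoint)
# cannot be loaded into a model built for a different vocabulary
# (105879 rows, matching bert-base-multilingual-uncased). The embedding
# dim is shrunk to 8 here just to keep the example small.
ckpt_model = nn.Embedding(251000, 8)  # stands in for the checkpoint's backbone
new_model = nn.Embedding(105879, 8)   # stands in for the model the config builds

state = ckpt_model.state_dict()
try:
    new_model.load_state_dict(state)  # shape of every copied param must match
    msg = ""
except RuntimeError as e:
    msg = str(e)
```

In other words, no setting of strictness repairs this; the backbone the config instantiates has to match the backbone the checkpoint was trained with.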
Currently config.model is FAST_LCF_ATEPC. Should I change config.model to something else, such as FAST_LCFS_ATEPC or LCFS_ATEPC_LARGE?
Also, you wrote in the documentation here:
There are three types of APC models for aspect term extraction, which are based on the local context focus mechanism. Notice: when you select a model, please make sure to carefully manage the configurations; e.g., for GloVe-based models, you need to set hidden_dim and embed_dim manually. We already provide some pre-defined configurations.
Should I change hidden_dim and embed_dim manually if that would solve the problem, and if so, how?
_atepc_config_multilingual = {
"model": LCF_ATEPC,
"optimizer": "adamw",
"learning_rate": 0.00002,
"pretrained_bert": "bert-base-multilingual-uncased",
"use_bert_spc": True,
"cache_dataset": True,
"warmup_step": -1,
"show_metric": False,
"max_seq_len": 80,
"SRD": 3,
"use_syntax_based_SRD": False,
"lcf": "cdw",
"window": "lr",
"dropout": 0.5,
"l2reg": 0.00001,
"num_epoch": 10,
"batch_size": 16,
"initializer": "xavier_uniform_",
"seed": 52,
"output_dim": 2,
"log_step": 50,
"patience": 99999,
"gradient_accumulation_steps": 1,
"dynamic_truncate": True,
"srd_alignment": True, # for srd_alignment
"evaluate_begin": 0,
}
Note:
The code works if I train a new model from scratch (using no checkpoint, which requires more time and data), so there must be some mismatch between the multilingual checkpoint and the config.pretrained_bert and/or config.model options.
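One observation, offered as an assumption rather than a confirmed fix: the 251000-row embedding in the checkpoint matches mDeBERTa-v3's vocabulary, while the default multilingual config shown above sets pretrained_bert to bert-base-multilingual-uncased (whose vocabulary matches the 105879 rows in the error). A plausible repair is to keep config.pretrained_bert consistent with the backbone the checkpoint was presumably trained with:

```python
from pyabsa import AspectTermExtraction as ATEPC

config = ATEPC.ATEPCConfigManager.get_atepc_config_multilingual()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
# Assumption: the multilingual checkpoint was trained on an mDeBERTa-v3
# backbone (its embedding table has 251000 rows, the mDeBERTa-v3 vocab size),
# so the config must instantiate the same backbone for the state_dict to load.
config.pretrained_bert = "microsoft/mdeberta-v3-base"
```

If the maintainers confirm a different backbone for the checkpoint, that identifier should be used instead.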
This is a known issue caused by a breaking change in transformers. Which version of pyabsa do you use?
When training the ATEPC model with both my custom and the predefined datasets, I get the error below.
I followed this notebook: https://github.com/yangheng95/PyABSA/blob/v2/examples-v2/aspect_term_extraction/Aspect_Term_Extraction.ipynb
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC: size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([128100, 768]).