Train model from checkpoint with custom data throwing "Target -100 is out of bounds" error

amandalim857 commented 1 year ago

Hi Yang Heng, I am trying to train the model from the checkpoint with custom data. I have already labelled the data in the correct format and also used convert_apc_set_to_atepc_set with no problem at all. I renamed the dataset to "hotel.train.txt.atepc" etc and put the data into integrated_datasets/atepc/150.Hotel/ and ran the model. However, I keep getting this error:

I noticed in other datasets it was labelled -999 while mine was -100, so I changed the labels to -999 but still have the same "Target -100 is out of bounds" error.

This is what my current labelled data looks like:

I am also not sure why errors would appear in my annotations: if there were problems with my data in the first place, wouldn't running convert_apc_set_to_atepc_set have shown me errors, instead of successfully creating the ATEPC-annotated file?
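A quick way to check the annotations independently of the converter is to tally the tags in the generated file. The sketch below assumes the .atepc file has one token per line with three whitespace-separated columns (token, IOB tag, polarity) and blank lines between sentences; that layout is an assumption, and `audit_atepc_lines` is a hypothetical helper, not part of PyABSA.

```python
from collections import Counter


def audit_atepc_lines(lines):
    """Count IOB tags and polarity labels in ATEPC-style lines.

    Assumed format (not confirmed in this thread): "token IOB-tag polarity"
    on each line, with blank lines separating sentences.
    """
    iob_counts, polarity_counts, malformed = Counter(), Counter(), []
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:  # blank line: sentence boundary
            continue
        if len(parts) != 3:  # unexpected number of columns
            malformed.append(line_no)
            continue
        _, iob_tag, polarity = parts
        iob_counts[iob_tag] += 1
        polarity_counts[polarity] += 1
    return iob_counts, polarity_counts, malformed


# Made-up sample lines, for illustration only.
sample = [
    "The O -100",
    "room B-ASP Positive",
    "was O -100",
    "clean O -100",
    "",
    "staff B-ASP negative",
    "rude",  # malformed on purpose: missing columns
]
iob, polarity, bad = audit_atepc_lines(sample)
print(dict(iob))       # which IOB tags occur, and how often
print(dict(polarity))  # which polarity labels occur (watch for mixed case)
print(bad)             # line numbers that do not have three columns
```

To run it on a real file, pass the open file object, for example `audit_atepc_lines(open(path, encoding="utf-8"))`. A missing I-ASP tag or a mix of 'positive' and 'Positive' would show up directly in the counts.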

Thank you very much for your kind help!

yangheng95 commented 1 year ago

From this context I cannot tell what is wrong. Please see the bug report format and provide all the required information so that I can try to fix it.

yangheng95 commented 1 year ago

FYI:

amandalim857 commented 1 year ago

This is a part-time project and I am very busy with my own work, so if you cannot provide as much of the REQUIRED information as you can, I may have no time to solve your problem.

PyABSA Version (Required)

PyABSA version: 2.2.2, Transformers version: 4.28.0, Torch version: 2.0.0+cu117+cuda11.7

ABSADataset Version (Required if you use integrated datasets)

The log reports: "100.CustomDataset in the trainer is not a exact path, will search dataset in current working directory". It is the 100.CustomDataset that I get when I run download_all_available_datasets().

Code To Reproduce (Required)

from pyabsa import AspectTermExtraction as ATEPC, DeviceTypeOption, ModelSaveOption
from pyabsa import DatasetItem
# from pyabsa import download_all_available_datasets

# create my dataset
# download_all_available_datasets()
# hotel = DatasetItem("Hotel", "10.Hotel")

# Define the configuration
config = ATEPC.ATEPCConfigManager.get_atepc_config_english()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 0
config.num_epoch = 1
config.log_step = -1

# Load the model
# dataset = ATEPC.ATEPCDatasetList.Restaurant14
# dataset = "10.Hotel"
dataset = "100.CustomDataset"

aspect_extractor = ATEPC.ATEPCTrainer(
    config=config,
    dataset=dataset,
    from_checkpoint="english",
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    auto_device=DeviceTypeOption.AUTO,
    path_to_save="content",
    load_aug=False,
    ).load_trained_model()

Full Console Output (Required)

2023-04-14 17:10:15,841 INFO: PyABSA version: 2.2.2
2023-04-14 17:10:15,841 INFO: Transformers version: 4.28.0
2023-04-14 17:10:15,841 INFO: Torch version: 2.0.0+cu117+cuda11.7
2023-04-14 17:10:15,842 INFO: Device: Unknown
2023-04-14 17:10:15,842 INFO: 100.CustomDataset in the trainer is not a exact path, will search dataset in current working directory
2023-04-14 17:10:15,872 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2023-04-14 17:10:15,872 INFO: Please DO NOT mix datasets with different sentiment labels for trainer & inference !
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2023-04-14 17:10:17,460 INFO: Load cache dataset from fast_lcf_atepc.custom_dataset.dataset.3501c458de1a30c032e70c298a36b807fca865a0e8ca652b2356f03a3e16b3c0.cache
Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.dense.bias', 'mask_predictions.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'mask_predictions.LayerNorm.weight', 'mask_predictions.dense.weight', 'lm_predictions.lm_head.bias', 'mask_predictions.classifier.bias', 'mask_predictions.classifier.weight']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-04-14 17:10:19,486 INFO: ABSADatasetsVersion:None  --> Calling Count:0
2023-04-14 17:10:19,486 INFO: IOB_label_to_index:{'B-ASP': 1, 'I-ASP': 2, 'O': 3, '[CLS]': 4, '[SEP]': 5}   --> Calling Count:1
2023-04-14 17:10:19,486 INFO: MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x7f1629348220>    --> Calling Count:0
2023-04-14 17:10:19,486 INFO: PyABSAVersion:2.2.2   --> Calling Count:1
2023-04-14 17:10:19,486 INFO: SRD:3 --> Calling Count:220
2023-04-14 17:10:19,486 INFO: TorchVersion:2.0.0+cu117+cuda11.7 --> Calling Count:1
2023-04-14 17:10:19,486 INFO: TransformersVersion:4.28.0    --> Calling Count:1
2023-04-14 17:10:19,486 INFO: auto_device:True  --> Calling Count:3
2023-04-14 17:10:19,486 INFO: batch_size:16 --> Calling Count:7
2023-04-14 17:10:19,486 INFO: cache_dataset:True    --> Calling Count:1
2023-04-14 17:10:19,486 INFO: checkpoint_save_mode:1    --> Calling Count:3
2023-04-14 17:10:19,486 INFO: cross_validate_fold:-1    --> Calling Count:0
2023-04-14 17:10:19,486 INFO: dataset_file:{'train': ['integrated_datasets/atepc_datasets/100.CustomDataset/custom.train.txt.atepc'], 'test': ['integrated_datasets/atepc_datasets/100.CustomDataset/custom.test.txt.atepc'], 'valid': []}  --> Calling Count:4
2023-04-14 17:10:19,486 INFO: dataset_name:custom_dataset   --> Calling Count:3
2023-04-14 17:10:19,486 INFO: device:cpu    --> Calling Count:3
2023-04-14 17:10:19,486 INFO: device_name:Unknown   --> Calling Count:1
2023-04-14 17:10:19,486 INFO: dropout:0.5   --> Calling Count:1
2023-04-14 17:10:19,486 INFO: dynamic_truncate:True --> Calling Count:220
2023-04-14 17:10:19,486 INFO: embed_dim:768 --> Calling Count:0
2023-04-14 17:10:19,486 INFO: evaluate_begin:0  --> Calling Count:0
2023-04-14 17:10:19,486 INFO: from_checkpoint:english   --> Calling Count:0
2023-04-14 17:10:19,486 INFO: gradient_accumulation_steps:1 --> Calling Count:4
2023-04-14 17:10:19,486 INFO: hidden_dim:768    --> Calling Count:6
2023-04-14 17:10:19,486 INFO: index_to_IOB_label:{1: 'B-ASP', 2: 'I-ASP', 3: 'O', 4: '[CLS]', 5: '[SEP]'}   --> Calling Count:0
2023-04-14 17:10:19,487 INFO: index_to_label:{0: 'Negative', 1: 'Neutral', 2: 'Positive'}   --> Calling Count:3
2023-04-14 17:10:19,487 INFO: inference_model:None  --> Calling Count:0
2023-04-14 17:10:19,487 INFO: initializer:xavier_uniform_   --> Calling Count:0
2023-04-14 17:10:19,487 INFO: l2reg:1e-05   --> Calling Count:2
2023-04-14 17:10:19,487 INFO: label_list:['B-ASP', 'O', '[CLS]', '[SEP]']   --> Calling Count:1
2023-04-14 17:10:19,487 INFO: label_to_index:{'negative': 0, 'positive': 1, 'Negative': 0, 'Neutral': 1, 'Positive': 2} --> Calling Count:1
2023-04-14 17:10:19,487 INFO: lcf:cdw   --> Calling Count:0
2023-04-14 17:10:19,487 INFO: learning_rate:2e-05   --> Calling Count:1
2023-04-14 17:10:19,487 INFO: load_aug:False    --> Calling Count:1
2023-04-14 17:10:19,487 INFO: log_step:-1   --> Calling Count:0
2023-04-14 17:10:19,487 INFO: logger:<Logger fast_lcf_atepc (INFO)> --> Calling Count:9
2023-04-14 17:10:19,487 INFO: max_seq_len:80    --> Calling Count:772
2023-04-14 17:10:19,487 INFO: model:<class 'pyabsa.tasks.AspectTermExtraction.models.__lcf__.fast_lcf_atepc.FAST_LCF_ATEPC'>    --> Calling Count:5
2023-04-14 17:10:19,487 INFO: model_name:fast_lcf_atepc --> Calling Count:112
2023-04-14 17:10:19,487 INFO: model_path_to_save:content    --> Calling Count:2
2023-04-14 17:10:19,487 INFO: num_epoch:1   --> Calling Count:2
2023-04-14 17:10:19,487 INFO: num_labels:5  --> Calling Count:3
2023-04-14 17:10:19,487 INFO: optimizer:adamw   --> Calling Count:2
2023-04-14 17:10:19,487 INFO: output_dim:3  --> Calling Count:1
2023-04-14 17:10:19,487 INFO: overwrite_cache:False --> Calling Count:1
2023-04-14 17:10:19,487 INFO: path_to_save:content  --> Calling Count:2
2023-04-14 17:10:19,487 INFO: patience:99999    --> Calling Count:0
2023-04-14 17:10:19,487 INFO: pretrained_bert:microsoft/deberta-v3-base --> Calling Count:6
2023-04-14 17:10:19,487 INFO: save_mode:1   --> Calling Count:0
2023-04-14 17:10:19,487 INFO: seed:52   --> Calling Count:7
2023-04-14 17:10:19,487 INFO: sep_indices:2 --> Calling Count:0
2023-04-14 17:10:19,487 INFO: show_metric:False --> Calling Count:0
2023-04-14 17:10:19,487 INFO: spacy_model:en_core_web_sm    --> Calling Count:3
2023-04-14 17:10:19,487 INFO: srd_alignment:True    --> Calling Count:0
2023-04-14 17:10:19,487 INFO: task_code:ATEPC   --> Calling Count:1
2023-04-14 17:10:19,487 INFO: task_name:Aspect Term Extraction and Polarity Classification  --> Calling Count:0
2023-04-14 17:10:19,487 INFO: use_amp:False --> Calling Count:1
2023-04-14 17:10:19,487 INFO: use_bert_spc:True --> Calling Count:0
2023-04-14 17:10:19,487 INFO: use_syntax_based_SRD:False    --> Calling Count:110
2023-04-14 17:10:19,487 INFO: warmup_step:-1    --> Calling Count:0
2023-04-14 17:10:19,488 INFO: window:lr --> Calling Count:0
2023-04-14 17:10:19,490 INFO: Model Architecture:
 FAST_LCF_ATEPC(
  (bert4global): DebertaV2Model(
    (embeddings): DebertaV2Embeddings(
      (word_embeddings): Embedding(128100, 768, padding_idx=0)
      (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
      (dropout): StableDropout()
    )
    (encoder): DebertaV2Encoder(
      (layer): ModuleList(
        (0-11): 12 x DebertaV2Layer(
          (attention): DebertaV2Attention(
            (self): DisentangledSelfAttention(
              (query_proj): Linear(in_features=768, out_features=768, bias=True)
              (key_proj): Linear(in_features=768, out_features=768, bias=True)
              (value_proj): Linear(in_features=768, out_features=768, bias=True)
              (pos_dropout): StableDropout()
              (dropout): StableDropout()
            )
            (output): DebertaV2SelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
              (dropout): StableDropout()
            )
          )
          (intermediate): DebertaV2Intermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): DebertaV2Output(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
            (dropout): StableDropout()
          )
        )
      )
      (rel_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-07, elementwise_affine=True)
    )
  )
  (dropout): Dropout(p=0.5, inplace=False)
  (SA1): Encoder(
    (encoder): ModuleList(
      (0): SelfAttention(
        (SA): BertSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (tanh): Tanh()
  )
  (SA2): Encoder(
    (encoder): ModuleList(
      (0): SelfAttention(
        (SA): BertSelfAttention(
          (query): Linear(in_features=768, out_features=768, bias=True)
          (key): Linear(in_features=768, out_features=768, bias=True)
          (value): Linear(in_features=768, out_features=768, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (tanh): Tanh()
  )
  (linear_double): Linear(in_features=1536, out_features=768, bias=True)
  (linear_triple): Linear(in_features=2304, out_features=768, bias=True)
  (pooler): BertPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
  (dense): Linear(in_features=768, out_features=3, bias=True)
  (classifier): Linear(in_features=768, out_features=5, bias=True)
)
2023-04-14 17:10:19,490 INFO: ABSADatasetsVersion:None  --> Calling Count:0
2023-04-14 17:10:19,490 INFO: IOB_label_to_index:{'B-ASP': 1, 'I-ASP': 2, 'O': 3, '[CLS]': 4, '[SEP]': 5}   --> Calling Count:1
2023-04-14 17:10:19,490 INFO: MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x7f1629348220>    --> Calling Count:0
2023-04-14 17:10:19,490 INFO: PyABSAVersion:2.2.2   --> Calling Count:1
2023-04-14 17:10:19,490 INFO: SRD:3 --> Calling Count:220
2023-04-14 17:10:19,490 INFO: TorchVersion:2.0.0+cu117+cuda11.7 --> Calling Count:1
2023-04-14 17:10:19,490 INFO: TransformersVersion:4.28.0    --> Calling Count:1
2023-04-14 17:10:19,490 INFO: auto_device:True  --> Calling Count:4
2023-04-14 17:10:19,490 INFO: batch_size:16 --> Calling Count:7
2023-04-14 17:10:19,490 INFO: cache_dataset:True    --> Calling Count:1
2023-04-14 17:10:19,490 INFO: checkpoint_save_mode:1    --> Calling Count:3
2023-04-14 17:10:19,490 INFO: cross_validate_fold:-1    --> Calling Count:1
2023-04-14 17:10:19,490 INFO: dataset_file:{'train': ['integrated_datasets/atepc_datasets/100.CustomDataset/custom.train.txt.atepc'], 'test': ['integrated_datasets/atepc_datasets/100.CustomDataset/custom.test.txt.atepc'], 'valid': []}  --> Calling Count:4
2023-04-14 17:10:19,490 INFO: dataset_name:custom_dataset   --> Calling Count:3
2023-04-14 17:10:19,490 INFO: device:cpu    --> Calling Count:6
2023-04-14 17:10:19,490 INFO: device_name:Unknown   --> Calling Count:1
2023-04-14 17:10:19,490 INFO: dropout:0.5   --> Calling Count:1
2023-04-14 17:10:19,490 INFO: dynamic_truncate:True --> Calling Count:220
2023-04-14 17:10:19,490 INFO: embed_dim:768 --> Calling Count:0
2023-04-14 17:10:19,490 INFO: evaluate_begin:0  --> Calling Count:0
2023-04-14 17:10:19,490 INFO: from_checkpoint:english   --> Calling Count:0
2023-04-14 17:10:19,490 INFO: gradient_accumulation_steps:1 --> Calling Count:4
2023-04-14 17:10:19,490 INFO: hidden_dim:768    --> Calling Count:6
2023-04-14 17:10:19,490 INFO: index_to_IOB_label:{1: 'B-ASP', 2: 'I-ASP', 3: 'O', 4: '[CLS]', 5: '[SEP]'}   --> Calling Count:0
2023-04-14 17:10:19,490 INFO: index_to_label:{0: 'Negative', 1: 'Neutral', 2: 'Positive'}   --> Calling Count:3
2023-04-14 17:10:19,490 INFO: inference_model:None  --> Calling Count:0
2023-04-14 17:10:19,490 INFO: initializer:xavier_uniform_   --> Calling Count:0
2023-04-14 17:10:19,490 INFO: l2reg:1e-05   --> Calling Count:2
2023-04-14 17:10:19,490 INFO: label_list:['B-ASP', 'O', '[CLS]', '[SEP]']   --> Calling Count:1
2023-04-14 17:10:19,490 INFO: label_to_index:{'negative': 0, 'positive': 1, 'Negative': 0, 'Neutral': 1, 'Positive': 2} --> Calling Count:1
2023-04-14 17:10:19,491 INFO: lcf:cdw   --> Calling Count:0
2023-04-14 17:10:19,491 INFO: learning_rate:2e-05   --> Calling Count:1
2023-04-14 17:10:19,491 INFO: load_aug:False    --> Calling Count:1
2023-04-14 17:10:19,491 INFO: log_step:-1   --> Calling Count:0
2023-04-14 17:10:19,491 INFO: logger:<Logger fast_lcf_atepc (INFO)> --> Calling Count:10
2023-04-14 17:10:19,491 INFO: max_seq_len:80    --> Calling Count:772
2023-04-14 17:10:19,491 INFO: model:<class 'pyabsa.tasks.AspectTermExtraction.models.__lcf__.fast_lcf_atepc.FAST_LCF_ATEPC'>    --> Calling Count:5
2023-04-14 17:10:19,491 INFO: model_name:fast_lcf_atepc --> Calling Count:112
2023-04-14 17:10:19,491 INFO: model_path_to_save:content    --> Calling Count:2
2023-04-14 17:10:19,491 INFO: num_epoch:1   --> Calling Count:2
2023-04-14 17:10:19,491 INFO: num_labels:5  --> Calling Count:3
2023-04-14 17:10:19,491 INFO: optimizer:adamw   --> Calling Count:2
2023-04-14 17:10:19,491 INFO: output_dim:3  --> Calling Count:1
2023-04-14 17:10:19,491 INFO: overwrite_cache:False --> Calling Count:1
2023-04-14 17:10:19,491 INFO: path_to_save:content  --> Calling Count:2
2023-04-14 17:10:19,491 INFO: patience:99999    --> Calling Count:0
2023-04-14 17:10:19,491 INFO: pretrained_bert:microsoft/deberta-v3-base --> Calling Count:6
2023-04-14 17:10:19,491 INFO: save_mode:1   --> Calling Count:0
2023-04-14 17:10:19,491 INFO: seed:52   --> Calling Count:7
2023-04-14 17:10:19,491 INFO: sep_indices:2 --> Calling Count:0
2023-04-14 17:10:19,491 INFO: show_metric:False --> Calling Count:0
2023-04-14 17:10:19,491 INFO: spacy_model:en_core_web_sm    --> Calling Count:3
2023-04-14 17:10:19,491 INFO: srd_alignment:True    --> Calling Count:0
2023-04-14 17:10:19,491 INFO: task_code:ATEPC   --> Calling Count:1
2023-04-14 17:10:19,491 INFO: task_name:Aspect Term Extraction and Polarity Classification  --> Calling Count:0
2023-04-14 17:10:19,491 INFO: tokenizer:DebertaV2TokenizerFast(name_or_path='microsoft/deberta-v3-base', vocab_size=128000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '[CLS]', 'eos_token': '[SEP]', 'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True)  --> Calling Count:0
2023-04-14 17:10:19,491 INFO: use_amp:False --> Calling Count:1
2023-04-14 17:10:19,491 INFO: use_bert_spc:True --> Calling Count:0
2023-04-14 17:10:19,491 INFO: use_syntax_based_SRD:False    --> Calling Count:110
2023-04-14 17:10:19,491 INFO: warmup_step:-1    --> Calling Count:0
2023-04-14 17:10:19,491 INFO: window:lr --> Calling Count:0
[2023-04-14 17:10:20] (2.2.2) ********** Available ATEPC model checkpoints for Version:2.2.2 (this version) **********
[2023-04-14 17:10:20] (2.2.2) Downloading checkpoint:english 
[2023-04-14 17:10:20] (2.2.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2023-04-14 17:10:20] (2.2.2) Checkpoint already downloaded, skip
2023-04-14 17:10:20,144 INFO: Checkpoint downloaded at: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43
/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/framework/instructor_class/instructor_template.py:434: ResourceWarning: unclosed file <_io.BufferedReader name='checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.config'>
  config = pickle.load(open(config_path[0], "rb"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/home/amanda/Desktop/temp_model/model.py", line 33, in <module>
    aspect_extractor = ATEPC.ATEPCTrainer(
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/tasks/AspectTermExtraction/trainer/atepc_trainer.py", line 64, in __init__
    self._run()
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/framework/trainer_class/trainer_template.py", line 241, in _run
    model_path.append(self.training_instructor(self.config).run())
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/tasks/AspectTermExtraction/instructor/atepc_instructor.py", line 794, in run
    return self._train(criterion=None)
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/framework/instructor_class/instructor_template.py", line 357, in _train
    self._resume_from_checkpoint()
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/pyabsa/framework/instructor_class/instructor_template.py", line 455, in _resume_from_checkpoint
    self.model.load_state_dict(
  File "/home/amanda/Desktop/temp_model/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
    size mismatch for classifier.weight: copying a param with shape torch.Size([6, 768]) from checkpoint, the shape in current model is torch.Size([5, 768]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([5]).

Describe the bug

Even when using the dataset provided by default, there is a torch.Size mismatch between the checkpoint and the current model: classifier.weight is [6, 768] in the checkpoint but [5, 768] in the current model. This also happens when I try to train on my custom dataset, which convert_apc_set_to_atepc_set created successfully with no errors.
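The console output above offers a hint about the mismatch: label_list for this run is ['B-ASP', 'O', '[CLS]', '[SEP]'] (no 'I-ASP') with num_labels:5, while the checkpoint's classifier has 6 outputs. The sketch below assumes the classifier width is the number of IOB labels plus one. That rule matches the numbers in the log but is an inference, not something confirmed from PyABSA's source, and `expected_classifier_width` is a hypothetical helper.

```python
def expected_classifier_width(label_list):
    # Assumption inferred from the log: 4 labels gave num_labels 5.
    return len(label_list) + 1


# label_list as printed for this run
run_labels = ["B-ASP", "O", "[CLS]", "[SEP]"]
# Full IOB label set, taken from IOB_label_to_index in the same log
full_labels = ["B-ASP", "I-ASP", "O", "[CLS]", "[SEP]"]

missing = sorted(set(full_labels) - set(run_labels))
print("missing from dataset:", missing)
print("current width:", expected_classifier_width(run_labels))
print("checkpoint-compatible width:", expected_classifier_width(full_labels))
```

If this reading is right, a dataset with no multi-token aspects (and therefore no I-ASP tags) would build a 5-way classifier that cannot load the 6-way weights stored in the checkpoint.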

Expected behavior

Train the model.

Screenshots

In addition to REQUIRED text-information, you can add screenshots to help explain your problem.

yangheng95 commented 1 year ago

Thank you. Just to let you know, I am working on a fix. Please be patient.

amandalim857 commented 1 year ago

Thank you very much!

yangheng95 commented 1 year ago

I see: label_to_index:{'negative': 0, 'positive': 1, 'Negative': 0, 'Neutral': 1, 'Positive': 2} --> Calling Count:1

Please rename your dataset labels to 'Negative', 'Neutral', and 'Positive'. Alternatively, remove the from_checkpoint param, since the checkpoint was trained with different labels.
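The renaming suggested here could be sketched as follows. It assumes the three-column ATEPC line layout (token, IOB tag, polarity), which this thread does not spell out; `CANONICAL` and `normalize_polarity` are hypothetical names, not PyABSA APIs.

```python
# Map lower-cased polarity names to the spelling the checkpoint expects.
CANONICAL = {"negative": "Negative", "neutral": "Neutral", "positive": "Positive"}


def normalize_polarity(line):
    """Rewrite the polarity column of one ATEPC-style line, if recognized."""
    parts = line.split()
    if len(parts) != 3:
        return line  # blank or unexpected line: leave untouched
    token, iob_tag, polarity = parts
    # Placeholders such as -100 are not in the map and pass through unchanged.
    polarity = CANONICAL.get(polarity.lower(), polarity)
    return f"{token} {iob_tag} {polarity}"


# Made-up sample lines (without trailing newlines), for illustration only.
lines = ["food B-ASP positive", "was O -100", "", "service B-ASP Negative"]
print([normalize_polarity(line) for line in lines])
```

When applying this to a file, strip the trailing newline from each line first and write it back afterwards. It may also be worth deleting the cached dataset file mentioned in the log so the renamed labels are actually re-read.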

flora-zyx commented 7 months ago

I'm facing a similar issue. My dataset only contains 'Positive' and 'Negative' labels, and I still get the 'Target -100 is out of bounds' error. Is there a potential fix for this error?

My label_to_index is like this:

label_list:['B-ASP', 'I-ASP', 'O', '[CLS]', '[SEP]']    --> Calling Count:1
label_to_index:{'-100': -100, 'Negative': 0, 'Positive': 1} --> Calling Count:0
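One plausible reading of this error is that the placeholder value in the data differs from the value the loss function is told to ignore, so -100 reaches the loss as if it were a real class index. The label_to_index above, which contains '-100': -100, would be consistent with that. The sketch below is a simplified pure-Python stand-in for the usual ignore_index semantics of a cross-entropy loss; it is not PyTorch's or PyABSA's implementation, and the thread does not confirm which ignore value PyABSA 2.2.2 uses.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over rows, skipping targets equal to ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # placeholder label: contributes nothing to the loss
        if not 0 <= target < len(row):
            raise IndexError(f"Target {target} is out of bounds.")
        log_sum_exp = math.log(sum(math.exp(x) for x in row))
        losses.append(log_sum_exp - row[target])
    return sum(losses) / len(losses) if losses else 0.0


# Made-up scores for two classes (Negative, Positive) and three tokens.
logits = [[2.0, 0.5], [0.1, 1.5], [0.3, 0.3]]
targets = [0, 1, -100]  # the last token carries the placeholder

# Placeholder matches ignore_index: it is skipped and a loss is returned.
print(cross_entropy(logits, targets, ignore_index=-100))

# Placeholder does not match ignore_index: -100 is treated as a class index.
try:
    cross_entropy(logits, targets, ignore_index=-999)
except IndexError as err:
    print(err)  # Target -100 is out of bounds.
```

Under this assumption, the thing to check is that the placeholder in the non-aspect rows of the dataset matches what the other integrated datasets use (the first comment mentions -999), and that a stale cache file is not serving the old labels.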

yangheng95 / PyABSA