yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial Defense, etc.
https://pyabsa.readthedocs.io
MIT License

TypeError: 'NoneType' object is not iterable - Train custom dataset from english checkpoint #344

Closed. sorin-simu closed this issue 1 year ago

sorin-simu commented 1 year ago

Unless you provide the REQUIRED information, your problem may not be addressed.

PyABSA Version (Required)

PyABSA: 2.3.1, Torch: 2.0.1, Transformers: 4.29.0

ABSADataset Version (Required if you use integrated datasets)

My custom dataset

Code To Reproduce (Required)

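from pyabsa import AspectTermExtraction as ATEPC, DeviceTypeOption, ModelSaveOption

# `config` is defined elsewhere in train.py (not shown in this report).
dataset = "MyDataset"

trainer = ATEPC.ATEPCTrainer(
    config,
    dataset=dataset,
    from_checkpoint="english",
    auto_device=DeviceTypeOption.AUTO,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
)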

Full Console Output (Required)

C:\Users\username\PycharmProjects\pythonProject\venv\Scripts\python.exe C:\Users\username\PycharmProjects\pythonProject\train.py
No CUDA GPU found in your device
[2023-08-02 19:33:46] (2.3.1) PyABSA(2.3.1): If your code crashes on Colab, please use the GPU runtime. Then run "pip install pyabsa[dev] -U" and restart the kernel. Or if it does not work, you can use v1.16.27

[New Feature] Aspect Sentiment Triplet Extraction since v2.1.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration)
[New Feature] Aspect CategoryOpinion Sentiment Quadruple Extraction since v2.2.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_opinion_sentiment_category_extraction)

No CUDA GPU found in your device
[2023-08-02 19:33:51] (2.3.1) PyABSA(2.3.1): If your code crashes on Colab, please use the GPU runtime. Then run "pip install pyabsa[dev] -U" and restart the kernel. Or if it does not work, you can use v1.16.27

[New Feature] Aspect Sentiment Triplet Extraction since v2.1.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration)
[New Feature] Aspect CategoryOpinion Sentiment Quadruple Extraction since v2.2.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_opinion_sentiment_category_extraction)

[2023-08-02 19:33:52] (2.3.1) Datasets already exist in C:\Users\username\PycharmProjects\pythonProject\integrated_datasets, skip download
[2023-08-02 19:33:52] (2.3.1) Set Model Device: cpu
[2023-08-02 19:33:52] (2.3.1) Device Name: Unknown
2023-08-02 19:33:52,497 INFO: PyABSA version: 2.3.1
2023-08-02 19:33:52,497 INFO: Transformers version: 4.29.0
2023-08-02 19:33:52,497 INFO: Torch version: 2.0.1+cpu+cudaNone
2023-08-02 19:33:52,497 INFO: Device: Unknown
2023-08-02 19:33:52,497 INFO: MyDataset in the trainer is not a exact path, will search dataset in current working directory
FindFile Warning --> multiple targets ['Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.07486201077699661', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.0830550491809845', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.08690499514341354', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.11740821599960327', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.1491866111755371', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.1784743368625641', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.36736223101615906', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.40988689661026', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.5526915192604065', 'datasets\atepc_datasets\999.MyDataset', 'integrated_datasets\atepc_datasets\999.MyDataset'] found, only return the shortest path: <datasets\atepc_datasets\999.MyDataset>
2023-08-02 19:33:53,043 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2023-08-02 19:33:53,045 INFO: Warning! auto_evaluate=True, however cannot find test set using for evaluating!
2023-08-02 19:33:53,045 INFO: Please DO NOT mix datasets with different sentiment labels for trainer & inference !
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
C:\Users\username\PycharmProjects\pythonProject\venv\Lib\site-packages\transformers\convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
convert examples to features: 67%|██████▋ | 361/539 [00:01<00:00, 338.55it/s]
C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\pool.py:268: ResourceWarning: unclosed running multiprocessing pool
  _warn(f"unclosed running multiprocessing pool {self!r}",
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2023-08-02 19:33:57] (2.3.1) Datasets already exist in C:\Users\username\PycharmProjects\pythonProject\integrated_datasets, skip download
[2023-08-02 19:33:57] (2.3.1) Set Model Device: cpu
[2023-08-02 19:33:57] (2.3.1) Device Name: Unknown
2023-08-02 19:33:57,306 INFO: PyABSA version: 2.3.1
2023-08-02 19:33:57,307 INFO: Transformers version: 4.29.0
2023-08-02 19:33:57,307 INFO: Torch version: 2.0.1+cpu+cudaNone
2023-08-02 19:33:57,307 INFO: Device: Unknown
2023-08-02 19:33:57,307 INFO: MyDataset in the trainer is not a exact path, will search dataset in current working directory
FindFile Warning --> multiple targets ['Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.07486201077699661', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.0830550491809845', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.08690499514341354', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.11740821599960327', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.1491866111755371', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.1784743368625641', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.36736223101615906', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.40988689661026', 'Backup\MyDataset\fast_lcf_atepc_custom_dataset_0.5526915192604065', 'datasets\atepc_datasets\999.MyDataset', 'integrated_datasets\atepc_datasets\999.MyDataset'] found, only return the shortest path: <datasets\atepc_datasets\999.MyDataset>
2023-08-02 19:33:57,923 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2023-08-02 19:33:57,924 INFO: Warning! auto_evaluate=True, however cannot find test set using for evaluating!
2023-08-02 19:33:57,925 INFO: Please DO NOT mix datasets with different sentiment labels for trainer & inference !
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
C:\Users\username\PycharmProjects\pythonProject\venv\Lib\site-packages\transformers\convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
convert examples to features: 100%|██████████| 539/539 [00:01<00:00, 298.68it/s]
2023-08-02 19:34:02,076 INFO: Dataset Label Details: {'Neutral': 39, 'Negative': 83, 'Positive': 417, 'Sum': 539}
C:\Users\username\PycharmProjects\pythonProject\venv\Lib\site-packages\pyabsa\tasks\AspectTermExtraction\instructor\atepc_instructor.py:85: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
  lcf_cdm_vec = torch.tensor(
Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['mask_predictions.dense.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.dense.weight', 'mask_predictions.LayerNorm.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.classifier.bias', 'lm_predictions.lm_head.bias', 'mask_predictions.classifier.weight', 'mask_predictions.dense.bias']

Process finished with exit code 1

Describe the bug

I've created a custom ATEPC dataset and I want to train on it starting from the English checkpoint. After epoch 9 I get the error in the title (TypeError: 'NoneType' object is not iterable), but the checkpoint for the new custom dataset still appears in the checkpoints folder (see the screenshots below).

  1. How can I fix this issue?
  2. Can I use the resulting checkpoint for predictions under these conditions?

Thanks for your work and consideration.

Screenshots

[three images]

yangheng95 commented 1 year ago

You need to check whether your dataset includes a test set. If it doesn't, you can split off part of the training set and use it as the test set.
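
For anyone landing here with the same log line ("auto_evaluate=True, however cannot find test set using for evaluating!"), here is a minimal sketch of the split suggested above. It is not part of PyABSA. It assumes the ATEPC training file stores one example per block with blank lines between blocks, and that PyABSA recognises the test split by the word "test" in the file name; both are assumptions to verify against your own dataset folder. The function names and the file names in the usage note are made up for illustration.

```python
import random
import re
from pathlib import Path


def split_blocks(text: str, test_ratio: float = 0.2, seed: int = 42):
    """Split blank-line-separated example blocks into (train, test) lists."""
    normalized = text.replace("\r\n", "\n")
    blocks = [b.strip() for b in re.split(r"\n\s*\n", normalized) if b.strip()]
    if len(blocks) < 2:
        raise ValueError("need at least two examples to split")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(blocks)
    # Keep at least one example on each side of the split.
    n_test = min(len(blocks) - 1, max(1, round(len(blocks) * test_ratio)))
    return blocks[n_test:], blocks[:n_test]


def write_split(source: str, train_out: str, test_out: str,
                test_ratio: float = 0.2, seed: int = 42) -> None:
    """Read `source` and write the two splits to new files (source is untouched)."""
    text = Path(source).read_text(encoding="utf-8")
    train, test = split_blocks(text, test_ratio, seed)
    Path(train_out).write_text("\n\n".join(train) + "\n", encoding="utf-8")
    Path(test_out).write_text("\n\n".join(test) + "\n", encoding="utf-8")


def has_test_split(dataset_dir: str) -> bool:
    """Heuristic pre-flight check: is there any file with 'test' in its name?"""
    return any(
        "test" in p.name.lower()
        for p in Path(dataset_dir).rglob("*")
        if p.is_file()
    )
```

Usage would look something like write_split("all_examples.txt", "datasets/atepc_datasets/999.MyDataset/my.train.txt.atepc", "datasets/atepc_datasets/999.MyDataset/my.test.txt.atepc"), followed by has_test_split("datasets/atepc_datasets/999.MyDataset") before starting the trainer. If the same sentence appears in several blocks (for example once per aspect), a random split can put copies on both sides, so grouping by sentence before splitting may give a more honest evaluation.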