yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial defense, etc.;
https://pyabsa.readthedocs.io
MIT License

# Inference Results not as expected when Applying Custom Trained Model #308

Open christianjosef27 opened 1 year ago

christianjosef27 commented 1 year ago

PyABSA Version 2.2.2

After training, I tried inference with my custom model (the trainer showed good eval metrics):

aspect_extractor = ATEPC.AspectExtractor(
    'fast_lcf_atepc_custom_dataset_cdw_apcacc_85.8_apcf1_85.73_atef1_80.08',
    auto_device=True,  # False means load model on CPU
    cal_perplexity=True,
)

Description of Issue:

I tried the base multilingual model via batch_predict(), passing reviews as a list of strings, and the results were already decent. Then I trained the model on my own custom dataset, which I annotated myself.

With data augmentation techniques etc. I reached metrics of APCacc 85%, APCf1 85% and ATEf1 80%. However, when I applied my model as shown in the code above to the same input data (a list of strings), the results were not as expected.

Barely any aspects were extracted. The results were much worse than with the base model.

Could you please help me out with this? Did I miss something? Why was the metrics evaluation good but the inference not at all? The inference data is in the same format as the training data.

Screenshots

Inference without Trained Model:

image

Inference with Trained Model:

image

christianjosef27 commented 1 year ago

PS: The examples from the screenshots were even included in the training data.

christianjosef27 commented 1 year ago

I assume I made a mistake when trying to Load My Model.

christianjosef27 commented 1 year ago

Still the same issue. Am I loading my checkpoint correctly, or do I need to register my checkpoint in pyabsa?

I loaded the state_dict of my custom-trained model like this:

from pyabsa import ATEPCCheckpointManager

checkpoint_name = (r'C:\Users\chris\Documents\A_Masterarbeit\Prepare Train Dataset\Modeling\saved_trained_models\fast_lcf_atepc_custom_dataset_cdw_apcacc_85.8_apcf1_85.73_atef1_80.08')

sent_classifier = ATEPCCheckpointManager.get_aspect_extractor(checkpoint=checkpoint_name)

Example sentences from the training data (rows 1581-1590 inclusive in "Clean_Data_Aug_4"):

train_examples = ['Der Studiengang ist gut strukturiert man hatt viele interessante Lehrveranstaltungen die von vielen Dozenten gut vorbeitet werden und man dadurch den Inhalt sehr gut versteht.', 'Man lernt teilweise Automatisch mit, wenn man Medien und aktuelle Geschehnisse verfolgt.', 'Lektoren jedoch zum großen Teil freundlich und zuvorkommend.', 'in Vorlesungen und Prüfungen also Online!', 'Viele Organisationen verweigern dass hin und UNK her bzw springen bei öffentlichen Fragen Multiple A Choice genannten Prüfungen .', 'UNK Die Umsetzung der sogenannten Hybridlehre ist der UNI gut gelungen, ich insbesondere würde mir jedoch wünschen, dass u. a auch open vor book Klausuren oder online Prüfungen eingeführt worden wären, wie es das auch an vielen anderen anderen Hochschulen der Fall schon war.', 'Sehr gut, es wurde gleich alles auf OnlineLehre oder hybride Lehre umgestellt und gut organisiert.', 'Struktur manchmal etwas unklar.', 'Da das Institut für Philosophie an der Uni Wien sehr groß ist können ca. Im Durchschnitt 1 von 5 verschiedenen Kursen gewählt werden.', 'Ich kann das Studium sehr empfehlen, auch wenn in manchen Fächern die Organisation etwas zu wünschen lässt, aber durch die interessanten Inhalte wird dies ausgeglichen.']

atepc_result = sent_classifier.extract_aspect(
    inference_source=train_examples,  #
    pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
)
atepc_result
print()

However, the results are still not as expected.

Even the training examples were predicted completely wrong.

The trainer's evaluation metrics, however, were very good (about 85% for apc_acc and apc_f1, 80% for ate_f1).
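One way to put a number on the gap described here is to score the extractor's output on a few sentences whose annotations are known. The sketch below is plain Python and does not call PyABSA. It assumes each prediction is a dict with an "aspect" key holding the extracted terms; that format is an assumption and should be checked against what extract_aspect() actually returns. The gold labels in the usage part are hypothetical placeholders, not the real annotations from the dataset.

```python
from typing import Dict, Sequence


def aspect_scores(predictions: Sequence[Dict], gold_aspects: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over aspect terms.

    Terms are compared case-insensitively, as sets per sentence.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predictions, gold_aspects):
        # Assumed result format: {"aspect": ["term", ...], ...}
        pred_set = {a.strip().lower() for a in pred.get("aspect", [])}
        gold_set = {a.strip().lower() for a in gold}
        tp += len(pred_set & gold_set)   # terms found and expected
        fp += len(pred_set - gold_set)   # terms found but not expected
        fn += len(gold_set - pred_set)   # terms expected but not found
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical predictions and gold labels, for illustration only.
    predictions = [{"aspect": ["Studiengang", "Dozenten"]}, {"aspect": []}]
    gold = [["Studiengang", "Lehrveranstaltungen"], ["Struktur"]]
    print(aspect_scores(predictions, gold))
```

If the score on training sentences is far below the ATE F1 reported by the trainer, the problem is more likely in how the checkpoint is loaded or in the inference path than in the training data itself.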

image

What am I doing wrong? I would appreciate your help

yangheng95 commented 1 year ago

You can send me your checkpoint by email

christianjosef27 commented 1 year ago

Hi, did you receive my email with the Google Drive Link to my trained checkpoint?

 

Best Regards,

Christian

   


yangheng95 commented 1 year ago

No, which email address did you send it to?

christianjosef27 commented 1 year ago

Maybe I used the wrong address.

I just sent it again to this address:

***@***.***

 

 

   


yangheng95 commented 1 year ago

You can set

config.pretrained_bert='microsoft/mdeberta-v3-base'
config.use_amp=False

to retrain the multilingual model

Note that the trainer's resume-from-checkpoint option is generally not necessary and may cause problems if you are not familiar with it.

christianjosef27 commented 1 year ago

Thank you for your help! Does that mean you think I trained my model incorrectly? Could that be the reason for the unexpected prediction results? As mentioned, what blocks me is understanding why the metric evaluation was good (85%) while inference could not even reproduce the labels of the training samples.

yangheng95 commented 1 year ago

There is a bug with multilingual BERT in this case. I suggest avoiding it rather than fixing it.

christianjosef27 commented 1 year ago

Ah, okay. Does that mean that with the alternative config setting I would fine-tune not the multilingual model but a "previous version" of it? And should I then get results that at least better reflect the training metrics?

yangheng95 commented 1 year ago

I don't get it. But you can just train a new model using the following code example:

import random

from pyabsa import AspectTermExtraction as ATEPC

config = ATEPC.ATEPCConfigManager.get_atepc_config_chinese()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 0
config.max_seq_len = 128
config.batch_size = 16
# config.pretrained_bert = 'yangheng/deberta-v3-base-absa'
config.pretrained_bert = "microsoft/mdeberta-v3-base"
config.log_step = -1
config.l2reg = 1e-8
config.num_epoch = 20
config.seed = 42
config.use_bert_spc = True
config.use_amp = False
config.cache_dataset = True
config.cross_validate_fold = -1

# chinese_sets = ATEPC.ATEPCDatasetList.Chinese_Zhang
chinese_sets = ATEPC.ATEPCDatasetList.Multilingual

aspect_extractor = ATEPC.ATEPCTrainer(
    config=config,
    # from_checkpoint="",   # not necessary for most situations
    dataset=chinese_sets,
    checkpoint_save_mode=1,
    auto_device=True,
    load_aug=False,
).load_trained_model()

Then I think it should work.

christianjosef27 commented 1 year ago

Hello Heng,

In brief, my issue is this: I fine-tuned your multilingual model on my custom data and the evaluation metrics were great. However, when I tested the trained model on an example list of strings (my production data), the results were inexplicably poor and did not reflect the good metrics.

Either I have not loaded the trained model correctly, or something else is wrong. I'd appreciate your help; maybe your suggestion of training again with a different config will solve my issue. I just wanted to state the problem clearly so that you can understand it better.

I did not use "trainer.load_trained_model()"; instead, I loaded my model like this:

aspect_extractor = ATEPC.AspectExtractor(
    'fast_lcf_atepc_custom_dataset_cdw_apcacc_85.8_apcf1_85.73_atef1_80.08',
    auto_device=True,  # False means load model on CPU
    cal_perplexity=True,
)

This is the fast_lcf_atepc.args file of my model:

model: <class 'pyabsa.tasks.AspectTermExtraction.models.lcf.fast_lcf_atepc.FAST_LCF_ATEPC'>
optimizer: adamw
learning_rate: 2e-05
cache_dataset: True
warmup_step: -1
use_bert_spc: True
max_seq_len: 80
SRD: 3
use_syntax_based_SRD: False
lcf: cdw
dropout: 0.5
l2reg: 1e-05
num_epoch: 10
batch_size: 16
seed: 1
output_dim: 3
log_step: 662
patience: 2
gradient_accumulation_steps: 1
dynamic_truncate: True
evaluate_begin: 0
use_amp: False
cross_validate_fold: -1
pretrained_bert: bert-base-multilingual-uncased
verbose: True
dataset: C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/integrated_datasets/atepc_datasets/177.University
from_checkpoint: None
checkpoint_save_mode: 1
auto_device: True
path_to_save: C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/saved_trained_models
load_aug: False
device: cpu
device_name: Unknown
model_name: fast_lcf_atepc
hidden_dim: 768
PyABSAVersion: 2.2.2
TransformersVersion: 4.25.1
TorchVersion: 1.13.0+cu116+cuda11.6
dataset_name: custom_dataset
save_mode: 1
logger: <Logger fast_lcf_atepc (INFO)>
task_code: ATEPC
task_name: Aspect Term Extraction and Polarity Classification
dataset_file: {'train': ['C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/integrated_datasets/atepc_datasets/177.University\train.apc.dataset.txt.atepc', 'C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/integrated_datasets/atepc_datasets/177.University\val.apc.dataset.txt.atepc'], 'test': ['C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/integrated_datasets/atepc_datasets/177.University\test.apc.dataset.txt.atepc'], 'valid': []}
model_path_to_save: C:/Users/chris/Documents/A_Masterarbeit/Prepare Train Dataset/Modeling/saved_trained_models
spacy_model: en_core_web_sm
IOB_label_to_index: {'B-ASP': 1, 'I-ASP': 2, 'O': 3, '[CLS]': 4, '[SEP]': 5}
index_to_label: {0: 'negative', 1: 'neutral', 2: 'positive'}
label_list: ['B-ASP', 'I-ASP', 'O', '[CLS]', '[SEP]']
num_labels: 6
sep_indices: 102
max_test_metrics: {'max_apc_test_acc': 85.8, 'max_apc_test_f1': 85.73, 'max_ate_test_f1': 81.92}
metrics_of_this_checkpoint: {'apc_acc': 85.8, 'apc_f1': 85.73, 'ate_f1': 80.08}
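A dump like this can be checked mechanically against the points raised in this thread. The following is a minimal sketch in plain Python, assuming the args file on disk holds one "key: value" pair per line. The helper names parse_args_dump and config_warnings are made up for illustration and are not part of PyABSA. The two checks only encode what was said here: the maintainer's advice to avoid multilingual BERT in favour of microsoft/mdeberta-v3-base, and the fact that training on a CPU is slow.

```python
def parse_args_dump(text: str) -> dict:
    """Parse 'key: value' lines into a dict of strings (no type conversion)."""
    parsed = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values such as paths or dicts stay intact.
        key, sep, value = line.partition(": ")
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
    return parsed


def config_warnings(args: dict) -> list:
    """Return human-readable notes about settings discussed in this thread."""
    notes = []
    if args.get("pretrained_bert", "").startswith("bert-base-multilingual"):
        notes.append("pretrained_bert is multilingual BERT; the maintainer suggests "
                     "microsoft/mdeberta-v3-base instead.")
    if args.get("device") == "cpu":
        notes.append("device is cpu; training and inference will be slow without a GPU.")
    return notes


if __name__ == "__main__":
    sample = "pretrained_bert: bert-base-multilingual-uncased\nmax_seq_len: 80\ndevice: cpu\n"
    for note in config_warnings(parse_args_dump(sample)):
        print(note)
```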

christianjosef27 commented 1 year ago

Hello, I just started training a new model with your code example, but it looks like it would take almost a month of running time on my laptop.

Could you please give me a hint on how to proceed in my situation?

I trained the multilingual model on my custom dataset, which showed good evaluation metrics (85% etc.). However, I do not know how to load the resulting custom checkpoints to test the trained model for inference. (I have the folders containing the checkpoints that were created automatically during training.)

I tried loading my trained checkpoint like this:

checkpoint_name = r'C:\Users\chris\Documents\A_Masterarbeit\Prepare Train Dataset\Modeling\fast_lcf_atepc_custom_dataset_cdw_apcacc_85.8_apcf1_85.73_atef1_80.08'

Then I used batch_predict() to test it on a list of strings. But the results did not reflect the evaluation metrics at all. The model could not even extract the expected aspects and sentiments from many training examples.
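Before loading, it may be worth confirming that the path points at the intended folder. The two checkpoint paths quoted in this thread differ: one includes the saved_trained_models directory and the other does not. The sketch below uses only the standard library. The helper name inspect_checkpoint is hypothetical, and the list of expected file suffixes is an assumption about what a saved checkpoint folder contains, so adjust it to the files that are actually there.

```python
from pathlib import Path

# Assumed suffixes of the files in a saved checkpoint folder; not confirmed by the thread.
EXPECTED_SUFFIXES = (".state_dict", ".config", ".tokenizer")


def inspect_checkpoint(path: str, expected_suffixes=EXPECTED_SUFFIXES) -> dict:
    """Report whether the folder exists, which files it holds, and which suffixes are missing."""
    folder = Path(path)
    report = {
        "exists": folder.is_dir(),
        "files": [],
        "missing_suffixes": list(expected_suffixes),
    }
    if report["exists"]:
        files = sorted(p.name for p in folder.iterdir() if p.is_file())
        report["files"] = files
        report["missing_suffixes"] = [
            suffix for suffix in expected_suffixes
            if not any(name.endswith(suffix) for name in files)
        ]
    return report


if __name__ == "__main__":
    # Replace with the real checkpoint folder before running.
    print(inspect_checkpoint("path/to/checkpoint_folder"))
```

If "exists" is False or a suffix is missing, the extractor may not be loading the fine-tuned weights at all, which would be worth ruling out before retraining.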
