yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial defense, etc.;
https://pyabsa.readthedocs.io
MIT License
929 stars 159 forks

Performance measures test data FAST_LCF checkpoint model #209

Open KarstenLasse opened 2 years ago

KarstenLasse commented 2 years ago

Dear @yangheng95,

Thanks for making and maintaining this repo, it's great!

I am having trouble getting the accuracy and F1 scores for the Restaurant Test data Gold (ideally I want to build a confusion matrix). What is the easiest way to get F1 scores for APC & ATE after running a checkpoint model on test data? Does the model store these metrics somewhere?

Alternatively, how do you compare your predictions to the TRUE test data (Restaurant Test data Gold annotated)? I can easily transform the model's predictions ('atepc_inference.result_json') into a pandas dataframe, but it is very hard to transform the test data stored in the integrated datasets (from ABSAdatasets), which is in IOB format, into that exact same dataframe format in order to measure performance. Do you have a script or function for that? I was not able to find one.
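In case it helps, here is a rough sketch of what I have in mind for the IOB conversion. It assumes each line of the dataset file is a "token TAG polarity" triple (tags like B-ASP/I-ASP/O) with a blank line between sentences; if the integrated datasets use a different layout, the split would need adjusting:

```python
import pandas as pd

def iob_to_dataframe(lines):
    """Parse ATEPC-style IOB lines into one row per (sentence, aspect, polarity)."""
    # Group the raw lines into sentences (blank line = sentence boundary).
    sentences, sent = [], []
    for line in list(lines) + [""]:
        line = line.strip()
        if not line:
            if sent:
                sentences.append(sent)
                sent = []
            continue
        token, tag, polarity = line.split()
        sent.append((token, tag, polarity))

    # Walk each sentence and collect B-ASP/I-ASP spans as gold aspects.
    rows = []
    for sent in sentences:
        text = " ".join(token for token, _, _ in sent)
        aspect, polarity = [], None
        for token, tag, pol in sent + [("", "O", "-1")]:  # sentinel flushes the last span
            if tag == "B-ASP":
                if aspect:
                    rows.append({"sentence": text, "aspect": " ".join(aspect), "polarity": polarity})
                aspect, polarity = [token], pol
            elif tag == "I-ASP" and aspect:
                aspect.append(token)
            elif aspect:
                rows.append({"sentence": text, "aspect": " ".join(aspect), "polarity": polarity})
                aspect, polarity = [], None
    return pd.DataFrame(rows)
```

The polarity values are copied through as-is, so whatever label scheme the dataset files use is preserved.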

Btw: I used the multilingual checkpoint model (FAST-LCF-ATEPC) on the Restaurant14 Test data Gold. (Ultimately I want to use this model on Dutch data, which is why I want to know how to test performance.)

Thanks a lot,

Karsten

Code:

import pyabsa
from pyabsa import available_checkpoints

# The results of available_checkpoints() depend on the PyABSA version
checkpoint_map = available_checkpoints()  # show available checkpoints for the current PyABSA version

from pyabsa.functional import ABSADatasetList
from pyabsa.functional import ATEPCCheckpointManager
inference_source = ABSADatasetList.Restaurant14
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='multilingual')
atepc_result = aspect_extractor.extract_aspect(inference_source=inference_source,
                                               save_result=True,
                                               print_result=True,  # print the result
                                               pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                               )

import pandas as pd
df_restaurant_EN_test_pred = pd.read_json('atepc_inference.result_EN.json')
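And this is roughly how I would score the APC side once both the gold aspects and the predictions are in dataframes with the same columns. It is only a sketch: the column names ("sentence", "aspect", "polarity") are my own assumptions, alignment is by exact match on (sentence, aspect), and the confusion matrix / macro F1 are computed with plain pandas to avoid extra dependencies:

```python
import pandas as pd

def score_apc(df_gold, df_pred):
    """Align gold and predicted aspects, then return (confusion matrix, accuracy, macro F1)."""
    # Keep only aspects that appear in both frames (exact string match).
    merged = df_gold.merge(df_pred, on=["sentence", "aspect"], suffixes=("_gold", "_pred"))

    # Confusion matrix: gold labels as rows, predicted labels as columns.
    cm = pd.crosstab(merged["polarity_gold"], merged["polarity_pred"])

    acc = (merged["polarity_gold"] == merged["polarity_pred"]).mean()

    # Macro F1 computed by hand over the gold label set.
    f1s = []
    for label in merged["polarity_gold"].unique():
        tp = ((merged["polarity_gold"] == label) & (merged["polarity_pred"] == label)).sum()
        fp = ((merged["polarity_gold"] != label) & (merged["polarity_pred"] == label)).sum()
        fn = ((merged["polarity_gold"] == label) & (merged["polarity_pred"] != label)).sum()
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return cm, acc, sum(f1s) / len(f1s)
```

Note that merging on exact match silently drops aspects the model missed or hallucinated, so this only measures APC on the aspects the model got right at the ATE stage.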
yangheng95 commented 2 years ago

Hi, @KarstenLasse The evaluation of ATEPC inference is not available now, I will work on it later.

yangheng95 commented 2 years ago

Because the evaluation of APC is based on the results of ATE, it is different. However, there is a Dutch dataset here to evaluate the performance in training.

DorisFangWork commented 2 years ago

Hi, @KarstenLasse The evaluation of ATEPC inference is not available now, I will work on it later.

Really appreciate it, I also look forward to the evaluation :)

alfadoraflyh commented 1 year ago

Is there any update on this problem? Thank you.