kevinscaria / InstructABSA

Instructional learning for Aspect Based Sentiment Analysis [NAACL-2024]
https://aclanthology.org/2024.naacl-short.63/
MIT License
147 stars 24 forks

Evaluation Metrics seem to be over-counting/inflating the counts of true positives? #28

Open keanepotato opened 2 weeks ago

keanepotato commented 2 weeks ago

Hi Kevin,

Thanks for your contribution to the ABSA task. I'd like to draw your attention to the following code block in utils.py within the InstructABSA folder. Because a matched prediction isn't removed from pred_list, a single prediction can be matched against several ground-truth values. For example, when gt_list contains a repeated aspect ['food', 'food'] but pred_list contains only one instance ['food'], both ground-truth instances are counted as true positives, even though the model missed the second 'food'. Is that intended?

def get_metrics(self, y_true, y_pred, is_triplet_extraction=False):
    total_pred = 0
    total_gt = 0
    tp = 0
    if not is_triplet_extraction:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                for pred_val in pred_list:
                    if pred_val in gt_val or gt_val in pred_val:
                        tp+=1
                        break

    else:
        for gt, pred in zip(y_true, y_pred):
            gt_list = gt.split(', ')
            pred_list = pred.split(', ')
            total_pred+=len(pred_list)
            total_gt+=len(gt_list)
            for gt_val in gt_list:
                gt_asp = gt_val.split(':')[0]

                try:
                    gt_op = gt_val.split(':')[1]
                except:
                    continue

                try:
                    gt_sent = gt_val.split(':')[2]
                except:
                    continue

                for pred_val in pred_list:
                    pr_asp = pred_val.split(':')[0]

                    try:
                        pr_op = pred_val.split(':')[1]
                    except:
                        continue
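
To make the repeated-aspect case concrete, here is a minimal reproduction. It assumes the non-triplet branch behaves exactly as quoted above. `count_tp_as_quoted` is a hypothetical standalone helper I wrote for illustration, not a function in the repo.

```python
def count_tp_as_quoted(y_true, y_pred):
    """Standalone copy of the non-triplet branch quoted above (hypothetical helper)."""
    total_pred = total_gt = tp = 0
    for gt, pred in zip(y_true, y_pred):
        gt_list = gt.split(', ')
        pred_list = pred.split(', ')
        total_pred += len(pred_list)
        total_gt += len(gt_list)
        for gt_val in gt_list:
            for pred_val in pred_list:
                # Substring test in either direction; the matched prediction
                # stays in pred_list, so it can be matched again.
                if pred_val in gt_val or gt_val in pred_val:
                    tp += 1
                    break
    return tp, total_pred, total_gt


tp, total_pred, total_gt = count_tp_as_quoted(['food, food'], ['food'])
print(tp, total_pred, total_gt)  # 2 1 2
```

Here tp (2) is larger than total_pred (1). If precision is later computed as tp / total_pred (an assumption on my part, since that part of the function isn't quoted here), it would come out above 1.0.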
keanepotato commented 2 weeks ago

Additionally, I believe this doesn't check for exact matches of aspects; it only checks whether one string is contained within the other.

So if the model predicts "of" but the ground truth is "bowl of sushi", that is counted as a true positive as well.
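
For comparison, here is a rough sketch of a stricter count. It assumes exact string matching is what the metric should measure, and that each prediction may be matched at most once. `count_exact_tp` is a hypothetical name, and this isn't meant as the definitive fix. Whether partial matches should earn credit is your call.

```python
from collections import Counter


def count_exact_tp(y_true, y_pred):
    """Exact-match true positives, consuming each prediction at most once."""
    tp = total_pred = total_gt = 0
    for gt, pred in zip(y_true, y_pred):
        gt_counts = Counter(s.strip() for s in gt.split(', '))
        pred_counts = Counter(s.strip() for s in pred.split(', '))
        total_gt += sum(gt_counts.values())
        total_pred += sum(pred_counts.values())
        # Counter intersection keeps the minimum count per key, so a single
        # predicted 'food' can only cover a single ground-truth 'food'.
        tp += sum((gt_counts & pred_counts).values())
    return tp, total_pred, total_gt


print(count_exact_tp(['bowl of sushi'], ['of']))  # (0, 1, 1)
print(count_exact_tp(['food, food'], ['food']))   # (1, 1, 2)
```

With this counting, tp can never exceed total_pred or total_gt. The sketch does no case or whitespace normalization beyond strip().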