unikcc / DiaASQ

ACL 2023 (Findings) : DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis
https://diaasq-page.pages.dev/
MIT License
53 stars 14 forks

Consulting for the evaluation metric on cross-utterance quadruples. #9

Closed Git-Shaw closed 1 year ago

Git-Shaw commented 1 year ago

Hello, the paper reports an experiment on cross-utterance quadruples, and I noticed that it uses the micro-F1 metric.

I have identified the cross-utterance quads, and their number is consistent with Table 2 in the paper.

But I couldn't reproduce the scores in Fig. 6 of the paper. I'm confused about what pred and gold refer to in the code. For example, if a dialogue contains both cross-utterance and non-cross-utterance quadruples, should the non-cross-utterance ones be included in the preds or golds?

def compute_score(self, preds, golds, mode='quad'):

    # assert all(doc_id in golds for doc_id in preds)
    # assert all(doc_id in preds for doc_id in golds)
    tp, fp, fn = 0, 0, 0
    for doc_id in preds:
        pred_line = preds[doc_id]
        gold_line = golds[doc_id]

        if mode != 'quad':
            pred_line = [w[:6] for w in pred_line]
            gold_line = [w[:6] for w in gold_line]

        fp += len(set(pred_line) - set(gold_line))
        fn += len(set(gold_line) - set(pred_line))
        tp += len(set(pred_line) & set(gold_line))

    p = tp / (tp + fp) if tp + fp > 0 else 0
    r = tp / (tp + fn) if tp + fn > 0 else 0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0
    scores = [p, r, f1, tp, tp + fp, tp + fn]
    return scores

Could you please give me some guidance, or share the evaluation code for computing the cross-utterance micro-F1? I think it would also be helpful for others.

Thank you in advance!

unikcc commented 1 year ago

Hi, thank you for your interest, and apologies for the late reply.

Regarding your question, "For example, there is a dialog that contains both cross- and non-cross-utterance quadruples. Should the non-cross-utterance ones be included in the preds or golds?":

The answer is that when computing the cross-utterance metrics, we only consider the cross-utterance quadruples in both the preds and golds sets.

We have supplemented the evaluation code to handle cross-utterance quadruples. FYI.
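
For readers following along, the filtering described above can be sketched roughly as follows. This is not the repository's evaluation code, only a minimal illustration under stated assumptions: each quadruple is taken to be a 7-tuple of document-level token offsets plus a polarity label (target start/end, aspect start/end, opinion start/end, polarity), every quadruple has all three spans present, and `utt_starts_by_doc` is a hypothetical mapping from doc_id to the sorted first-token offsets of its utterances. The names `utterance_index`, `is_cross_utterance`, and `cross_utterance_micro_f1` are made up for this example.

```python
from bisect import bisect_right


def utterance_index(token_pos, utt_starts):
    """Index of the utterance containing token_pos.

    utt_starts: sorted first-token offset of each utterance (assumed layout).
    """
    return bisect_right(utt_starts, token_pos) - 1


def is_cross_utterance(quad, utt_starts):
    """True if target, aspect and opinion do not all start in one utterance."""
    t_start, _, a_start, _, o_start, _ = quad[:6]
    utts = {utterance_index(pos, utt_starts) for pos in (t_start, a_start, o_start)}
    return len(utts) > 1


def cross_utterance_micro_f1(preds, golds, utt_starts_by_doc):
    """Micro P/R/F1 computed only over cross-utterance quads in preds AND golds."""
    tp = fp = fn = 0
    # Iterate over the union so documents missing from either side still count.
    for doc_id in set(preds) | set(golds):
        starts = utt_starts_by_doc[doc_id]
        pred = {q for q in preds.get(doc_id, []) if is_cross_utterance(q, starts)}
        gold = {q for q in golds.get(doc_id, []) if is_cross_utterance(q, starts)}
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1


# Toy data (invented for illustration): one dialogue with utterances
# starting at tokens 0, 5 and 10.
utt_starts_by_doc = {"d1": [0, 5, 10]}
golds = {"d1": [(1, 2, 6, 7, 11, 12, "pos"),   # spans three utterances
                (1, 2, 3, 4, 4, 5, "neg")]}    # all inside utterance 0
preds = {"d1": [(1, 2, 6, 7, 11, 12, "pos"),   # matches the cross-utterance gold
                (1, 2, 3, 4, 4, 5, "neu"),     # same-utterance, filtered out
                (6, 7, 8, 9, 11, 12, "neg")]}  # cross-utterance, not in gold
p, r, f1 = cross_utterance_micro_f1(preds, golds, utt_starts_by_doc)
```

In this toy example the same-utterance quads are dropped from both sides before counting, so the wrong polarity on the second prediction does not affect the cross-utterance score. Only the two cross-utterance predictions and the single cross-utterance gold quad enter tp, fp and fn.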