Hi, many thanks for your great work on ToDKat. It helps us a lot.
However, I found a mistake in the weighted F1 calculation that affects all experiments using metrics.f1_score, e.g.
https://github.com/something678/TodKat/blob/abf6a13b8f00246773a25c6fde352a3ef3925015/src/DialogEvaluator_meld.py#L145
Per the scikit-learn docs, f1_score takes y_true first, then y_pred, and the class weights for average='weighted' are computed from y_true. In your experiments, pred_list is passed first...
I downloaded your pretrained weights, and after fixing this bug the weighted F1 drops from 68.23 to 61.28.
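To illustrate why the argument order matters, here is a minimal sketch with hypothetical labels (not MELD data): per-class F1 is symmetric under swapping y_true and y_pred, but the support weights used by average='weighted' are not, while micro averaging is unaffected — matching the logs below.

```python
# Minimal sketch (hypothetical labels, not MELD data) showing that swapping
# the arguments changes average='weighted' but leaves F1-micro intact.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1]  # ground-truth labels (class supports: 4 vs. 1)
y_pred = [0, 0, 1, 1, 1]  # model predictions  (class supports: 2 vs. 3)

# Correct order: class weights come from the true supports.
correct = f1_score(y_true, y_pred, average='weighted')

# Swapped order (the bug): class weights come from the predicted supports.
swapped = f1_score(y_pred, y_true, average='weighted')

print(f"correct: {correct:.4f}, swapped: {swapped:.4f}")
# correct: 0.6333, swapped: 0.5667

# F1-micro only counts matched labels, so it is symmetric under the swap.
assert f1_score(y_true, y_pred, average='micro') == \
       f1_score(y_pred, y_true, average='micro')
```

The same per-class F1 values (2/3 and 0.5) go into both averages; only the weighting differs, which is why accuracy and F1-micro stay at 0.6475 while the weighted score moves.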
Logs for reference:
Before fixing the bug:
2021-12-16 11:46:30 - Accuracy: 0.6475 (1690/2610)
2021-12-16 11:46:30 - Weighted F1-macro with neutral: 0.6823 (1690/2610)
2021-12-16 11:46:30 - F1-micro with neutral: 0.6475 (1690/2610)
After fixing the bug:
2021-12-16 11:48:32 - Accuracy: 0.6475 (1690/2610)
2021-12-16 11:48:32 - Weighted F1-macro with neutral: 0.6128 (1690/2610)
2021-12-16 11:48:32 - F1-micro with neutral: 0.6475 (1690/2610)