dnanhkhoa opened this issue 4 years ago
Hi @dnanhkhoa, thanks for noticing the issue. We will update the evaluation to handle this and notify you.
Thank you for the reply, I hope to get the update soon!
While waiting, you can simply use classification_report and precision_recall_fscore_support from sklearn.metrics as below:

from sklearn.metrics import classification_report, precision_recall_fscore_support

print(classification_report(y_test, pred_test, target_names=cfg.RELATIONS, labels=range(len(cfg.RELATIONS))))
print("Macro", precision_recall_fscore_support(y_test, pred_test, average='macro', labels=range(len(cfg.RELATIONS))))
print("Micro", precision_recall_fscore_support(y_test, pred_test, average='micro', labels=range(len(cfg.RELATIONS))))

Here, y_test are the gold labels and pred_test are the predicted labels.
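For reference, a self-contained toy version of that snippet; the relation names and label lists below are made up for illustration, and in the real setup y_test/pred_test come from your model and the inventory from cfg.RELATIONS:

from sklearn.metrics import classification_report, precision_recall_fscore_support

RELATIONS = ["Commands", "Acts-on", "Or"]  # hypothetical relation inventory
y_test = [0, 1, 2, 0, 1]                   # gold label indices
pred_test = [0, 1, 1, 0, 2]                # predicted label indices

print(classification_report(y_test, pred_test, target_names=RELATIONS, labels=list(range(len(RELATIONS)))))
print("Macro", precision_recall_fscore_support(y_test, pred_test, average='macro', labels=list(range(len(RELATIONS)))))
print("Micro", precision_recall_fscore_support(y_test, pred_test, average='micro', labels=list(range(len(RELATIONS)))))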
Hello @jeniyat,
I have just found a critical problem in your evaluation script (which will be used for evaluation in the shared task).
For example, protocol_138.ann includes 7 events (E13, E19, E47, E68, E5, E50, E4) with a single Commands relation role in their arguments. So theoretically the evaluation result should show 7 in the support column of the Commands role, but I got 6 (see the image). I went deeper and found that the missing one comes from E19.
Since E9 is defined below E19, when the reader processes E19 it does not know what E9 is and cannot extract its entity tag (https://github.com/jeniyat/WNUT_2020_RE/blob/master/code/corpus/ProtoFile.py#L412), which leads to the missing relation Commands Arg1:T19 Arg2:T140.
This problem could turn a correctly predicted relation from someone's model into a false positive -> incorrect precision. The order of the annotations must not affect the result. I tried a simple fix by moving E9 above E19 and it worked as I expected (see the image; the support count is now 7).
But that workaround doesn't fix the root problem; it needs to be fixed in the reader code (probably you should create an Event -> Entity mapping before converting events to relations).
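To make the idea concrete, here is a minimal, order-independent sketch (the function names and the two-line .ann fragment are mine for illustration, not the real ProtoFile.py code): it first maps every event ID to its trigger entity, and only then resolves arguments, so forward references like E19 -> E9 work regardless of line order.

def parse_events(ann_lines):
    """First pass: map every event ID (e.g. 'E19') to its trigger entity
    (e.g. 'T19') before any arguments are resolved, so the order of the
    annotation lines no longer matters."""
    event_trigger = {}
    event_args = {}
    for line in ann_lines:
        if not line.startswith("E"):
            continue
        event_id, body = line.split("\t", 1)
        parts = body.split()
        # First token is 'Type:Trigger'; the rest are 'Role:Target' arguments.
        event_trigger[event_id] = parts[0].split(":", 1)[1]
        event_args[event_id] = [p.split(":", 1) for p in parts[1:]]
    return event_trigger, event_args

def events_to_relations(ann_lines):
    """Second pass: convert event arguments to relations, resolving
    event references (E9) to their trigger entities (T140) via the map."""
    event_trigger, event_args = parse_events(ann_lines)
    relations = []
    for event_id, args in event_args.items():
        for role, target in args:
            # A target may itself be an event; resolve it to its trigger.
            target_entity = event_trigger.get(target, target)
            relations.append((role, event_trigger[event_id], target_entity))
    return relations

# Illustrative input mirroring the protocol_138.ann situation:
# E19 references E9, which is only defined later in the file.
ann = [
    "E19\tAction:T19 Commands:E9",
    "E9\tAction:T140",
]
print(events_to_relations(ann))
# [('Commands', 'T19', 'T140')] -- recovered despite the forward reference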