jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

report statistical power of test sets #109

Open jowagner opened 2 years ago

jowagner commented 2 years ago

Shimorina and Belz (2022), "The Human Evaluation Datasheet: A Template for Recording Details of Human Evaluation Experiments in NLP", ask authors to report the statistical power of the data sample used in a human evaluation. This matters most when the sample is small, and our cloze test with 100 strings clearly falls into this category. It may also be worth measuring the statistical power of the test sets used in our paper's automatic evaluations, parsing and MWE tagging.
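For the cloze test, a minimal sketch, assuming that two systems are each scored correct/incorrect on the same 100 strings and compared with a two-sided exact McNemar test. The choice of test is our assumption; the issue does not fix one. `p_a_only` and `p_b_only` are hypothetical inputs (the probability that a string is solved by A but not B, and vice versa) that would have to come from a pilot run or a guess at the smallest effect worth detecting. Power is computed exactly by summing over all binomial outcomes rather than by simulation.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items solved by system A only; c: items solved by system B only.
    Under H0 each discordant item is equally likely to favour A or B.
    """
    d = b + c
    if d == 0:
        return 1.0
    tail = sum(comb(d, k) for k in range(min(b, c) + 1)) / 2**d
    return min(1.0, 2 * tail)


def mcnemar_power(n_items: int, p_a_only: float, p_b_only: float,
                  alpha: float = 0.05) -> float:
    """Exact power of the exact McNemar test on n_items paired items.

    p_a_only / p_b_only: assumed (hypothetical) probabilities that an item
    is solved by A but not B / by B but not A.
    """
    p_disc = p_a_only + p_b_only
    if not 0 < p_disc <= 1:
        raise ValueError("p_a_only + p_b_only must be in (0, 1]")
    share_a = p_a_only / p_disc
    power = 0.0
    # Sum over the number d of discordant items, then over how many favour A,
    # adding the probability of every outcome that the test would reject.
    for d in range(n_items + 1):
        p_d = binom_pmf(d, n_items, p_disc)
        for b in range(d + 1):
            if mcnemar_exact_p(b, d - b) <= alpha:
                power += p_d * binom_pmf(b, d, share_a)
    return power


if __name__ == "__main__":
    # Hypothetical effect: A alone solves 15% of strings, B alone 5%.
    print(f"{mcnemar_power(100, 0.15, 0.05):.3f}")
```

Because the computation is exact and cheap for n = 100, it can be rerun over a grid of hypothetical effects to find the smallest difference the 100 strings could detect with, say, 80% power.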

For dependency relation prediction and sequence tagging, the standard formulae may not apply directly, because the predictions for the items within a sequence are not independent of each other.
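One common way to allow for this dependence is to treat sentences, not tokens, as the independent units. The sketch below is our assumption, not a method settled in this issue: it compares a naive token-level standard error for the difference in token accuracy between two systems (e.g. LAS or tagging accuracy) with a sentence-level, cluster-robust one (the linearised variance of a ratio estimator), and then turns a standard error into an approximate power for a hypothetical true difference `delta`. The input format and the names `paired_diff_se` and `approx_power` are hypothetical, and the power formula assumes the standard error stays the same under the alternative.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple

# Hypothetical input: one sentence = list of (A_correct, B_correct) per token.
Sentence = List[Tuple[bool, bool]]


def paired_diff_se(sentences: List[Sentence]) -> Tuple[float, float, float]:
    """Return (diff, se_naive, se_cluster) for token accuracy A minus B.

    se_naive treats every token as an independent observation;
    se_cluster treats sentences as the independent units, which allows
    for dependence between tokens of the same sentence.
    """
    n_sent = len(sentences)
    n_tok = sum(len(s) for s in sentences)
    if n_sent < 2 or n_tok < 2:
        raise ValueError("need at least two sentences and two tokens")
    # Per-token difference in correctness: +1, 0 or -1.
    d_tok = [int(a) - int(b) for s in sentences for a, b in s]
    diff = sum(d_tok) / n_tok
    var_tok = sum((t - diff) ** 2 for t in d_tok) / (n_tok - 1)
    se_naive = sqrt(var_tok / n_tok)
    # Sentence-level residuals: summed difference minus len(s) * diff.
    resid = [sum(int(a) - int(b) for a, b in s) - diff * len(s)
             for s in sentences]
    var_cluster = n_sent / (n_sent - 1) * sum(r * r for r in resid) / n_tok**2
    return diff, se_naive, sqrt(var_cluster)


def approx_power(delta: float, se: float, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided z-test for a true
    difference delta, given the standard error se of the estimate."""
    if se <= 0:
        raise ValueError("se must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = abs(delta) / se
    return NormalDist().cdf(ratio - z) + NormalDist().cdf(-ratio - z)
```

The ratio `(se_cluster / se_naive) ** 2` is the design effect from survey sampling; dividing the number of test tokens by it gives an effective sample size, which shows how much the within-sentence dependence shrinks the test set. For example, `approx_power(0.01, se_cluster)` would estimate the power to detect a one-point accuracy difference. Sentences from the same document may themselves be correlated, so even the sentence-level figure could be optimistic.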

More reading: