jwijffels / udpipe-spacy-comparison

Compare accuracies of udpipe models and spacy models which can be used for NLP annotation
Mozilla Public License 2.0

What is the `AligndAcc` score? #2

Closed arademaker closed 6 years ago

arademaker commented 6 years ago

I couldn't find the definition. Where is it defined?

jwijffels commented 6 years ago

It's defined here: http://universaldependencies.org/conll17/evaluation.html
It basically aligns the words from the prediction with the ones from the known 'gold' test dataset. Once they are aligned, accuracy metrics are computed.

arademaker commented 6 years ago

Sorry, I didn't find this information on the page. Can you be more specific? ;-)

jwijffels commented 6 years ago

The evaluation script was taken from here: https://github.com/ufal/conll2017/tree/master/evaluation_script and is also included in the evaluation_script folder of this repository (https://github.com/jwijffels/udpipe-spacy-comparison/blob/master/evaluation_script/conll17_ud_eval.py).

It computes Precision, Recall, F1 score and Accuracy = AligndAcc for the different parts of the annotation namely:

To make this concrete, the result of an annotation is shown below (in the R output, head_token_id is the syntactic head and dep_rel is the dependency label).

We are basically comparing 2 files: the output of the model and the gold file with the human-annotated results.

The sequence of tokens output by the model might differ from the sequence of tokens in the gold file (the human-annotated results), so in order to compare these 2 files you first need to align the tokens in them. That alignment means we look within each sentence for matching tokens. These are put next to each other, along with the predicted upos/xpos/feats/dependency head and dependency relation. Based on that aligned file, the precision, recall, F1 and accuracy scores can be computed. This alignment is explained at http://universaldependencies.org/conll17/evaluation.html.

AligndAcc just means: based on the aligned data, what percentage of the values output by the model are the same as in the human-annotated holdout test data.

Precision P is the number of correct values divided by the number of system-produced values. Recall R is the number of correct values divided by the number of gold-standard values. F1 score = 2PR / (P+R).
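The alignment and the scores described above can be sketched in a few lines of Python. This is a simplified illustration, not the logic of conll17_ud_eval.py itself: it assumes that tokens are aligned when they cover the same character span once whitespace is removed, and the helper names `char_spans` and `score`, as well as the mis-tokenized system output, are made up for the example.

```python
# Simplified sketch of "align, then score"; NOT the official conll17_ud_eval.py code.
# Assumption: two tokens are aligned when they cover the same character span
# in the text with all whitespace removed.

def char_spans(tokens):
    """Return the (start, end) offsets of each token in the space-free text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def score(gold_tokens, gold_tags, sys_tokens, sys_tags):
    """Compute precision, recall, F1 and aligned accuracy for one tag column."""
    gold = dict(zip(char_spans(gold_tokens), gold_tags))
    system = dict(zip(char_spans(sys_tokens), sys_tags))

    # Tokens found at the same position in both files
    aligned = [span for span in system if span in gold]
    # Aligned tokens whose predicted tag equals the gold tag
    correct = sum(1 for span in aligned if system[span] == gold[span])

    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    aligned_acc = correct / len(aligned) if aligned else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "aligned_acc": aligned_acc}


# Gold annotation (upos) for the first four words of the example sentence
gold_tokens = ["the", "economy", "is", "weak"]
gold_upos = ["DET", "NOUN", "AUX", "ADJ"]

# Hypothetical system output: "economy" is split in two and "is" is mis-tagged
sys_tokens = ["the", "econ", "omy", "is", "weak"]
sys_upos = ["DET", "NOUN", "NOUN", "VERB", "ADJ"]

result = score(gold_tokens, gold_upos, sys_tokens, sys_upos)
print(result)
```

In this toy case 3 of the 5 system tokens align with a gold token and 2 of those carry the correct tag, so precision is 2/5, recall is 2/4 and the aligned accuracy is 2/3. The aligned accuracy only looks at tokens that could be aligned, which is why it can be higher than both precision and recall.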

If this is still unclear, the authors of the evaluation script can be found at https://github.com/ufal/conll2017

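library(udpipe)

## Download the English model and load it from the downloaded file
## (the output below was produced with english-ud-2.0-170801.udpipe)
dl <- udpipe_download_model(language = "english")
udmodel_en <- udpipe_load_model(file = dl$file_model)

x <- udpipe_annotate(udmodel_en, 
                     x = "the economy is weak but the outloook is bright")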

> as.data.frame(x)
  doc_id paragraph_id sentence_id                                       sentence token_id    token    lemma  upos xpos
1   doc1            1           1 the economy is weak but the outloook is bright        1      the      the   DET   DT
2   doc1            1           1 the economy is weak but the outloook is bright        2  economy  economy  NOUN   NN
3   doc1            1           1 the economy is weak but the outloook is bright        3       is       be   AUX  VBZ
4   doc1            1           1 the economy is weak but the outloook is bright        4     weak     weak   ADJ   JJ
5   doc1            1           1 the economy is weak but the outloook is bright        5      but      but CCONJ   CC
6   doc1            1           1 the economy is weak but the outloook is bright        6      the      the   DET   DT
7   doc1            1           1 the economy is weak but the outloook is bright        7 outloook outloook  NOUN   NN
8   doc1            1           1 the economy is weak but the outloook is bright        8       is       be   AUX  VBZ
9   doc1            1           1 the economy is weak but the outloook is bright        9   bright   bright   ADJ   JJ
                                                  feats head_token_id dep_rel deps            misc
1                             Definite=Def|PronType=Art             2     det <NA>            <NA>
2                                           Number=Sing             4   nsubj <NA>            <NA>
3 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin             4     cop <NA>            <NA>
4                                            Degree=Pos             0    root <NA>            <NA>
5                                                  <NA>             9      cc <NA>            <NA>
6                             Definite=Def|PronType=Art             7     det <NA>            <NA>
7                                           Number=Sing             9   nsubj <NA>            <NA>
8 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin             9     cop <NA>            <NA>
9                                            Degree=Pos             4    conj <NA> SpacesAfter=\\n
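
The evaluation script compares files in CoNLL-U format, where every token is one line of 10 tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and a missing value is written as `_`. As a rough illustration of how a row of the data frame above maps to such a line, here is a small Python sketch. The helper `to_conllu_line` is hypothetical, and the dictionary keys simply mirror the column names of the R output; in practice the udpipe annotation already carries the CoNLL-U text, so this is only meant to show the correspondence.

```python
# Sketch only: maps one row of the annotation shown above to a CoNLL-U token line.
# The keys mirror the R data frame columns; `to_conllu_line` is a hypothetical helper.

# Data frame columns in CoNLL-U order:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
COLUMNS = ["token_id", "token", "lemma", "upos", "xpos",
           "feats", "head_token_id", "dep_rel", "deps", "misc"]


def to_conllu_line(row):
    """Join the 10 fields with tabs, writing missing values (None) as '_'."""
    return "\t".join("_" if row.get(col) is None else str(row[col])
                     for col in COLUMNS)


# Row 2 of the output above (NA values represented as None)
row = {"token_id": 2, "token": "economy", "lemma": "economy",
       "upos": "NOUN", "xpos": "NN", "feats": "Number=Sing",
       "head_token_id": 4, "dep_rel": "nsubj", "deps": None, "misc": None}

line = to_conllu_line(row)
print(line)
```

The gold file contains the same kind of lines with the human annotation, so comparing the two files comes down to comparing these columns token by token after alignment.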