singnet / language-learning

OpenCog Unsupervised Language Learning
https://wiki.opencog.org/w/Language_learning
MIT License

More precise PA and F1? #200

Closed akolonin closed 5 years ago

akolonin commented 5 years ago

Here is the current definition (quote from our AGI-2019 paper): "The first perspective is extent to which reference corpus is parsed at all – it is called “parse-ability” (PA) and it computes the average percentage of words in a sentence recognized by grammar tester: PA = (Σ(k_i/n_i))/N, where: PA – parse ability, N – total number of sentences, k_i – number of words in i-th sentence recognized by the grammar tester, n_i – total number of words in i-th sentence. For the second metric we use conventionally defined F-measure or F-score (F1) metric computed on basis of recall and precision, averaged across all sentences in the corpus as Recall = (Σ(c_i/e_i))/N and Precision = (Σ(c_i/l_i))/N, where c_i – number of correctly identified links in i-th sentence, e_i – number of expected links and l_i – number of identified links, including false positives. That is, for recall we take average per-sentence number of overlapping links in test and reference parses divided by the total number of links in reference parses. Respectively, for precision we take the overlapping number divided by the total number of links in test parses."
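For reference, the quoted definitions could be sketched in Python roughly as follows. This is only an illustration: the function and argument names are hypothetical and are not taken from the grammar tester code, F1 is assumed to be the usual harmonic mean of the averaged precision and recall, and every sentence is assumed to have at least one word, one expected link and one identified link.

```python
from typing import Sequence, Tuple


def parse_ability(recognized: Sequence[int], totals: Sequence[int]) -> float:
    """PA = (sum_i k_i / n_i) / N: per-sentence ratios averaged over N sentences."""
    return sum(k / n for k, n in zip(recognized, totals)) / len(totals)


def averaged_f1(
    correct: Sequence[int],     # c_i: correctly identified links per sentence
    expected: Sequence[int],    # e_i: links in the reference parse
    identified: Sequence[int],  # l_i: links in the test parse
) -> Tuple[float, float, float]:
    """Return (recall, precision, f1), each ratio averaged per sentence."""
    n = len(correct)
    recall = sum(c / e for c, e in zip(correct, expected)) / n
    precision = sum(c / l for c, l in zip(correct, identified)) / n
    # Assumption: F1 is the harmonic mean of the averaged values.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1
```

Each sentence contributes one ratio with equal weight, regardless of how many words or links it contains.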

The problem is that if we have two sentences of 100 and 10 words/links with 90 and 1 matches respectively, the assessment is the plain average (90/100 + 1/10)/2 = 0.5, which takes no account of sentence length. However, if we count individual words/links, or weight each sentence's score by the number of words/links in it, the assessment would be 91/110 ≈ 0.83, which is more "fair".
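The two ways of averaging can be compared directly on this example. A small Python sketch, with hypothetical function names chosen only for this illustration:

```python
def per_sentence_average(matches, totals):
    """Current scheme: average the per-sentence ratios, one vote per sentence."""
    return sum(m / t for m, t in zip(matches, totals)) / len(totals)


def pooled_ratio(matches, totals):
    """Alternative scheme: pool all words/links, so longer sentences weigh more."""
    return sum(matches) / sum(totals)


matches = [90, 1]   # matched words/links per sentence
totals = [100, 10]  # total words/links per sentence

print(round(per_sentence_average(matches, totals), 2))  # 0.5
print(round(pooled_ratio(matches, totals), 2))          # 0.83
```

The pooled ratio equals the per-sentence average weighted by sentence length, so the two formulations of the alternative give the same number.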

An alternative is discussed here: https://docs.google.com/document/d/1YtN0-hvGWHJy1_KzXSfGE8w_m3kU5m0LcMmOw4KHT3Q/edit#heading=h.twoiv52o0tou (see appendices H and J at the bottom).

We should decide whether to move to this metric for parse evaluation and, if so, when.

This issue extends #198

OlegBaskov commented 5 years ago

"Alternative" F1 estimation -- Alternative_F1_for_ALE_ILE%20clustering_2019-04-12.html

glicerico commented 5 years ago

Writing down what I said in previous video calls: I don't think this is a good idea for two reasons:

Because everybody seemed to agree with these arguments during the call, I am closing this issue.