lingpy / pybor

A Python library for borrowing detection based on lexical language models
Apache License 2.0

Added and tested accuracy #11

Closed · fractaldragonflies closed this issue 4 years ago

fractaldragonflies commented 4 years ago

Added accuracy to the prf() of evaluate.py. Tested with pytest, and added a specific check for accuracy in test_evaluate to be sure of the accuracy calculation. I realize that, having coded both the calculation and the test function, I am not capable of truly independently testing the correctness of the function!

fractaldragonflies commented 4 years ago

I did not add the calculation of majority_accuracy, i.e., the accuracy obtained by just selecting the majority class as the default decision. This is a minimal requirement for any categorical detection method: if we can't do better than always selecting the majority category, our method isn't very helpful. Unless, of course, false positives are valued very differently than false negatives... or some other such cost calculation.

That said, it would be simple to add, if @tresoldi and @LinguList thought it useful; a sketch of what I have in mind follows below. I have included it in the draft paper as a baseline.
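
For concreteness, here is a minimal sketch of such a majority baseline. The function name and the flat list-of-labels input are assumptions for illustration, not part of pybor's evaluate.py:

    from collections import Counter

    def majority_accuracy(gold):
        """Accuracy of always predicting the most frequent gold label.

        `gold` is assumed to be a flat sequence of binary judgments
        (e.g. 0 = inherited, 1 = borrowed). Illustrative helper only.
        """
        counts = Counter(gold)
        # The majority baseline gets exactly the majority proportion right.
        return counts.most_common(1)[0][1] / len(gold)

    # Example: 7 of 10 words are inherited, so the baseline scores 0.7.
    print(majority_accuracy([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]))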

LinguList commented 4 years ago

I don't understand what "majority accuracy" is, I am afraid. Does it mean I pass a vector of 0s as the judgments and check the scores? This is useful, but not part of the evaluation, I think; we can compute it within the examples, as you can do this for all measures. So it is rather a classification method that can be used as a baseline.
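
To illustrate the reading suggested here (the scoring code is a generic stand-in, not pybor's prf()): the baseline is just a "classifier" that emits the majority label for every item, and its output is scored like any other prediction:

    gold = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    test = [0] * len(gold)  # baseline: always predict the majority label

    # Score the all-zeros "prediction" the same way as any classifier output.
    accuracy = sum(t == g for t, g in zip(test, gold)) / len(gold)
    print(accuracy)  # 0.7 here, since 7 of 10 gold labels are 0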

fractaldragonflies commented 4 years ago

I've tried to use the evaluate.false_positives function, but it fails with:

   for (idxA, wordA, judgmentA), (idxB, wordB, judgmentB) in zip(

TypeError: cannot unpack non-iterable numpy.bool_ object

The function seems to want not vectors of test and gold results, but sequences of (idx, token, binary judgment) triples. I guess this is the intended input requirement, since the code expects it. But it belies the idea of doing things simply, especially since the rest of the data (id and token) is ignored.

So should I plan on this input requirement, or will we be able to use simple vectors of test and gold results in a future version?
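
Judging from the traceback, false_positives appears to iterate over zipped test and gold sequences whose items unpack as (idx, word, judgment) triples. A minimal sketch of adapting plain vectors to that shape (the wrapper and the placeholder tokens are assumptions, not pybor code):

    def vectors_to_triples(judgments, words=None):
        """Wrap a flat vector of binary judgments into the
        (idx, word, judgment) triples that the traceback suggests
        evaluate.false_positives unpacks. Illustrative only."""
        if words is None:
            words = [""] * len(judgments)  # placeholder tokens
        return [(idx, word, bool(j))
                for idx, (word, j) in enumerate(zip(words, judgments))]

    test = vectors_to_triples([0, 1, 0, 0])
    gold = vectors_to_triples([0, 1, 1, 0])
    # test and gold can now be zipped and unpacked as
    # (idxA, wordA, judgmentA), (idxB, wordB, judgmentB).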

LinguList commented 4 years ago

I think, since we ARE doing an exact comparison of the data, and since predict_data is supposed to return IDs as well, this is exactly what we need. Furthermore, note that we test whether the ID is the same in gold and test; otherwise it throws an error. I had cases in the past where colleagues mixed up test and gold in their code and only realized it later, after a phase of enthusiasm in which they thought they had found the perfect algorithm.
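
The safeguard described here amounts to something like the following guard inside the comparison loop, reusing the unpacking pattern from the traceback above (a sketch of the described behavior, not the verbatim pybor code):

    # Refuse to score if the test and gold items are not aligned by ID.
    for (idxA, wordA, judgmentA), (idxB, wordB, judgmentB) in zip(test, gold):
        if idxA != idxB:
            raise ValueError(
                f"ID mismatch between test ({idxA}) and gold ({idxB}); "
                "check that the two datasets are aligned.")
        # ... compare judgmentA and judgmentB here ...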