savkov / bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.
MIT License

Incorrect count of spurious tags when there is no overlap #3

Closed HugoSousa closed 7 years ago

HugoSousa commented 8 years ago

Hello again,

I have two .ann files.

The gold set:

T1  Medical-Concept 36 41   tumor
T2  Medical-Concept 327 351 síndrome mielodisplásica
T3  Medical-Concept 440 445 tumor
T4  Medical-Concept 22 32   morfologia
T5  Medical-Concept 79 117  Nomenclatura Sistematizada de Medicina
T6  Medical-Concept 120 126 SNOMED
T7  Medical-Concept 189 204 Linfoma maligno
T8  Medical-Concept 207 216 folicular
T9  Medical-Concept 220 227 nodular
T10 Medical-Concept 270 310 Anemia refratária com excesso de blastos
T11 Medical-Concept 356 366 deleção 5q
T12 Medical-Concept 368 371 5q-

And the candidate set

T1  Medical-Concept 270 287 Anemia refratária
T2  Medical-Concept 327 335 Síndrome
T3  Medical-Concept 471 476 seção

For the comparison I'm running the following code

from bratutils import agreement as a

__author__ = 'Aleksandar Savkov'

doc = '3711'
gold = a.Document('../res/ht_gold/' + doc + '.ann')
extension = a.Document('../res/ht_extension/' + doc + '.ann')

gold.make_gold()
statistics = extension.compare_to_gold(gold)

print(statistics)

This should produce the following result: 0 correct, 12 missing, and 3 spurious tags, right?

The actual result is 3 missing tags and 0 correct, partial, or spurious tags. I think the spurious tags are not being handled correctly.

Is my reasoning right, or is this actually the intended output?

Hugo
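The expected counts in the comment above can be checked by hand with a small sketch. It assumes strict matching, where a candidate tag is correct only if its type and both offsets equal those of a gold tag. This is plain Python and not bratutils code: `parse_spans` and `strict_counts` are hypothetical helpers, and the regular expression tolerates the mixed whitespace in the pasted listings.

```python
import re

# Text-bound annotation line: ID, type, start offset, end offset, covered text.
# Whitespace-tolerant, because the listings above no longer contain real tabs.
ANN_LINE = re.compile(r"^(T\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(.*)$")

GOLD = """\
T1  Medical-Concept 36 41   tumor
T2  Medical-Concept 327 351 síndrome mielodisplásica
T3  Medical-Concept 440 445 tumor
T4  Medical-Concept 22 32   morfologia
T5  Medical-Concept 79 117  Nomenclatura Sistematizada de Medicina
T6  Medical-Concept 120 126 SNOMED
T7  Medical-Concept 189 204 Linfoma maligno
T8  Medical-Concept 207 216 folicular
T9  Medical-Concept 220 227 nodular
T10 Medical-Concept 270 310 Anemia refratária com excesso de blastos
T11 Medical-Concept 356 366 deleção 5q
T12 Medical-Concept 368 371 5q-
"""

CANDIDATE = """\
T1  Medical-Concept 270 287 Anemia refratária
T2  Medical-Concept 327 335 Síndrome
T3  Medical-Concept 471 476 seção
"""


def parse_spans(ann_text):
    """Return the set of (type, start, end) triples found in ann_text."""
    spans = set()
    for line in ann_text.splitlines():
        match = ANN_LINE.match(line.strip())
        if match:
            _, label, start, end, _ = match.groups()
            spans.add((label, int(start), int(end)))
    return spans


def strict_counts(gold, candidate):
    """Count exact matches only; overlapping spans do not count as matches."""
    return {
        "correct": len(gold & candidate),
        "missing": len(gold - candidate),
        "spurious": len(candidate - gold),
    }


counts = strict_counts(parse_spans(GOLD), parse_spans(CANDIDATE))
print(counts)
```

Under this strict definition the counts are 0 correct, 12 missing, and 3 spurious. Two candidate spans (270 to 287 and 327 to 335) overlap gold spans, so a scorer that has a partial category may report those two differently.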

HugoSousa commented 8 years ago

I realized that if the code is changed to

from bratutils import agreement as a

__author__ = 'Aleksandar Savkov'

doc = '796'
gold = a.Document('../res/ht_gold/' + doc + '.ann')
extension = a.Document('../res/ht_extension/' + doc + '.ann')

#gold.make_gold()
statistics = gold.compare_to_gold(extension)

print(statistics)

it outputs the result I described above.

I only now noticed that your sample also comments out the make_gold call. The order of the documents in compare_to_gold confused me too, because the sample does not say which one is the gold document (in the collection comparison, by contrast, the gold collection is the function parameter).

So I guess this is not a bug, just incorrect usage on my part.

Can you confirm this is the correct way to execute it?

Thanks, Hugo
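The reason the argument order matters is that "missing" and "spurious" are not symmetric: each is defined relative to whichever document plays the gold role. A minimal sketch under the same strict-matching assumption, in plain Python rather than bratutils code (`strict_counts` and the two small span sets are hypothetical, chosen only for illustration):

```python
def strict_counts(gold, candidate):
    """Exact-match counts; a tag matches only if type and offsets are equal."""
    return {
        "correct": len(gold & candidate),
        "missing": len(gold - candidate),    # in gold, absent from candidate
        "spurious": len(candidate - gold),   # in candidate, absent from gold
    }


# Two reference tags and one predicted tag, with no span in common.
reference = {("Medical-Concept", 36, 41), ("Medical-Concept", 120, 126)}
predicted = {("Medical-Concept", 471, 476)}

as_intended = strict_counts(gold=reference, candidate=predicted)
roles_swapped = strict_counts(gold=predicted, candidate=reference)

print(as_intended)
print(roles_swapped)
```

Swapping the roles exchanges the missing and spurious counts while leaving the correct count unchanged. This illustrates the general asymmetry only and does not attempt to reproduce the exact numbers bratutils printed in either run.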

savkov commented 7 years ago

Hi, this is long overdue but I think the answer is yes :)

I'm closing this issue. Thanks for your feedback!

Sasho