savkov / bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.
MIT License

Fix statistics calculation for cases with contained annotations #2

Closed HugoSousa closed 8 years ago

HugoSousa commented 8 years ago

As discussed in #1, the statistics are not properly calculated when both documents point to the same file.

After analysing the source code and the test samples, I figured out that an annotation is compared with all the annotations contained in it. This may not be the right way to do it. Consider the following case:

gold document

T1  Concept 0 16    something inside
T2  Concept 10 16   inside

other document

T1  Concept 0 16    something inside
T2  Concept 10 16   inside

As T2 is contained in T1, T1 is compared with both T1 and T2, which yields one correct and one partially correct annotation. In total this produces 3 comparisons instead of 2.

So, I first remove the duplicates (so that T1 isn't compared twice when there is an identical annotation). Then I check for equal annotations. If there is no equal annotation, I check for contained annotations to compare against.
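The matching order described above can be sketched roughly like this (the helper and variable names are made up for illustration; this is not the bratutils API, and annotations are simplified to (tag, start, end) tuples):

```python
# Sketch of the matching order: dedupe first, then exact match,
# then fall back to contained annotations. Hypothetical code,
# not the actual library implementation.

def match_order(gold, other):
    gold = sorted(set(gold))    # drop exact duplicates first
    other = set(other)
    verdicts = []
    for tag, start, end in gold:
        if (tag, start, end) in other:
            verdicts.append('correct')          # exact match wins
        elif any(t == tag and start <= s and e <= end
                 for t, s, e in other):
            verdicts.append('partial')          # contained annotation
        else:
            verdicts.append('missing')
    return verdicts

gold = [('Concept', 0, 16), ('Concept', 0, 16), ('Concept', 10, 16)]
print(match_order(gold, gold))  # ['correct', 'correct']
```

With this order, the example above produces 2 comparisons rather than 3, because the exact match for T1 short-circuits the containment check.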

I also uploaded a bunch of samples, which seem to be producing the expected results for me.

Here's a small script to run the samples:

from bratutils import agreement as a

doc = a.Document('res/samples/A/data-sample-1.ann')
doc2 = a.Document('res/samples/A/data-sample-1.ann')

#doc = a.Document('res/samples/other/duplicates_equals.ann')
#doc2 = a.Document('res/samples/other/duplicates_equals_b.ann')

#doc = a.Document('res/samples/other/incorrect.ann')
#doc2 = a.Document('res/samples/other/incorrect_b.ann')

#doc = a.Document('res/samples/other/incorrect_duplicate_mix.ann')
#doc2 = a.Document('res/samples/other/incorrect_duplicate_mix_b.ann')

#doc = a.Document('res/samples/other/no_overlaps_equals.ann')
#doc2 = a.Document('res/samples/other/no_overlaps_equals_b.ann')

#doc = a.Document('res/samples/other/partial.ann')
#doc2 = a.Document('res/samples/other/partial_b.ann')

#doc = a.Document('res/samples/other/partial_different.ann')
#doc2 = a.Document('res/samples/other/partial_different_b.ann')

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print statistics

Please analyse the changes and check whether this really is the desired output. Thanks.

savkov commented 8 years ago

Hi Hugo,

what you're describing makes sense. I'll have a look at it tonight.

Thanks a lot for contributing!

savkov commented 8 years ago

Hi,

Does this work for you? I get 1.0 across the board even for documents with large differences (data-sample-1.ann).

I don't have time at the moment to properly debug this or give you feedback, but the removal of duplicates seems unnecessary: at that point there should be no duplicates, because brat does not allow them. Also, there are no magic methods (__eq__) in place to define equality properly.

My feeling is that this process has become far too complicated and that I should take a radically different approach, e.g. use tables for the indices, or something along those lines. The current flow is too convoluted to follow and makes me pull my hair out.

I will have some time while travelling over the weekend, I'll get back to you then.

HugoSousa commented 8 years ago

Comparing data-sample-1.ann with what? Comparing it with itself produces the following output, which seems correct to me, as the file has 130 annotations:

-------------------MUC-Table--------------------
------------------------------------------------
pos:130
act:130
cor:130
par:0
inc:0
mis:0
spu:0
------------------------------------------------
pre:1.0
rec:1.0
fsc:1.0
------------------------------------------------
und:0.0
ovg:0.0
sub:0.0
------------------------------------------------
bor:130
ibo:0
------------------------------------------------
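For reference, assuming the counters follow the usual MUC-style conventions (my assumption here, with partial matches counted as half a match; the library source is the authority on the exact formula), the derived scores would be computed roughly like this:

```python
# MUC-style derived scores (assumed convention: a partial match
# counts as half a match). Not necessarily the exact bratutils formula.

def muc_scores(pos, act, cor, par):
    pre = (cor + 0.5 * par) / act if act else 0.0   # precision
    rec = (cor + 0.5 * par) / pos if pos else 0.0   # recall
    fsc = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, fsc

# For the table above: 130 possible, 130 actual, 130 correct, 0 partial
print(muc_scores(130, 130, 130, 0))  # (1.0, 1.0, 1.0)
```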

------------------------------------------------

About the duplicates

If you test the following annotation (file duplicates_equals.ann) with itself

T1  Concept 0 16    something inside
T2  Concept 0 16    something inside
T3  Concept 10 16   inside

you get the following outputs

Output without removing the duplicates (T1 is compared with T1 and T2, T2 is compared with T1 and T2, and T3 with itself, resulting in 5 comparisons):

-------------------MUC-Table--------------------
------------------------------------------------
pos:5
act:5
cor:5
par:0
inc:0
mis:0
spu:0
------------------------------------------------
pre:1.0
rec:1.0
fsc:1.0
------------------------------------------------
und:0.0
ovg:0.0
sub:0.0
------------------------------------------------
bor:5
ibo:0
------------------------------------------------

------------------------------------------------

Output after removing the duplicates (T1 is compared with itself and T2 is compared with itself):

-------------------MUC-Table--------------------
------------------------------------------------
pos:2
act:2
cor:2
par:0
inc:0
mis:0
spu:0
------------------------------------------------
pre:1.0
rec:1.0
fsc:1.0
------------------------------------------------
und:0.0
ovg:0.0
sub:0.0
------------------------------------------------
bor:2
ibo:0
------------------------------------------------

------------------------------------------------

I think the result after removing the duplicates is the more correct one: if we don't count the duplicates, there are in fact only 2 annotations, not 5. Do you agree?

I don't know whether the brat standoff format disallows duplicates, but the tool I am using to annotate my documents does allow them, so I guess guarding against this situation would be useful anyway.
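Collapsing duplicates could be as simple as keying annotations on the (tag, start, end) triple before comparing (hypothetical sketch, not the library code):

```python
# Sketch: collapse duplicate annotations on (tag, start, end) so a
# doubled span contributes one comparison instead of several.
# Hypothetical helper, not the bratutils implementation.

def collapse_duplicates(annotations):
    unique = {}
    for tag, start, end, text in annotations:
        unique.setdefault((tag, start, end), (tag, start, end, text))
    return list(unique.values())

anns = [('Concept', 0, 16, 'something inside'),
        ('Concept', 0, 16, 'something inside'),
        ('Concept', 10, 16, 'inside')]
print(len(collapse_duplicates(anns)))  # 2
```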

savkov commented 8 years ago

Disagreement

OK, so A/data-sample-1.ann and B/data-sample-1.ann are the same, so comparing them should output 1.0 agreement. If you try A/data-sample-2.ann against B/data-sample-2.ann you should get some differences.

doc = a.Document('../res/samples/A/data-sample-2.ann')
doc2 = a.Document('../res/samples/B/data-sample-2.ann')

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print statistics
output:
-------------------MUC-Table--------------------
------------------------------------------------
pos:138
act:138
cor:138
par:0
inc:0
mis:0
spu:0
------------------------------------------------
pre:1.0
rec:1.0
fsc:1.0
------------------------------------------------
und:0.0
ovg:0.0
sub:0.0
------------------------------------------------
bor:138
ibo:0
------------------------------------------------
------------------------------------------------

I'll leave some comments in the source.

Repetitions

I didn't realise there were other tools using the format. I agree it makes sense to handle duplicates then.

HugoSousa commented 8 years ago

These two files are actually identical. I guess you maybe uploaded the wrong file? So the output seems correct, with 138 correct annotations.

savkov commented 8 years ago

Agh, damn it, ignore my comment about the disagreement.

savkov commented 8 years ago

Oh, you already noticed. :)

savkov commented 8 years ago

I am merging the request, even though I rewrote large parts of the code to simplify the enormous flow-control statement. There is at least one remaining issue with the logic of the code at hand, but you homed in on bugs I had introduced while refactoring to make the code prettier (namely the separation of same annotations and coinciding annotations; I checked the old repo), and on the duplicates issue that I had largely ignored. I will add more documentation and tests as well. Thanks for contributing, I hope you find the library useful.