savkov / bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.
MIT License

Document instance has no attribute 'postag_list' #1

Closed by HugoSousa 8 years ago

HugoSousa commented 8 years ago

Hello. I need to compare automatic annotations performed by a software application with manual annotations (in brat standoff format), and this seems to be a nice tool to use.

While testing it and trying to understand the source code, I tried the following small sample:

import agreement as a

doc = a.Document("myfile.ann")
doc2 = a.Document("myfile.ann")

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

However, when compare_to_gold is executed, it fails with the error: Document instance has no attribute 'postag_list'. It's true that Document has no such attribute, but I don't understand where the reference to it comes from.

Am I missing something? Could you possibly post a small working example for comparing two .ann files? I'd appreciate that.

Thanks.

savkov commented 8 years ago

Hi,

Thanks for raising this issue. This is just to acknowledge that I'm on it. This is part of my PhD code, which really needs attention. I will get back to you with some examples later on. I suspect part of the code is missing here.

Sasho

savkov commented 8 years ago

Hi,

Your example should be working now. I had forgotten to rename the attribute postag_list to tags in some places. PyCharm's refactoring is sometimes very sneaky. I've also added some examples of how to use the agreement module. Hope that helps. Let me know if there is anything else.

Sasho

HugoSousa commented 8 years ago

Hey,

I appreciate the fast reply, and thanks for fixing and improving the project. As a suggestion, I think it would be nice to have these examples in the README, so that the statistics are easier to understand.

However, I'm now questioning the logic of the program. I'm running the sample with doc and doc2 pointing to the same file, so precision and recall should both be 1, right?

That's not the case, though. The following code:

from bratutils import agreement as a

doc = a.Document('res/samples/A/data-sample-1.ann')
doc2 = a.Document('res/samples/A/data-sample-1.ann')

doc.make_gold()
statistics = doc2.compare_to_gold(doc)

print(statistics)

results in the following statistics:

-------------------MUC-Table--------------------
------------------------------------------------
pos:158
act:158
cor:130
par:0
inc:28
mis:0
spu:0
------------------------------------------------
pre:0.822784810127
rec:0.822784810127
fsc:0.822784810127
------------------------------------------------
und:0.0
ovg:0.0
sub:0.177215189873
------------------------------------------------
bor:158
ibo:0
------------------------------------------------

------------------------------------------------

I guess the count of 28 incorrect annotations is being miscalculated: comparing a file with itself should give cor:158 and inc:0.
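
For reference, the derived rows of the table can be reproduced from the raw counts. The sketch below uses the commonly cited MUC definitions, which are an assumption here and not a copy of what bratutils does internally; muc_scores is a hypothetical helper, and the half weight given to partial matches has no effect on this table because par is 0.

```python
from __future__ import division  # true division on Python 2 as well


def muc_scores(cor, par, inc, mis, spu):
    """Derive MUC-style scores from the five raw counts (assumed formulas)."""
    pos = cor + par + inc + mis  # possible: annotations in the gold standard
    act = cor + par + inc + spu  # actual: annotations in the compared document
    credit = cor + 0.5 * par     # partial matches earn half credit
    pre = credit / act if act else 0.0
    rec = credit / pos if pos else 0.0
    fsc = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    matched = cor + par + inc
    return {
        "pos": pos,
        "act": act,
        "pre": pre,
        "rec": rec,
        "fsc": fsc,
        "und": mis / pos if pos else 0.0,                   # undergeneration
        "ovg": spu / act if act else 0.0,                   # overgeneration
        "sub": (inc + 0.5 * par) / matched if matched else 0.0,  # substitution
    }


# Counts taken from the table above.
reported = muc_scores(cor=130, par=0, inc=28, mis=0, spu=0)

# Counts one would expect when a file is compared with itself.
expected = muc_scores(cor=158, par=0, inc=0, mis=0, spu=0)

print(reported["pre"], reported["sub"])
print(expected["pre"], expected["sub"])
```

Under these definitions the reported pre, rec, fsc and sub all follow from cor:130 and inc:28, so the derived scores look consistent with the counts and the problem appears to lie in how the annotations are matched.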

savkov commented 8 years ago

This is the moment I take a real look at my PhD code and pull my hair out. I'll need some time to fix this properly.

HugoSousa commented 8 years ago

Ok, thanks.

I'll also try to take a look at the source code and see if I can help with it.

However, there should be some easy and reliable way to test the results (using smaller samples with manually calculated scores, I guess).
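
As a rough illustration of what such a check could look like, here is a minimal sketch that does not use bratutils at all. It parses the text-bound (T) lines of two tiny brat standoff samples and scores them by exact match only, so the expected numbers can be worked out by hand. The helpers parse_entities and exact_match_scores and the two sample strings are hypothetical, made up for this example.

```python
from __future__ import division  # true division on Python 2 as well


def parse_entities(ann_text):
    """Collect (type, start, end) for every text-bound annotation (T line)."""
    entities = set()  # note: a set merges exact duplicate annotations
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, attributes and notes
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line
        label, offsets = fields[1].split(" ", 1)
        # Discontinuous spans look like "0 5;10 15"; keep the outer bounds.
        numbers = offsets.replace(";", " ").split()
        entities.add((label, int(numbers[0]), int(numbers[-1])))
    return entities


def exact_match_scores(gold, predicted):
    """Return (precision, recall), counting only identical type and span."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hand-made samples for the sentence "John lives in London".
GOLD = "T1\tPerson 0 4\tJohn\nT2\tCity 14 20\tLondon\n"
PRED = "T1\tPerson 0 4\tJohn\nT2\tCity 14 19\tLondo\n"

gold = parse_entities(GOLD)
pred = parse_entities(PRED)

# A document compared with itself must be a perfect match.
assert exact_match_scores(gold, gold) == (1.0, 1.0)

# One of the two spans differs, so exactly half of the annotations match.
assert exact_match_scores(gold, pred) == (0.5, 0.5)
```

The same two assertions, rewritten against Document, make_gold and compare_to_gold, would catch the self-comparison problem reported above.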

savkov commented 8 years ago

Yeah, part of it should really be writing tests for it.