jbjorne / TEES

Turku Event Extraction System

NER step is excluding everything but proteins; why is that? #9

Closed ajmazurie closed 8 years ago

ajmazurie commented 11 years ago

When I looked at the output of the BANNER step I realized only Protein entities were annotated, excluding everything else. Switching to Cocoa didn't change a thing despite the fact I know Cocoa annotates much more than proteins.

Looking at the code (Tools/BANNER.py) I realize that only proteins are even considered. I was wondering why? I am interested in including all the annotations from BANNER (actually, from Cocoa) and am wondering if something will break downstream.

Best, Aurélien

jbjorne commented 11 years ago

Dear Aurélien,

BANNER does not assign types to the entities it detects. The scope of its output is roughly comparable to the Protein entity type of the GENIA task, which is why we label its output as Protein entities.

If you want to use other methods (such as Cocoa) to detect entities of different types, that should be fine. For each task, TEES takes into account the entity types that correspond to that task's given entities (the shared task a1 files). Entities of types not used by the task model may reduce performance, but should not otherwise affect the system.

Regards, Jari
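For anyone who wants to check which entity types an a1 file carries before handing it to TEES, here is a minimal sketch. It assumes the usual BioNLP Shared Task standoff layout for text-bound annotations (`ID<TAB>TYPE START END<TAB>TEXT`) with contiguous spans. The names `parse_a1`, `keep_types` and `SAMPLE_A1` are made up for illustration and are not part of TEES.

```python
from collections import Counter

# Hypothetical sample in the assumed a1 layout: ID<TAB>TYPE START END<TAB>TEXT
SAMPLE_A1 = "T1\tProtein 0 4\tIL-2\nT2\tOrganism 19 24\thuman\n"


def parse_a1(content):
    """Return the text-bound annotations (T lines) found in a1 content."""
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # skip blank lines and anything that is not an entity
        ent_id, type_and_span, surface = line.split("\t", 2)
        # Assumes a single contiguous span: "TYPE START END"
        ent_type, start, end = type_and_span.split()
        entities.append({
            "id": ent_id,
            "type": ent_type,
            "start": int(start),
            "end": int(end),
            "text": surface,
        })
    return entities


def keep_types(entities, allowed_types):
    """Keep only the entities whose type is in allowed_types."""
    return [e for e in entities if e["type"] in allowed_types]


if __name__ == "__main__":
    parsed = parse_a1(SAMPLE_A1)
    # Tally entity types to see what a task model would be given
    print(Counter(e["type"] for e in parsed))
    # Restrict to the types the task is assumed to use
    print(keep_types(parsed, {"Protein"}))
```

Filtering like this is only one way to avoid passing unused entity types to a task model; whether it helps performance would need to be checked on the task at hand.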


ajmazurie commented 11 years ago

Thanks for the information. I modified Tools/Cocoa.py to include the entity type, and TEES apparently didn't object. I am currently figuring out which output file(s) contain the results I want, and how to visualize them with BRAT.
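On the BRAT side, a rough sketch of writing typed entities as text-bound annotations in BRAT's standoff (.ann) format, which uses the same `ID<TAB>TYPE START END<TAB>TEXT` layout. This is not how TEES produces its output; `to_brat_ann` and the `(type, start, end)` tuple format are assumptions made for the example, and BRAT would presumably also need each entity type declared in its `annotation.conf`.

```python
def to_brat_ann(text, entities):
    """Render (type, start, end) tuples as BRAT text-bound annotation lines.

    `entities` is a hypothetical input format: character offsets into `text`,
    with `end` exclusive, as in BRAT standoff.
    """
    lines = []
    for index, (ent_type, start, end) in enumerate(entities, start=1):
        if not 0 <= start < end <= len(text):
            raise ValueError("span {}-{} is outside the text".format(start, end))
        surface = text[start:end]
        lines.append("T{}\t{} {} {}\t{}\n".format(index, ent_type, start, end, surface))
    return "".join(lines)


if __name__ == "__main__":
    sentence = "IL-2 expression in human cells"
    spans = [("Protein", 0, 4), ("Organism", 19, 24)]
    # Save the result next to the matching .txt file for BRAT to pick up
    print(to_brat_ann(sentence, spans), end="")
```

The offset check is there because BRAT displays annotations by character position, so a span that does not match the text file shows up in the wrong place or not at all.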