jbjorne / TEES

Turku Event Extraction System

Prevent identifying "protein" entities #30

Closed (banana-Z closed 5 years ago)

banana-Z commented 5 years ago

Dear author,

I'm extracting relations between drugs and clinical entities (ADE, dosage, frequency, form, duration, route, strength, reason). However, the XML output of preprocess.py also contains entities of type "Protein", as shown below:

[Screenshot: screen shot 2018-10-02 at 3 37 54 pm]

How can I prevent the "Protein" type from being identified in my text files, so that the structure.txt of the trained model does not list "ENTITY Protein"?

[Screenshot: screen shot 2018-10-02 at 3 41 49 pm]

jbjorne commented 5 years ago

preprocess.py runs the BANNER program to generate the "Protein" entities, which many of the models require for event extraction. To avoid these entities, please run the preprocessing pipeline (https://github.com/jbjorne/TEES/wiki/The-Preprocessor) without the BANNER step.
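
One way to check whether a preprocessed file still contains BANNER-generated entities is to count the entity types in the output XML. The following is only a minimal sketch, not part of TEES: it assumes that entities are stored as `entity` elements carrying a `type` attribute, the helper name `count_entity_types` is hypothetical, and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_entity_types(xml_text):
    """Count how often each entity type occurs in an XML string.

    Assumption: entities are <entity> elements with a "type" attribute.
    """
    root = ET.fromstring(xml_text)
    return Counter(entity.get("type") for entity in root.iter("entity"))


# Made-up sample for illustration only, not real preprocess.py output.
SAMPLE = """<corpus source="demo">
  <document id="d0">
    <sentence id="d0.s0" text="Aspirin 100 mg daily">
      <entity id="d0.s0.e0" type="Drug" text="Aspirin" charOffset="0-7"/>
      <entity id="d0.s0.e1" type="Protein" text="Aspirin" charOffset="0-7"/>
    </sentence>
  </document>
</corpus>"""

counts = count_entity_types(SAMPLE)
print(dict(counts))

# After preprocessing without the BANNER step, this count should be 0.
print("Protein entities:", counts["Protein"])
```

For a real file, the contents could be read from disk and passed to the same function; if "Protein" still appears in the counts, the named entity recognition step was probably not skipped.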

banana-Z commented 5 years ago

Thanks for your help, it worked in the computing project.