jbjorne / TEES

Turku Event Extraction System

confidence level in NER classify output #15

Closed: jtsui closed this issue 9 years ago

jtsui commented 9 years ago

When running classify.py on a few different PubMed articles with several different models, I am noticing some incorrectly tagged genes/proteins.

Is it possible to display the confidence level from BANNER in the output? I am looking at the file OUTSTEM-preprocessed.xml.gz.
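To check what the preprocessed file actually records for each tagged entity, you can dump every attribute of every entity. The following is a minimal sketch that assumes the usual TEES interaction XML layout (corpus > document > sentence > entity). The function name list_entities is hypothetical and not part of TEES. It prints all attributes as stored, so you can see whether any score-like field is present. It does not add confidence values.

```python
import gzip
import xml.etree.ElementTree as ET


def list_entities(path):
    """Hypothetical helper: return (sentence id, entity attributes) pairs
    for every <entity> element in a gzipped TEES interaction XML file.

    Assumes the corpus > document > sentence > entity nesting; every
    attribute is returned as-is so nothing is filtered out.
    """
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()
    rows = []
    for sentence in root.iter("sentence"):
        for entity in sentence.iter("entity"):
            # Copy the attribute mapping so callers get a plain dict.
            rows.append((sentence.get("id"), dict(entity.attrib)))
    return rows
```

For example, `for sid, attrs in list_entities("OUTSTEM-preprocessed.xml.gz"): print(sid, attrs)` lists each tagged gene/protein together with its sentence, which makes it easier to match the incorrect tags against the article text.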

jbjorne commented 9 years ago

I'm not sure how to get confidence scores from BANNER, but the whole wrapper code is in TEES/Tools/BANNER.py, so maybe it can be modified to output such values. If you run the BANNER.py wrapper with the "--debug" switch, it will preserve the temporary BANNER input/output files, so you can see exactly what goes into and comes out of BANNER. I think the temporary files go somewhere under your /tmp directory, but the wrapper should tell you where when you run it.
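The wrapper's own message is the reliable way to find the preserved files. If that output gets lost, a rough fallback is to look for recently modified directories in the system temp location. This is only a sketch. It assumes the wrapper creates its working directory directly under the default temp directory (usually /tmp), which may not hold for every setup. The function name recent_temp_dirs is hypothetical.

```python
import os
import tempfile
import time


def recent_temp_dirs(max_age_seconds=3600, root=None):
    """Hypothetical helper: return directories directly under `root`
    (default: the system temp directory) modified within the last
    `max_age_seconds`, newest first."""
    root = root or tempfile.gettempdir()
    cutoff = time.time() - max_age_seconds
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                if entry.is_dir():
                    mtime = entry.stat().st_mtime
                    if mtime >= cutoff:
                        found.append((mtime, entry.path))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    found.sort(reverse=True)  # newest modification time first
    return [path for _, path in found]
```

Running it right after a `--debug` run narrows the search to a handful of candidates, whose contents can then be compared with the BANNER input and output described above.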