laito / cleartk

Automatically exported from code.google.com/p/cleartk

Evaluator for named entity chunker example #336

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have created an Evaluator class for the named entity chunking example 
(org.cleartk.examples.chunking), following the code in 
org.cleartk.examples.documentclassification.advanced.DocumentClassificationEvaluation. 
The attached patch contains this class. It expects the test data (in MASC 
format) to be present in src/main/resources/data/MASC-1.0.3/data/test. I 
moved some files from MASC-1.0.3/data/written to MASC-1.0.3/data/test and tried 
creating a patch that included the move, but I wasn't able to apply that patch 
cleanly. So I created the patch with just the Evaluator class, and the 
directory structure remains the same. As a result, the user will have to move 
some files from MASC-1.0.3/data/written to MASC-1.0.3/data/test for the 
evaluation class to work as is. It would be nice if one of the committers could 
do this, or let me know if there is a way to make this file move part of the 
patch.

The evaluator class (org.cleartk.examples.chunking.EvaluateNamedEntityChunker) 
can be used for 2-fold cross-validation, for hold-out evaluation, or just 
for testing a pre-trained model. The evaluation runs the MASCGoldAnnotator on 
the test data to fill in gold named entity mentions in the JCas, stores these 
annotations in a list, and removes them from the JCas. It then runs the 
NamedEntityChunker on the same JCas to fill in system-identified named entity 
mentions, and finally adds both the gold and the system-identified named 
entities to the AnnotationStatistics object for evaluation.
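
For readers without the patch at hand, the procedure above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the patch: Span, Scores, and evaluate are hypothetical names, a mutable list stands in for the JCas, the two Consumer arguments stand in for MASCGoldAnnotator and NamedEntityChunker, and exact-match precision/recall/F1 stands in for whatever AnnotationStatistics computes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class SingleViewEvaluationSketch {

    /** Hypothetical stand-in for a named entity mention: character offsets plus a type. */
    record Span(int begin, int end, String type) {}

    /** Exact-match counts; a system span is correct only if offsets and type both match. */
    record Scores(int gold, int system, int correct) {
        // Returning 0.0 for an empty denominator is a convention chosen for this sketch.
        double precision() { return system == 0 ? 0.0 : (double) correct / system; }
        double recall() { return gold == 0 ? 0.0 : (double) correct / gold; }
        double f1() {
            double p = precision();
            double r = recall();
            return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
        }
    }

    /** The list plays the role of the JCas; the consumers play the role of the annotators. */
    static Scores evaluate(List<Span> document,
                           Consumer<List<Span>> goldAnnotator,
                           Consumer<List<Span>> chunker) {
        goldAnnotator.accept(document);             // 1. fill in gold mentions
        Set<Span> gold = new HashSet<>(document);   // 2. store them outside the document
        document.clear();                           // 3. remove them from the document
        chunker.accept(document);                   // 4. fill in system mentions
        Set<Span> system = new HashSet<>(document);
        Set<Span> correct = new HashSet<>(gold);    // 5. compare gold and system
        correct.retainAll(system);
        return new Scores(gold.size(), system.size(), correct.size());
    }

    public static void main(String[] args) {
        Scores s = evaluate(
            new ArrayList<>(),
            doc -> { doc.add(new Span(0, 5, "PERSON")); doc.add(new Span(10, 16, "LOCATION")); },
            doc -> { doc.add(new Span(0, 5, "PERSON")); doc.add(new Span(10, 14, "LOCATION")); });
        System.out.println("precision=" + s.precision()
            + " recall=" + s.recall() + " f1=" + s.f1());
    }
}
```

In this toy run the second system span has the wrong end offset, so only one of the two system spans counts as correct.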

Let me know if anything in the patch does not look good.

Thanks,
-Himanshu

Original issue reported on code.google.com by himanshu...@gmail.com on 9 Oct 2012 at 10:20

GoogleCodeExporter commented 9 years ago
I have submitted the Individual Contributor's License and wanted to know if 
this patch looks good to commit.

Original comment by himanshu...@gmail.com on 24 Oct 2012 at 11:12

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 8b8fa7b6a334.

Original comment by steven.b...@gmail.com on 31 Oct 2012 at 1:42

GoogleCodeExporter commented 9 years ago
Thanks again for your patch, and sorry for the delay in applying it!

I mostly applied it as-is, but I made a few changes in test() in particular 
that might be of interest. We recommend using different views for the "gold" 
and "system" annotations. Take a look at 
org.cleartk.examples.chunking.EvaluateNamedEntityChunker.test if you're 
interested.
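
The idea behind separate views can be modelled in plain Java as follows. This is a rough sketch, not the committed test() method and not the UIMA API: Document, Span, countCorrect, and the view names "GoldView" and "SystemView" are all hypothetical. In UIMA itself the corresponding operations are JCas.createView and JCas.getView.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoViewEvaluationSketch {

    /** Hypothetical stand-in for a named entity mention. */
    record Span(int begin, int end, String type) {}

    /** Toy document: annotations are stored per named view instead of in one shared list. */
    static final class Document {
        private final Map<String, List<Span>> views = new HashMap<>();

        /** Returns the annotation list for a view, creating it on first use. */
        List<Span> view(String name) {
            return views.computeIfAbsent(name, k -> new ArrayList<>());
        }
    }

    // View names are assumptions made for this sketch.
    static final String GOLD_VIEW = "GoldView";
    static final String SYSTEM_VIEW = "SystemView";

    /** Counts system spans that exactly match a gold span; nothing has to be removed first. */
    static int countCorrect(Document doc) {
        Set<Span> correct = new HashSet<>(doc.view(GOLD_VIEW));
        correct.retainAll(new HashSet<>(doc.view(SYSTEM_VIEW)));
        return correct.size();
    }

    public static void main(String[] args) {
        Document doc = new Document();
        // The gold annotator writes only to the gold view.
        doc.view(GOLD_VIEW).add(new Span(0, 5, "PERSON"));
        doc.view(GOLD_VIEW).add(new Span(10, 16, "LOCATION"));
        // The chunker writes only to the system view, so the two sets never mix.
        doc.view(SYSTEM_VIEW).add(new Span(0, 5, "PERSON"));
        System.out.println("correct=" + countCorrect(doc));
    }
}
```

Because each annotator writes to its own view, the gold mentions no longer have to be copied into a list and deleted before the chunker runs.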

Original comment by steven.b...@gmail.com on 31 Oct 2012 at 1:44

GoogleCodeExporter commented 9 years ago
Thanks Steve! I checked the code and it looks better now.

Original comment by himanshu...@gmail.com on 2 Nov 2012 at 5:00