Closed GoogleCodeExporter closed 9 years ago
A big +1 for some kind of support for training/testing and cross-validation.
Some comments on your specific approach:
* CorpusFactory has a lot of methods. We should probably have a CorpusFactory
subclass that just requires a few pieces of information, e.g. the names of the
training files, the names of the testing files, and the number of folds the
training files should be split into. That should be enough info to fill in the
bodies of all those methods.
* EngineFactory looks fine.
* EvaluationFactory is okay, I guess, but the interface of having to write
evaluation results to files and then read them back in seems a little clunky.
But I guess there's no real output of a UIMA pipeline except files, so there's
no obvious way to keep those evaluation results in memory. At the very least,
we should supply some default subclasses for the common case of accuracy and
f-score evaluations.
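For the common case in the last bullet, default accuracy and f-score evaluations could look roughly like the sketch below. The names here (SimpleMetrics, accuracy, fScore) are hypothetical and not part of ClearTK. The sketch assumes the gold and predicted labels are already available as parallel lists of strings, so it says nothing about how they would be read back from the evaluation files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a ClearTK class: label-level accuracy and F1.
class SimpleMetrics {

    // Fraction of positions where the predicted label equals the gold label.
    static double accuracy(List<String> gold, List<String> predicted) {
        checkSameSize(gold, predicted);
        if (gold.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    // F1 for one label, computed as 2*TP / (2*TP + FP + FN).
    static double fScore(List<String> gold, List<String> predicted, String label) {
        checkSameSize(gold, predicted);
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean isGold = label.equals(gold.get(i));
            boolean isPredicted = label.equals(predicted.get(i));
            if (isGold && isPredicted) tp++;
            else if (isPredicted) fp++;
            else if (isGold) fn++;
        }
        int denominator = 2 * tp + fp + fn;
        // By convention here, a label that never occurs scores 0.
        return denominator == 0 ? 0.0 : 2.0 * tp / denominator;
    }

    private static void checkSameSize(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same size");
        }
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("A", "B", "A", "A");
        List<String> predicted = Arrays.asList("A", "A", "A", "B");
        System.out.println("accuracy = " + accuracy(gold, predicted));
        System.out.println("F1(A)    = " + fScore(gold, predicted, "A"));
    }
}
```

Default EvaluationFactory subclasses could wrap something like this, with the file reading and writing kept separate from the arithmetic.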
As a side note, the lines in a few places in your code that look like:
if(fold < 10) {
foldDirectory = new File(outputDirectory, "fold0"+fold);
need to be fixed for folks who use more than 10 folds. Something like:
String format = String.format("fold%%0%dd", String.valueOf(folds).length());
foldDirectory = new File(outputDirectory, String.format(format, fold));
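A self-contained sketch of the zero-padding idea, in case it helps when trying it out. The class and method names (FoldDirectories, foldName, foldDirectory) are made up for illustration. It takes the pad width from the number of digits in the fold count and passes it as an int, since %d does not accept a double.

```java
import java.io.File;

// Hypothetical helper for illustration; not taken from the code under review.
class FoldDirectories {

    // Builds a zero-padded name such as "fold07", padded to the digit count of folds.
    static String foldName(int fold, int folds) {
        int width = String.valueOf(folds).length();
        // "%%" is a literal percent sign, so width 2 yields the pattern "fold%02d".
        String format = String.format("fold%%0%dd", width);
        return String.format(format, fold);
    }

    // Resolves the fold's directory under the given output directory.
    static File foldDirectory(File outputDirectory, int fold, int folds) {
        return new File(outputDirectory, foldName(fold, folds));
    }

    public static void main(String[] args) {
        File outputDirectory = new File("output");
        int folds = 100;
        for (int fold : new int[] {1, 10, 100}) {
            System.out.println(foldDirectory(outputDirectory, fold, folds).getPath());
        }
    }
}
```

With names padded to a common width, the fold directories also sort in numeric order when listed alphabetically.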
Original comment by steven.b...@gmail.com
on 1 Dec 2010 at 9:46
I suppose we could have two CorpusFactory interfaces, with the _ImplBase
implementing both. I'm indifferent about this.
Yeah, the output directory in EvaluationFactory is not super elegant, but I
think it's the best we can do here.
I'll try out the directory name formatting. Looks cool!
Any thoughts about where I should put this evaluation code? I am thinking
either in cleartk-ml or in a separate project cleartk-ml-evaluation with an
initial preference for the former.
Original comment by pvogren@gmail.com
on 2 Dec 2010 at 6:57
I think cleartk-ml is fine. Anyone who is using ClearTK for ML seriously will
want evaluation code.
Original comment by steven.b...@gmail.com
on 3 Dec 2010 at 8:24
The cleartk-eval package provides this functionality.
Original comment by steven.b...@gmail.com
on 12 Feb 2012 at 4:28
Original comment by steven.b...@gmail.com
on 5 Aug 2012 at 8:58
Original issue reported on code.google.com by
pvogren@gmail.com
on 1 Dec 2010 at 6:41