Closed by GoogleCodeExporter 9 years ago.
The CanolaExtractor has been trained on the Canola documents. Its main purpose is
to demonstrate how competitive such a simple classifier (based on word counts
and densities) can be in the Canola corpus evaluation. I would not recommend it
for other purposes.
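As a rough illustration of what "a simple classifier based on word counts and densities" means, here is a minimal sketch that labels each text block of a page as content or boilerplate from two shallow features: its word count and its link density. The `TextBlock` type, the `is_content` and `extract` functions, and both thresholds are hypothetical choices for this example. They are not boilerpipe's API, and they should not be read as the CanolaExtractor's actual rules (boilerpipe itself is written in Java).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """One segment of a page's text, described only by shallow features."""
    num_words: int
    link_density: float  # fraction of the block's words that appear inside links


# Hypothetical thresholds for illustration only; NOT CanolaExtractor's values.
MIN_WORDS = 15
MAX_LINK_DENSITY = 0.33


def is_content(block: TextBlock) -> bool:
    """Treat a block as main content if it is long enough and not link-heavy."""
    return block.num_words >= MIN_WORDS and block.link_density <= MAX_LINK_DENSITY


def extract(blocks: List[TextBlock]) -> List[bool]:
    """Classify each block independently; returns one label per block."""
    return [is_content(b) for b in blocks]


# Navigation links, two body paragraphs, then a short link-heavy footer.
page = [TextBlock(3, 1.0), TextBlock(120, 0.02), TextBlock(40, 0.05), TextBlock(8, 0.9)]
print(extract(page))  # [False, True, True, False]
```

Because a rule like this only counts words and links, it can be tuned to score well on one corpus without carrying over to pages laid out differently, which is the reason given above for not using the CanolaExtractor outside the Canola evaluation.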
I'd recommend using ArticleExtractor for any kind of news article and
DefaultExtractor (or possibly LargestContentExtractor) for everything else. YMMV.
I have provided some benchmarks on the L3S-GN1 news corpus as an initial
starting point: http://code.google.com/p/boilerpipe/wiki/Benchmarks
Original comment by ckkohl79 on 23 Feb 2011 at 8:09
Original comment by ckkohl79 on 23 Feb 2011 at 8:10
Original comment by ckkohl79 on 6 Jul 2011 at 2:53
Original issue reported on code.google.com by tur...@gmail.com on 21 Feb 2011 at 8:13