k-bx / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

hybrid extractor? #48

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Christian,

We have a corpus that mixes news articles with other kinds of web pages, some
of which contain tables. ArticleExtractor has trouble with many of the
non-article pages. Is there a hybrid extractor that detects, per page, whether
it would be better to run KeepEverythingExtractor or ArticleExtractor?

Perhaps we should just use KeepEverythingExtractor for now...?
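
To make the question concrete, here is a minimal sketch of the kind of hybrid we have in mind. It is not part of boilerpipe. The class name, the `extract` helper, and the `minRatio` threshold are all hypothetical, and the length-ratio heuristic is only an assumption about what might work: run both extractors and fall back to the keep-everything output when the article output covers too little of the page text. The two extractors are passed in as plain functions so the sketch runs without the library on the classpath.

```java
import java.util.function.Function;

// Hypothetical helper, not a boilerpipe class.
final class HybridExtractorSketch {

    /**
     * Runs both extractors on the same HTML and picks one result.
     *
     * Assumed heuristic: if the article-style output is shorter than
     * minRatio times the keep-everything output, the page is probably
     * not a news article (for example, it is mostly tables), so the
     * keep-everything output is returned instead.
     */
    static String extract(String html,
                          Function<String, String> articleExtractor,
                          Function<String, String> keepEverythingExtractor,
                          double minRatio) {
        String article = articleExtractor.apply(html);
        String everything = keepEverythingExtractor.apply(html);

        // Nothing to compare against: keep whatever the article pass produced.
        if (everything.isEmpty()) {
            return article;
        }

        double ratio = (double) article.length() / everything.length();
        return ratio >= minRatio ? article : everything;
    }

    public static void main(String[] args) {
        // Stand-in extractors; real code would wrap the boilerpipe calls here.
        Function<String, String> articleStub = html -> "Headline only";
        Function<String, String> everythingStub =
                html -> "Headline only, plus a long table of figures and notes";

        String text = extract("<html>...</html>", articleStub, everythingStub, 0.5);
        System.out.println(text);
    }
}
```

With boilerpipe itself, the two functions would presumably wrap `ArticleExtractor.INSTANCE.getText(html)` and `KeepEverythingExtractor.INSTANCE.getText(html)`, converting the checked `BoilerpipeProcessingException` as needed. The threshold would have to be tuned on our corpus; 0.5 above is just a placeholder.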

Thanks!
jrf

Original issue reported on code.google.com by j...@mit.edu on 27 Apr 2012 at 3:08