GoogleCodeExporter opened this issue 9 years ago
...and I'm using boilerpipe 1.2.0.
Original comment by mihaly.k...@gmail.com
on 31 Jul 2012 at 3:39
Hello, did you manage to solve it on your own?
Original comment by tsz...@gmail.com
on 10 Sep 2012 at 4:08
Hello, not really. I use PHP to analyze boilerpipe's output and guess the
charset, but ideally I wouldn't have to do that.
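The charset-guessing workaround mentioned above can be sketched in Java roughly as follows. This is a hypothetical analogue, not the PHP code from this thread: it assumes the only candidates are UTF-8 and ISO-8859-1, and the class `CharsetGuess` and helper `guessCharset` are illustrative names. It decodes the bytes strictly as UTF-8 and falls back to ISO-8859-1 if that fails.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetGuess {
    // Hypothetical helper: returns "UTF-8" if the bytes decode cleanly as UTF-8,
    // otherwise "ISO-8859-1" (assumed fallback; it can decode any byte sequence).
    static String guessCharset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Throws CharacterCodingException on any invalid UTF-8 sequence
            decoder.decode(ByteBuffer.wrap(bytes));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return "ISO-8859-1";
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        System.out.println(guessCharset(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(guessCharset(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

This is only a heuristic: plain ASCII is reported as UTF-8 (which is harmless, since ASCII is valid UTF-8), and a page in some other encoding would need a real detector or the charset declared in the HTTP headers or `<meta>` tag.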
I did find a wrapper for boilerpipe that seemed to work, though:
https://github.com/theneubeck/boilerpipe-server
It didn't fit my needs, so I went with a PHP middle layer instead, but others
might find it helpful.
Original comment by mihaly.k...@gmail.com
on 10 Sep 2012 at 6:36
Found the solution.
Here is the Java code that fixes the special characters issue. It reads stdin and
writes stdout explicitly as UTF-8 instead of relying on the platform default:
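import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;

import de.l3s.boilerpipe.extractors.ArticleExtractor;

public class ExtractMe {
    public static void main(final String[] args) throws Exception {
        // Decode the HTML read from stdin as UTF-8, not the platform default charset
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        // Encode the extracted text as UTF-8 on stdout (autoflush enabled)
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(ArticleExtractor.INSTANCE.getText(in));
    }
}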
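Why this fixes it, as a small illustration: when no charset is passed, `InputStreamReader` uses the JVM's default charset, and `System.out` encodes with a platform-dependent charset; on JVMs of that era the default was often not UTF-8. If the bytes are written in one encoding and read in another, accented characters turn into mojibake. The self-contained sketch below (hypothetical class `MojibakeDemo`, no boilerpipe dependency) shows a mismatched and a matched round trip using a Hungarian sample word.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text with one charset and decode it with another, mimicking a
    // pipeline where the writer and the reader disagree on the encoding.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        String original = "\u00e1rv\u00edzt\u0171r\u0151"; // "árvíztűrő"
        // Mismatch: bytes written as UTF-8 but read back as ISO-8859-1
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // Match: both sides agree on UTF-8, as in the fixed ExtractMe above
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(garbled.equals(original)); // false
        System.out.println(intact.equals(original));  // true
    }
}
```

The booleans are printed instead of the garbled string itself, because how that string appears on screen would again depend on the console's encoding.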
Original comment by mihaly.k...@gmail.com
on 18 Sep 2013 at 1:20
Original issue reported on code.google.com by mihaly.k...@gmail.com
on 31 Jul 2012 at 3:38