GoogleCodeExporter closed this issue 8 years ago
Same here. Could you please let us know whether this can be done from the
API?
Cheers, and congratulations on the excellent work!
D.
Original comment by Dimitris...@gmail.com
on 23 Nov 2011 at 11:18
I searched a little more and found the solution:
1) If your input is a string:
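import java.io.StringReader;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.BoilerpipeExtractor;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.CommonExtractors;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLHighlighter;
// fields: choose an extractor and create a highlighter in extraction mode
private final BoilerpipeExtractor extractor = CommonExtractors.DEFAULT_EXTRACTOR;
private final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
// inside a method: parse the HTML string, mark the main content, then emit the extracted HTML
InputSource is = new InputSource(new StringReader(detailPageSourceCode));
final TextDocument doc = new BoilerpipeSAXInput(is).getTextDocument();
extractor.process(doc);
StringBuilder bf = new StringBuilder();
bf.append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />");
bf.append(hh.process(doc, detailPageSourceCode));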
2) If your input is a URL (taken from HTMLHighlighterDemo.java):
URL url = new URL(
"http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html"
// "http://boilerpipe-web.appspot.com/"
);
// choose from a set of useful BoilerpipeExtractors...
final BoilerpipeExtractor extractor = CommonExtractors.ARTICLE_EXTRACTOR;
// choose the operation mode (i.e., highlighting or extraction)
final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
PrintWriter out = new PrintWriter("/tmp/highlighted.html", "UTF-8");
out.println("<base href=\"" + url + "\" >");
out.println("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />");
out.println(hh.process(url, extractor));
out.close();
Cheers
Original comment by Dimitris...@gmail.com
on 24 Nov 2011 at 11:01
That's the correct solution (= HTMLHighlighterDemo.java).
Original comment by ckkohl79
on 24 Nov 2011 at 5:44
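For anyone reusing the snippets: both variants wrap the highlighter output in the same way, so the wrapping can be pulled into a small helper. This is only a minimal sketch using the plain JDK. The class name `ExtractedPage` and the method `wrap` are hypothetical and not part of boilerpipe; `extractedHtml` stands for whatever `hh.process(...)` returned.

```java
// Hypothetical helper, not part of boilerpipe: builds the output page around
// HTML that HTMLHighlighter has already extracted.
final class ExtractedPage {
    private ExtractedPage() {}

    /**
     * @param baseUrl       page URL for the base tag, or null when the input was a plain string
     * @param extractedHtml result of hh.process(...), passed through unchanged
     */
    static String wrap(String baseUrl, String extractedHtml) {
        StringBuilder sb = new StringBuilder();
        if (baseUrl != null) {
            // Lets relative links and images in the extracted HTML resolve against the original page
            sb.append("<base href=\"").append(baseUrl).append("\" >\n");
        }
        // Declare the encoding so the browser does not have to guess it
        sb.append("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n");
        sb.append(extractedHtml);
        return sb.toString();
    }
}
```

With this, the string case would become something like `ExtractedPage.wrap(null, hh.process(doc, detailPageSourceCode))` and the URL case `ExtractedPage.wrap(url.toString(), hh.process(url, extractor))`.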
Can anyone tell me how to output JSON?
Original comment by waelmiladi
on 26 Sep 2012 at 9:05
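Nothing in this thread points to a JSON option in the API, so one possible approach is to serialize the extracted string yourself. The following is a rough sketch with the plain JDK only. `JsonOut`, `quote`, `toJson`, and the field names `url` and `content` are all made up for illustration; if a JSON library is already on your classpath, it is probably the better choice.

```java
// Hypothetical helper, not part of boilerpipe: turns an already extracted
// string into a small JSON object without any third-party library.
final class JsonOut {
    private JsonOut() {}

    /** Returns s as a JSON string literal, including the surrounding quotes. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Field names "url" and "content" are assumptions chosen for this sketch. */
    static String toJson(String url, String content) {
        return "{\"url\":" + quote(url) + ",\"content\":" + quote(content) + "}";
    }
}
```

The `content` argument could be the HTML returned by `hh.process(url, extractor)` from the solution above, for example `JsonOut.toJson(url.toString(), hh.process(url, extractor))`.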
Original issue reported on code.google.com by
gyorgy.c...@gmail.com
on 20 Nov 2011 at 3:47