This is straightforward.
Your crawler already overrides the "visit(Page page)" method.
The Page object passed to it holds the complete HTML of the page:
if (page.getParseData() instanceof HtmlParseData) {
    // Only HTML pages carry HtmlParseData; other content types are skipped
    HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
    String html = htmlParseData.getHtml();
}
Parse that html string (with jsoup, for example) and extract whatever you need.
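As a rough illustration of the "extract whatever you need" step, here is a minimal sketch that pulls the link targets out of an HTML string like the one returned by getHtml(). It is not part of the original answer. The class name PageLinkExtractor and the method extractLinks are hypothetical, and instead of jsoup it uses the HTML parser bundled with the JDK (javax.swing.text.html), only so that it runs without extra dependencies. jsoup, as suggested above, is likely the more robust choice for real pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of crawler4j: collects href values from an HTML string.
final class PageLinkExtractor {

    // Returns the raw href value of every <a> tag, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();

        // The callback is invoked by the JDK parser for each opening tag it sees.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the markup,
            // since the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"http://example.com/a\">A</a></body></html>";
        System.out.println(extractLinks(html));
    }
}
```

Inside visit(Page page), the html string obtained from htmlParseData.getHtml() could be passed straight to extractLinks. The returned values are the raw attribute strings, so relative links are not resolved against the page URL.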
Original comment by avrah...@gmail.com
on 11 Aug 2014 at 1:36
Not a bug or feature request
Original comment by avrah...@gmail.com
on 11 Aug 2014 at 1:36
Original issue reported on code.google.com by
bsham...@gmail.com
on 2 Apr 2012 at 3:17