janih / boilerpipe

Boilerplate Removal and Fulltext Extraction from HTML pages

It's not working for a news site #75

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. String content = CommonExtractors.DEFAULT_EXTRACTOR.getText(new URL("http://www.nytimes.com/2014/06/06/business/gm-ignition-switch-internal-recall-investigation-report.html?hp"));

2. System.out.println(content);

3. It prints nothing.

When I run with the above URL, it does not extract anything. I have tried all 
of the extractors, but the result is always blank.

I have tried the same URL on http://boilerpipe-web.appspot.com/ and it works fine there.
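
One way to narrow this down is to check what the plain HTTP download returns before any extraction runs. The sketch below assumes the problem might lie in the fetch itself (redirects, cookies, or the User-Agent header); that is a guess, not something this report confirms. It uses only the JDK, not boilerpipe, and the names `FetchCheck`, `fetch` and `charsetOf` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class FetchCheck {

    // Picks the charset named in a Content-Type header, falling back to UTF-8.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Downloads the page, printing the status code and the URL that was finally served.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Assumption: some sites answer differently when no browser-like User-Agent is sent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; fetch-check)");
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(10000);

        int status = conn.getResponseCode();
        System.out.println("HTTP status: " + status);
        // A 3xx status here means a redirect was not followed (for example http to https).
        System.out.println("Final URL:   " + conn.getURL());

        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = body.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), charsetOf(conn.getContentType()));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FetchCheck.java <url>");
            return;
        }
        // Assumption: the site may set cookies during redirects, so keep them between hops.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        String html = fetch(args[0]);
        System.out.println("Characters downloaded: " + html.length());
    }
}
```

If the download turns out to be empty or ends in a redirect or error status, the blank result probably comes from the fetch rather than from the extractors. If it is non-empty, passing the downloaded HTML to the extractor as text instead of as a URL (assuming the library version in use offers such an overload) would show whether extraction itself succeeds.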

Please advise.

Original issue reported on code.google.com by kunal.s....@gmail.com on 6 Jun 2014 at 9:25


GoogleCodeExporter commented 9 years ago
If possible, can you provide the libraries used by the online demo? It seems that the 
online version is more robust than the downloaded library.

Original comment by kunal.s....@gmail.com on 6 Jun 2014 at 10:16