It's not working for a news site

janih / boilerpipe

Boilerplate Removal and Fulltext Extraction from HTML pages


What steps will reproduce the problem?
1. String content = CommonExtractors.DEFAULT_EXTRACTOR.getText(new URL("http://www.nytimes.com/2014/06/06/business/gm-ignition-switch-internal-recall-investigation-report.html?hp"));

2. System.out.println(content);

3. It prints nothing.

When I run the code with the above URL, it does not extract anything. I have tried 
all of the extractors, but the result is always blank.

I have tried the same URL on http://boilerpipe-web.appspot.com/ and it works fine there.
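
Since the same page extracts fine through the web app, one way to narrow this down might be to separate fetching from extraction: download the HTML with plain java.net, check that it is non-empty, and only then hand the string to the extractor. The sketch below is a diagnostic under that assumption, not a confirmed cause or fix. The class name FetchCheck, the helper names, the User-Agent value, and the UTF-8 decoding are all hypothetical choices made for illustration. The block itself does not depend on boilerpipe; the commented line shows where getText(String) would take over.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of boilerpipe.
final class FetchCheck {

    // Reads the whole stream into a String using the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Downloads the page with an explicit User-Agent header.
    // Assumption: the page is UTF-8 encoded.
    static String fetchHtml(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        // Printing the status shows whether the server answered with 200,
        // a redirect, or an error before any extraction is attempted.
        System.out.println("HTTP status: " + conn.getResponseCode());
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FetchCheck <url>");
            return;
        }
        String html = fetchHtml(new URL(args[0]), "Mozilla/5.0 (compatible; FetchCheck)");
        System.out.println("Fetched " + html.length() + " characters");
        // If html is non-empty, it can be passed to boilerpipe as a string:
        // String content = CommonExtractors.DEFAULT_EXTRACTOR.getText(html);
    }
}
```

If the fetched HTML turns out to be empty or very short, the fetch step is the more likely place to look; if it is a full page and getText(html) still returns nothing, the extractor is.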

Please advise.

Original issue reported on code.google.com by kunal.s....@gmail.com on 6 Jun 2014 at 9:25

Attachments:

BoilerpipeTextExtraction.java

It's not working for a news site #75