janih / boilerpipe

Boilerplate Removal and Fulltext Extraction from HTML pages

Different results when using the Web API and the source API? #83

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The result for the same page differs from the one returned by the web API. For example, 
consider the following link:
http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2F1tajrobeh.blog.ir%2F&extractor=ArticleExtractor&output=html&extractImages=
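To reproduce the web API request for other pages, the query string can be rebuilt from its parts. The sketch below only assumes what the link itself shows: the `/extract` endpoint and the `url`, `extractor`, `output` and `extractImages` parameters, with the target page URL percent-encoded. The class and method names are hypothetical, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: rebuilds the web API link used in this issue.
public class WebApiUrl {
    // Endpoint taken from the link above.
    static final String ENDPOINT = "http://boilerpipe-web.appspot.com/extract";

    /** Builds an extract request; each parameter value is percent-encoded. */
    static String buildExtractUrl(String pageUrl, String extractor, String output) {
        return ENDPOINT
            + "?url=" + URLEncoder.encode(pageUrl, StandardCharsets.UTF_8)
            + "&extractor=" + URLEncoder.encode(extractor, StandardCharsets.UTF_8)
            + "&output=" + URLEncoder.encode(output, StandardCharsets.UTF_8)
            + "&extractImages="; // left empty, as in the original link
    }

    public static void main(String[] args) {
        // Reproduces the exact link from this issue.
        System.out.println(buildExtractUrl("http://1tajrobeh.blog.ir/", "ArticleExtractor", "html"));
    }
}
```

Encoding the page URL matters because its `:` and `/` characters would otherwise be confused with the structure of the outer request URL.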

I used ArticleExtractor from version 1.2.0, but the result is different: one post 
at the end of the page is not detected as content. Please take a look at the 
attached HTML.
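To pinpoint exactly which post is dropped, the two results can be compared block by block. This is a minimal sketch, assuming both outputs (web API and local ArticleExtractor run) have already been saved as plain text with one text block per line; the class and method names are hypothetical and nothing here calls boilerpipe itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: lists text blocks present in one extraction result but not the other.
public class ExtractionDiff {
    /** Splits text into trimmed, non-empty lines, keeping first-seen order. */
    static Set<String> lines(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            String t = line.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Lines present in 'reference' but absent from 'candidate', in reference order. */
    static List<String> missingFrom(String reference, String candidate) {
        Set<String> have = lines(candidate);
        List<String> missing = new ArrayList<>();
        for (String line : lines(reference)) {
            if (!have.contains(line)) missing.add(line);
        }
        return missing;
    }

    public static void main(String[] args) {
        String webApi = "First post\nSecond post\nLast post\n";
        String local = "First post\nSecond post\n";
        System.out.println(missingFrom(webApi, local)); // prints [Last post]
    }
}
```

Comparing exact lines is deliberately strict: a block that was split or merged differently by the two runs will also show up as missing, which is itself a useful hint about where the segmentation diverges.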

What is the difference between the web API and the provided source demo?

Original issue reported on code.google.com by jadidine...@gmail.com on 24 Jan 2015 at 1:03

Attachments: