matanox / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Difference WebApi - Api #63

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I am using Boilerpipe through both the web API and the API. For the page
http://www.davidicke.com/forum/showthread.php?page=2&t=72909 , the Boilerpipe
web API works properly, while the Boilerpipe API returns this error:
"java.io.IOException: Server returned HTTP response code: 403 for URL:
http://boilerpipe-web.appspot.com/extract?url=http://www.davidicke.com/forum/showthread.php?page%3D2%26t%3D72909&extractor=KeepEverythingExtractor&output=htmlFragment"
Please help. I am not using a proxy.

Original issue reported on code.google.com by lopiccol...@gmail.com on 28 Mar 2013 at 4:37

GoogleCodeExporter commented 9 years ago
I think the problem is that no User-Agent header is sent when the HTML is
requested, which makes some websites respond with a 403 error. You could try
downloading the HTML yourself and then passing it to
ArticleExtractor.INSTANCE.getText(String text), but I am not sure this will work.

Original comment by jorgec...@gmail.com on 17 Aug 2013 at 12:35
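
The workaround suggested above (download the HTML yourself, then hand it to the extractor) could look roughly like the sketch below. It is only an illustration under assumptions: the class name HtmlFetcher, the User-Agent string, the 10-second timeouts, and the UTF-8 decoding are all made up for this example and are not part of boilerpipe. Whether a missing User-Agent is really the cause of the 403 is not confirmed in this thread. The boilerpipe call appears only in a comment so that the block compiles with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of boilerpipe.
final class HtmlFetcher {

    // Assumed browser-like value; use whatever the target site accepts.
    static final String DEFAULT_USER_AGENT =
            "Mozilla/5.0 (compatible; BoilerpipeClient/1.0)";

    // Prepares a connection with an explicit User-Agent header.
    // No request is sent until the response stream is opened.
    static HttpURLConnection open(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    // Downloads the response body as a string.
    // Assumes UTF-8; a real page may declare a different charset.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = open(url, DEFAULT_USER_AGENT);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // With boilerpipe on the classpath, the downloaded HTML would then be
    // passed to the extractor mentioned in the comment above:
    //   String html = HtmlFetcher.fetch("http://www.davidicke.com/forum/showthread.php?page=2&t=72909");
    //   String text = ArticleExtractor.INSTANCE.getText(html);
}
```

If the site still answers 403 with this header set, the cause lies elsewhere and the User-Agent theory can be ruled out.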