petewarden / dstk

A collection of the best open data sets and open-source tools for data science
http://www.datasciencetoolkit.org/

Error running Boilerpipe #48

Closed peteyreplies closed 10 years ago

peteyreplies commented 10 years ago

I have set up my own server from Vagrant and am trying to use /html2story, but it's throwing an error:

petey$ curl -d "<html><head><title>MyTitle</title></head><body><script type="text/javascript">something();</script><div>Some actual text</div></body></html>" "http://myserver.org:8080/html2story"
<?xml version="1.0" encoding="utf-8"?><error>Error running Boilerpipe</error>

/html2text (and everything else I've tried) works fine (Pete, when we emailed re: this, used the wrong endpoint, i.e. this one):

petey$ curl -d "<html><head><title>MyTitle</title></head><body><script type="text/javascript">something();</script><div>Some actual text</div></body></html>" "http://myserver.org:8080/html2text"
{
  "text": "Some actual text\n"}

I don't have a ton of RAM or CPU in this machine; could it be running into some kind of error à la TwoFishes? I searched 'boilerplate' in issues and didn't find anything. I'm assuming it's the inadequacy of my hardware, but figured I would report it so it could at least be a known issue.

petewarden commented 10 years ago

The boilerpipe support is actually provided by running a Java command-line process from Ruby: https://github.com/petewarden/dstk/blob/master/dstk_server.rb#L681

If you run that command yourself from a terminal (with the variables filled in), hopefully there will be a more meaningful error! If you call the endpoint, it may show up in /var/log/apache2/error.log too. Thanks for the report, and let me know if you get a chance to get more info on the error!
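For anyone debugging something similar, here is a minimal Ruby sketch of the general pattern: run a child process, feed it the HTML on stdin, and keep stderr so the real cause is visible instead of a generic error. This is not the code in dstk_server.rb; `run_with_diagnostics` is a hypothetical helper, and the child process is a stand-in Ruby one-liner that simulates a failure rather than the actual Java invocation.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper (not part of dstk): run an external command given as
# an argv array, pass `input` on stdin, and return stdout, stderr and the
# exit code together so a failure can be reported in detail.
def run_with_diagnostics(argv, input)
  stdout, stderr, status = Open3.capture3(*argv, stdin_data: input)
  { ok: status.success?, exit_code: status.exitstatus, stdout: stdout, stderr: stderr }
rescue SystemCallError => e
  # Raised when the executable itself cannot be started (e.g. not found).
  { ok: false, exit_code: nil, stdout: '', stderr: e.message }
end

# Stand-in for the real child process: a Ruby one-liner that writes to
# stderr and exits non-zero, purely to illustrate the failure path.
failing_child = [RbConfig.ruby, '-e', 'STDERR.puts "out of memory (simulated)"; exit 3']

result = run_with_diagnostics(failing_child, '<html></html>')
puts "exit=#{result[:exit_code]} stderr=#{result[:stderr].strip}" unless result[:ok]
```

Passing the command as an array avoids going through a shell, so the HTML never needs shell quoting; the captured stderr could then be logged or returned alongside the error message.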

peteyreplies commented 10 years ago

This problem resolved itself once a) I added more RAM and b) TwoFishes finished loading, so I'm guessing it was Java bugging out with no RAM.
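If memory is the suspect, one rough way to check on a Linux box is to look at the `MemAvailable` line in /proc/meminfo before calling the endpoint. The sketch below is only an illustration: the helper names and the 256 MB threshold are made up, and the sample text is invented rather than taken from my server.

```ruby
# Hypothetical helper: extract the MemAvailable figure (in kB) from
# /proc/meminfo-style text; returns nil if the line is absent.
def mem_available_kb(meminfo_text)
  match = meminfo_text.match(/^MemAvailable:\s+(\d+)\s+kB/)
  match && match[1].to_i
end

# Hypothetical check: true only when the figure is present and at least
# `required_kb`. The threshold is an assumption, not a measured JVM need.
def enough_memory?(meminfo_text, required_kb)
  available = mem_available_kb(meminfo_text)
  !available.nil? && available >= required_kb
end

# Invented sample input; on a real Linux machine you could pass
# File.read('/proc/meminfo') instead.
sample = "MemTotal:  1016000 kB\nMemFree:  40000 kB\nMemAvailable:  150000 kB\n"
required_kb = 256 * 1024 # assumed threshold for illustration

puts "available: #{mem_available_kb(sample)} kB"
puts "enough for the assumed threshold? #{enough_memory?(sample, required_kb)}"
```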