misja / python-boilerpipe

Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages

extractor.getImages raises an exception #14

Closed Cutuchiqueno closed 11 years ago

Cutuchiqueno commented 11 years ago

Arch Linux, OpenJDK7, Python 2.7.5

(Is this an issue only with OpenJDK 7?)

extractor = Extractor(extractor='ArticleExtractor', url='http://www.faz.net/aktuell/wissen/physik-chemie/digitale-vernetzung-die-masse-macht-s-11916683.html')
html = extractor.getHTML()
images = extractor.getImages()

java.lang.ExceptionPyRaisable Traceback (most recent call last)
in ()
----> 1 images = extractor.getImages()

/usr/lib/python2.7/site-packages/boilerpipe/extract/__init__.pyc in getImages(self)
     72     def getImages(self):
     73         extractor = jpype.JClass(
---> 74             "de.l3s.boilerpipe.sax.ImageExtractor").INSTANCE
     75         images = extractor.process(self.source, self.data)
     76         jpype.java.util.Collections.sort(images)

/usr/lib/python2.7/site-packages/jpype/_jclass.pyc in JClass(name)
     51     jc = _jpype.findClass(name)
     52     if jc is None :
---> 53         raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
     54
     55     return _getClassFor(jc)

java.lang.ExceptionPyRaisable: java.lang.Exception: Class de.l3s.boilerpipe.sax.ImageExtractor not found

Cutuchiqueno commented 11 years ago

Sorry, I saw #5 too late.
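
For anyone hitting the same "Class ... not found" error: one possible cause (an assumption, not confirmed in this thread) is that the boilerpipe jar on the JVM classpath does not contain the class. A minimal sketch for checking this with only the standard library follows. The helper name `jar_has_class` is hypothetical and not part of python-boilerpipe.

```python
import zipfile


def jar_has_class(jar, class_name):
    """Return True if the jar (a path or file-like object) contains class_name.

    A jar is a zip archive in which the class a.b.C is stored as a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()
```

Calling it with the path of the jar that your installation loads (the location depends on how the package was installed) and the name "de.l3s.boilerpipe.sax.ImageExtractor" should show whether that jar ships the class at all.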