misja / python-boilerpipe

Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages

getImages no longer works #5

Closed md3sum closed 12 years ago

md3sum commented 12 years ago

@misja: The boilerpipe library 1.2.0 no longer has the ImageExtractor class [de.l3s.boilerpipe.sax.ImageExtractor]. It might be worth removing the getImages feature or commenting it out for now (even though I know you have not included it in the documentation in the Readme.rst).

If you know of another way to retrieve images using python-boilerpipe, I'd be happy to hear it, because I would like to extract them.

Cheers

misja commented 12 years ago

The ImageExtractor class is in the boilerpipe repository, but hasn't made it into a release yet. You could compile a boilerpipe release yourself, though, and use it in combination with this module: http://code.google.com/p/boilerpipe/source/browse/trunk/boilerpipe-core/src/main/de/l3s/boilerpipe/sax/ImageExtractor.java

md3sum commented 12 years ago

Ah. I see it now. Thanks for that. Sorry for the post. Cheers and thank you for this project of yours.

I have another question about the usage of jpype but we can have that chat away from this bug report.

hnykda commented 9 years ago

Still the same... More than a year later...

benpryke commented 9 years ago

Even when using a version of boilerpipe-1.2.0.jar built from the most recent source, and therefore including the ImageExtractor class, I received an error when calling getImages(). I forked the repository and fixed the error. A pull request is pending, but for now my fork is here: https://github.com/Ninjakannon/python-boilerpipe.