misja / python-boilerpipe

Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages

jpype._jclass.java.lang.NoClassDefFoundError: java.lang.NoClassDefFoundError: de/l3s/boilerpipe/sax/ImageExtractor #55

Open liyongrui opened 5 years ago

liyongrui commented 5 years ago

There is no ImageExtractor class. How can I solve this?

bartmachielsen commented 4 years ago

The problem is that the Boilerpipe version this package uses (it is downloaded in the background) is outdated and lacks the ImageExtractor implementation. To fix this, clone the Boilerpipe project (the Google-hosted version), compile it into a JAR, and include that JAR in the project. I did this myself and posted the updated version online: https://github.com/bartmachielsen/python-boilerpipe.
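
To confirm that a given JAR is the cause before rebuilding, it may help to check whether it contains the missing class at all. The following is a minimal sketch using only the Python standard library. The helper name `jar_has_class` is hypothetical, and the location of the JAR on your system is an assumption you need to fill in yourself; the class path is taken from the error message in the issue title.

```python
import zipfile

# Class named in the NoClassDefFoundError, written as an entry path inside the JAR.
IMAGE_EXTRACTOR_ENTRY = "de/l3s/boilerpipe/sax/ImageExtractor.class"


def jar_has_class(jar_path, entry=IMAGE_EXTRACTOR_ENTRY):
    """Return True if the JAR (a zip archive) contains the given class file.

    `jar_path` is whatever Boilerpipe JAR you want to inspect; this helper
    is illustrative and not part of python-boilerpipe.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling `jar_has_class("path/to/boilerpipe.jar")` (placeholder path) on the outdated JAR should return `False`, and on a JAR compiled from a source tree that includes the image extractor it should return `True`.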