janih / boilerpipe

Boilerplate Removal and Fulltext Extraction from HTML pages

BoilerPipe for Android #57

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have customized boilerpipe to run on Android. The jar is attached.
geppo1988@gmail.com

Original issue reported on code.google.com by geppo1...@gmail.com on 27 Nov 2012 at 8:57

GoogleCodeExporter commented 9 years ago
You still need to add xerces and nekohtml to the build path.
geppo1988@gmail.com

Original comment by geppo1...@gmail.com on 27 Nov 2012 at 9:00

GoogleCodeExporter commented 9 years ago
I was getting an error during the dex step saying that HTMLElement$Element 
is a duplicate. When I used your Android jar, the error went away. 

I think you have removed the org.cyberneko.html package from the boilerpipe.jar.

Original comment by chandu12...@gmail.com on 3 Dec 2012 at 11:33

GoogleCodeExporter commented 9 years ago
:)! Is everything OK? Is it working fine? Are there any bugs?

Original comment by geppo1...@gmail.com on 4 Dec 2012 at 8:23

GoogleCodeExporter commented 9 years ago
It works!

You are a life saver. I spent an entire day uselessly poking around with build 
paths, installing the boilerpipe source, and alternating between chandu's 
problem and java.lang.NoClassDefFoundError. Don't know why I didn't find this 
sooner. Thank you. 

Original comment by wmarqua...@gmail.com on 17 Dec 2012 at 12:08

GoogleCodeExporter commented 9 years ago
In fact, I posted it to save others the time of searching for a solution to 
the problem. Of course, it was not anything difficult.

Original comment by geppo1...@gmail.com on 18 Dec 2012 at 11:36

GoogleCodeExporter commented 9 years ago
Hello. I really need to use this, but I still don't know how. I have had no 
issues with the regular version, but I don't know how to use the Android 
version.

I am getting the error:
   java.lang.NoClassDefFoundError: de.l3s.boilerpipe.extractors.ArticleExtractor

I have placed the following into my JRE lib folder. Then I added them to the 
project's build path as libraries using "Add external Jar":
   boilerpipe-1.2.0-android.jar
   xerces-2.9.1.jar
   nekohtml-1.9.13.jar

The boilerpipe file is the one you linked. The two dependencies are the ones 
that came with the original jar file that I got working in Java. What 
dependencies do I use? Are there Android-specific ones?

Any help would be greatly appreciated, thanks!

Original comment by m...@issist.com on 13 Feb 2013 at 6:36

GoogleCodeExporter commented 9 years ago
Oh my gosh, thank you so much for this file. I was about to give up on my app. 
Everybody, don't forget to use this in an async task.

Original comment by 96hud...@gmail.com on 18 Jun 2013 at 3:01
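
The advice above about running the extraction in an async task can be sketched in plain Java. This is only a minimal sketch, not code from the attached jar: `TextExtractor` is a hypothetical stand-in for a call such as `ArticleExtractor.INSTANCE.getText(url)`, and a `java.util.concurrent` executor is used instead of Android's `AsyncTask` so that the example runs outside Android.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundExtraction {

    /** Hypothetical stand-in for the real extractor call; not part of boilerpipe. */
    interface TextExtractor {
        String extract(String url) throws Exception;
    }

    /** Submits the (potentially slow, network-bound) extraction to a worker thread. */
    static Future<String> extractAsync(ExecutorService pool,
                                       final TextExtractor extractor,
                                       final String url) {
        Callable<String> task = () -> extractor.extract(url);
        return pool.submit(task);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A fake extractor so the sketch runs without boilerpipe on the classpath.
            TextExtractor fake = url -> "text of " + url;
            Future<String> result = extractAsync(pool, fake, "http://example.com/article");
            // get() blocks the caller; in an app, deliver the result via a callback instead.
            System.out.println(result.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

On Android the same idea applies: do the fetch and the extraction on a worker thread and hand only the resulting string back to the UI thread.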

GoogleCodeExporter commented 9 years ago
Thanks a lot. I found multiple duplicate files in both jars, but it was 
tedious to find and remove them. Your jar file saved a lot of work. 

Original comment by meetjas...@gmail.com on 15 Sep 2013 at 11:22
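
For anyone who wants to see which classes are duplicated across two jars before the dex step complains, the search can be automated. The following is a minimal sketch under stated assumptions: `DuplicateClassFinder` and its method names are made up for illustration, and it only compares `.class` entry names, not their contents.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DuplicateClassFinder {

    /** Returns the names of all .class entries in the given jar, sorted. */
    static Set<String> classEntries(Path jar) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Returns the .class entries whose names appear in both jars. */
    static Set<String> duplicates(Path firstJar, Path secondJar) throws IOException {
        Set<String> shared = classEntries(firstJar);
        shared.retainAll(classEntries(secondJar));
        return shared;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DuplicateClassFinder <first.jar> <second.jar>");
            return;
        }
        for (String name : duplicates(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println(name);
        }
    }
}
```

Run it with the paths of the two jars as arguments. Each printed entry is a class file that exists in both.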

GoogleCodeExporter commented 9 years ago
I really need this to work. Has anybody solved the NoClassDefFoundError?
I included nekohtml and xerces as external jars and dropped the boilerpipe jar 
right into my libs folder. I run:

    URL url;
    try {
        url = new URL("someurlhere");
        String text = ArticleExtractor.INSTANCE.getText(url);
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (BoilerpipeProcessingException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

And get the following trace:

06-19 22:52:24.748: E/AndroidRuntime(29595): FATAL EXCEPTION: main
06-19 22:52:24.748: E/AndroidRuntime(29595): java.lang.NoClassDefFoundError: de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
06-19 22:52:24.748: E/AndroidRuntime(29595):    at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:51)
06-19 22:52:24.748: E/AndroidRuntime(29595):    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:69)
06-19 22:52:24.748: E/AndroidRuntime(29595):    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:87)

Original comment by wesbl...@gmail.com on 19 Jun 2014 at 10:58
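
One way to narrow down a NoClassDefFoundError like the one above is to check at runtime which of the expected classes can actually be loaded. A jar that is only on the compile-time build path and is not packaged with the app typically produces this error at runtime. The sketch below is only an illustration: `ClasspathCheck` is a made-up helper, and the xerces and nekohtml class names in `main` are assumptions about which classes boilerpipe needs, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Returns true if the named class can be located and loaded. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, e.g. when a superclass is missing.
            return false;
        }
    }

    /** Returns the subset of the given class names that cannot be loaded. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The last two names are assumed dependencies, listed for illustration only.
        List<String> notLoadable = missing(
                "de.l3s.boilerpipe.sax.BoilerpipeHTMLParser",
                "org.apache.xerces.parsers.AbstractSAXParser",
                "org.cyberneko.html.HTMLConfiguration");
        System.out.println("Not loadable: " + notLoadable);
    }
}
```

If a dependency class shows up as not loadable, the jar that contains it is probably not being packaged with the app, even though the project compiles.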