xjl219 / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Boilerpipe fails (but not the web API edition) #24

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. curl --fail -L http://thisrecording.com/the-past | java -jar tika-app-0.9.jar -T

What is the expected output? What do you see instead?
    at de.l3s.boilerpipe.sax.CommonTagActions$2.start(CommonTagActions.java:108)
    at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.startElement(BoilerpipeHTMLContentHandler.java:169)
    at org.apache.tika.parser.html.BoilerpipeContentHandler.startElement(BoilerpipeContentHandler.java:195)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
    at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:279)
    at org.apache.tika.parser.html.HtmlHandler.startElementWithSafeAttributes(HtmlHandler.java:197)
    at org.apache.tika.parser.html.HtmlHandler.startElement(HtmlHandler.java:135)
    at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
    at org.apache.tika.parser.html.XHTMLDowngradeHandler.startElement(XHTMLDowngradeHandler.java:61)
    at org.ccil.cowan.tagsoup.Parser.push(Parser.java:794)
    at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:1061)
    at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1016)
    at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:565)
    at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
    at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:198)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:288)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:94)

What version of the product are you using? On what operating system?
1.1.0, Mac OS

Please provide any additional information below.
https://issues.apache.org/jira/browse/TIKA-676

Original issue reported on code.google.com by gabriele%mysimpatico.com@gtempaccount.com on 18 Jun 2011 at 12:52

GoogleCodeExporter commented 9 years ago
Thanks for reporting.
Please check again with the just-released version 1.2.0 and report whether it fixes the bug.

Original comment by ckkohl79 on 6 Jul 2011 at 2:51

GoogleCodeExporter commented 9 years ago
1.2.0 fixes the issue. Is it going to be published to Maven?
Thanks!

Original comment by m...@jazztique.org on 22 Aug 2011 at 10:44

GoogleCodeExporter commented 9 years ago
1.2.0 does indeed still need to be pushed to Maven Central.
In the meantime, use the repository at http://boilerpipe.googlecode.com/svn/repo/
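
A minimal sketch of how that repository could be declared in a project's `pom.xml`. The repository `id` and the artifact coordinates (`de.l3s.boilerpipe:boilerpipe:1.2.0`) are assumptions for illustration and are not stated in this thread; check them against the repository contents before relying on them.

```xml
<!-- Fragment of pom.xml; place these elements inside <project>. -->
<repositories>
  <repository>
    <!-- The id is an arbitrary, hypothetical label. -->
    <id>boilerpipe-googlecode</id>
    <url>http://boilerpipe.googlecode.com/svn/repo/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <!-- Assumed coordinates; not confirmed by this thread. -->
    <groupId>de.l3s.boilerpipe</groupId>
    <artifactId>boilerpipe</artifactId>
    <version>1.2.0</version>
  </dependency>
</dependencies>
```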

Original comment by ckkohl79 on 22 Aug 2011 at 4:12

GoogleCodeExporter commented 9 years ago
Hi,
Is there a date by which we can expect it to be pushed to Central? Tika will not ship with external repos.
Thanks.

Original comment by m...@jazztique.org on 11 Oct 2011 at 12:57