xjl219 / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Can you fix or promote the bug fix of NekoHTML (#2909310) ? #8

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. DefaultExtractor.getText(text);

What is the expected output? What do you see instead?
Caused by: de.l3s.boilerpipe.BoilerpipeProcessingException: org.xml.sax.SAXException: SAX input contains nested A elements -- You have probably hit a bug in NekoHTML (#2909310). Please clean the HTML externally and feed it to boilerpipe again
    at de.l3s.boilerpipe.sax.BoilerpipeSAXInput.getTextDocument(BoilerpipeSAXInput.java:54)
    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:72)
    at de.l3s.boilerpipe.extractors.ExtractorBase.getText(ExtractorBase.java:125)

What version of the product are you using? On what operating system?
boilerpipe 1.0.3 on Ubuntu.

Please provide any additional information below.

Original issue reported on code.google.com by junli...@gmail.com on 8 Sep 2010 at 8:50

GoogleCodeExporter commented 9 years ago
We should fix this upstream in NekoHTML.

Please comment on NekoHTML bugs on the NekoHTML issue page to draw attention to them:
http://sourceforge.net/tracker/?func=detail&aid=2909310&group_id=195122&atid=952178

Cheers,
Christian

Original comment by ckkohl79 on 1 Oct 2010 at 9:43

GoogleCodeExporter commented 9 years ago
Since the bug is in NekoHTML, and a patch is present in boilerpipe, I see no 
need to take further action.

Original comment by ckkohl79 on 2 Nov 2010 at 3:39
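
For anyone who still hits this exception, the error message's advice to "clean the HTML externally" could be sketched roughly as below. This is only an illustration and not part of boilerpipe or NekoHTML: the class name AnchorFlattener and the method flattenNestedAnchors are hypothetical, and the regex-based approach assumes anchor tags have no ">" inside attribute values. It closes an open A element before the next one opens and drops the leftover closing tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a boilerpipe or NekoHTML class): rewrites HTML so
// that no <a> element is opened while another <a> is still open.
public class AnchorFlattener {

    // Matches <a>, <a ...attributes...> and </a>, but not <abbr>, <article>, etc.
    // Assumption: attribute values do not contain '>'.
    private static final Pattern ANCHOR_TAG =
            Pattern.compile("<a(\\s[^>]*)?>|</a\\s*>", Pattern.CASE_INSENSITIVE);

    public static String flattenNestedAnchors(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        Matcher m = ANCHOR_TAG.matcher(html);
        boolean insideAnchor = false;
        int last = 0;

        while (m.find()) {
            // Copy everything between the previous anchor tag and this one.
            out.append(html, last, m.start());
            String tag = m.group();

            if (!tag.startsWith("</")) {
                // Opening tag: close the enclosing anchor first, if any.
                if (insideAnchor) {
                    out.append("</a>");
                }
                out.append(tag);
                insideAnchor = true;
            } else if (insideAnchor) {
                // Closing tag that matches the currently open anchor.
                out.append(tag);
                insideAnchor = false;
            }
            // A closing tag with no open anchor is a leftover and is dropped.

            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String broken =
                "<p><a href=\"/outer\">outer <a href=\"/inner\">inner</a> tail</a></p>";
        System.out.println(flattenNestedAnchors(broken));
    }
}
```

The cleaned string can then be passed to the extractor call from the report in place of the raw HTML. Since the nesting reported here may be introduced by the parser itself rather than by the input, this kind of pre-cleaning is not guaranteed to avoid the exception.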