zhujiangang / wikixmlj

Automatically exported from code.google.com/p/wikixmlj

java.lang.ArrayIndexOutOfBoundsException: 0 #4

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Index enwiki-20081008-pages-articles.xml.bz2
2. Count each page as it is parsed
3. At around page 210,200, the parser throws the following exception:

java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.jhu.nlp.wikipedia.WikiTextParser.parseLinks(WikiTextParser.java:71)
    at edu.jhu.nlp.wikipedia.WikiTextParser.getLinks(WikiTextParser.java:50)
    at edu.jhu.nlp.wikipedia.WikiPage.getLinks(WikiPage.java:104)
    at Test$1.process(Test.java:78)
    at edu.jhu.nlp.wikipedia.SAXPageCallbackHandler.endElement(SAXPageCallbackHandler.java:42)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:604)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1750)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2906)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:624)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:486)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:810)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:740)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:110)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1208)
    at edu.jhu.nlp.wikipedia.WikiXMLSAXParser.parse(WikiXMLSAXParser.java:47)
    at Test.main(Test.java:103)


Original issue reported on code.google.com by felipehu...@gmail.com on 5 Aug 2009 at 9:08

GoogleCodeExporter commented 9 years ago
This happens because of malformed wiki text. This case should be handled.

Original comment by delip...@gmail.com on 5 Aug 2009 at 9:33
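The source of `WikiTextParser.parseLinks` is not shown in this thread, so the following is only a minimal sketch of what "handling" malformed link markup can look like, under one stated assumption: that link bodies are split on `|`. In Java, `"|".split("\\|")` drops trailing empty strings and returns an empty array, so an unchecked `parts[0]` fails with exactly `ArrayIndexOutOfBoundsException: 0`. The class `SafeLinkParser` and its method are hypothetical names, and this is not the change made in r35.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical defensive link extractor; an illustration, not wikixmlj's code.
public class SafeLinkParser {
    // Matches an internal link body: everything between "[[" and the next "]]".
    private static final Pattern LINK = Pattern.compile("\\[\\[(.*?)\\]\\]");

    public static List<String> parseLinks(String wikiText) {
        List<String> links = new ArrayList<>();
        if (wikiText == null) {
            return links;
        }
        Matcher m = LINK.matcher(wikiText);
        while (m.find()) {
            // "Target|label" splits into {"Target", "label"}, but a body of "|"
            // yields an EMPTY array because trailing empty strings are removed.
            String[] parts = m.group(1).split("\\|");
            if (parts.length == 0) {
                continue; // malformed link such as [[|]]: skip instead of crashing
            }
            String target = parts[0].trim();
            if (!target.isEmpty()) { // also skips [[]] and [[ |label]]
                links.add(target);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(parseLinks("See [[Foo|bar]], [[|]] and [[Baz]]."));
    }
}
```

Run on the sample text in `main`, the sketch prints `[Foo, Baz]`: the malformed `[[|]]` is skipped rather than aborting the whole dump.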

GoogleCodeExporter commented 9 years ago
r35 should fix this. Download the latest jar or update your SVN checkout.

Original comment by delip...@gmail.com on 5 Aug 2009 at 9:52