ststeiger / wikimodel

Automatically exported from code.google.com/p/wikimodel

MediaWikiReferenceParser#generateImageParams() raises java.lang.IndexOutOfBoundsException #184

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
*What steps will reproduce the problem?
1. Process a page that contains malformed image markup of the kind described below.

*What is the expected output? What do you see instead?
The generateImageParams() method checks whether a parameter contains a '=' character. If it does, the method splits the parameter on that same '=' separator. String#split() removes trailing empty strings from the resulting array. Therefore, if the image parameter is malformed and nothing follows the '=' symbol (or only white space that has already been trimmed away), the resulting array contains only one element. When the method then attempts to access the second element, it raises the above-mentioned exception.
The attached patch file solves this issue by checking the length of the resulting array: if it equals 2, the normal treatment is performed; otherwise an empty string is supplied as the second argument.
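
For readers without access to the patch, the behaviour and the length check described above can be sketched roughly as follows. This is only an illustration, not the actual WikiModel code or the attached patch: the class name `ImageParamSplitSketch` and the helper `splitParam` are hypothetical, and trimming the parameter before splitting is an assumption made here.

```java
import java.util.Arrays;

// Illustrative sketch only: the class and method names are hypothetical and
// are not taken from the WikiModel sources or from the attached patch.
final class ImageParamSplitSketch {

    // Splits a "key=value" image parameter into exactly two elements.
    // Trimming first is an assumption made here, so that "width=  " is
    // handled the same way as "width=".
    static String[] splitParam(String param) {
        // split() drops trailing empty strings, so "width=" yields ["width"].
        String[] parts = param.trim().split("=");
        // Guard the key as well: "=".split("=") yields an empty array.
        String key = parts.length > 0 ? parts[0] : "";
        // Normal treatment when there are exactly two parts, empty string otherwise.
        String value = parts.length == 2 ? parts[1] : "";
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        System.out.println("width=".split("=").length); // prints 1
        System.out.println(Arrays.toString(splitParam("width=200")));
        System.out.println(Arrays.toString(splitParam("width=")));
    }
}
```

With a check of this kind, a malformed parameter such as `width=` produces an empty value rather than an out-of-bounds access on the second element.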

*What version of the product are you using? On what operating system?

We use the latest version checked out from the SVN repository, running on Mac OS X Snow Leopard with the 1.6 JVM.

*Please provide any additional information below.

The input data is an extract of the French Wikipedia export collected through MWDumper. We use the WEM component of your project to generate a CAS data structure that is supplied to the Apache UIMA framework.

Original issue reported on code.google.com by Maxime.B...@gmail.com on 4 Jun 2010 at 3:16

GoogleCodeExporter commented 9 years ago
Fixed. See commit r481.

Original comment by mki...@portolancs.com on 23 Aug 2010 at 7:49