uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

DefaultConversionProcessor.applyConversions() throws ArrayIndexOutOfBoundsException... #492

Open randy opened 2 years ago


...when fields are missing from the input data.

My CsvParserSettings looks something like this:

// parent class info elided (it includes: List<String> errors = [])
protected final CsvParserSettings configureCsvSettings(CsvParserSettings csvSettings,
                                                       BeanListProcessor beanProcessor) {
    csvSettings.format.lineSeparator = System.lineSeparator() // platform line endings
    csvSettings.headerExtractionEnabled = true                // first row is the header row
    csvSettings.processor = beanProcessor                     // rows are turned into beans
    // user-defined error handler that receives the shared errors list
    csvSettings.processorErrorHandler = new MyRowProcessorErrorHandler(errors)

    return csvSettings
}

The BeanListProcessor has a trivial override of the beanProcessed() method, but I commented it out while testing and confirmed that it makes no difference to the outcome. Otherwise, I instantiate it with the BeanListProcessor(Class<T>, int) constructor.

My input data is missing one or more headers (whole columns). Every property in my target bean is annotated with @Parsed or with a meta-annotation that uses it. As mentioned, I'm supplying the target class to the processor, so the headers are derived from its @Headers annotation (or at least, that's my understanding so far). That annotation's sequence property lists all of the expected header values.

Everything had been fine for months. Then I was recently tasked with updating the feature that uses this...

The moment I add @Validate to one of my meta-annotations, things break. Even with @Validate(nullable = true, allowBlanks = true), which is technically what I want because any cell can be empty or null, tests start failing left and right. I've tried this on different meta-annotations: one parses to BigDecimal via a conversion, so I wondered whether @Validate and @Convert simply can't be mixed, but I then tried it on a plain String property and got the same result.

I've spent a couple of days down the rabbit hole debugging this, and this is what I found: when the parser reaches DefaultConversionProcessor.applyConversions(String[], Context), conversion.applyConversion(index, row[index], convertedFlags) throws the exception from the title, because index is -1 for every missing field. These indexes are discovered via initializeConversions(String[], Context), which calls Context.extractedFieldIndexes(), and at that point the -1 value is legitimate.
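A minimal, stdlib-only sketch of why a -1 index can be "legitimate" at the lookup stage yet fatal once it is used to index a row. The class name, the fieldIndexes helper, and the sample headers are hypothetical; they only mimic the behaviour described above and are not univocity's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT univocity's implementation.
// Each expected header (e.g. from @Headers(sequence = ...)) is mapped to its
// position in the input's header row, the way a field-index lookup might be.
class FieldIndexSketch {

    static int[] fieldIndexes(String[] expectedHeaders, String[] inputHeaders) {
        List<String> input = Arrays.asList(inputHeaders);
        int[] indexes = new int[expectedHeaders.length];
        for (int i = 0; i < expectedHeaders.length; i++) {
            // List.indexOf returns -1 when the header is absent from the input
            indexes[i] = input.indexOf(expectedHeaders[i]);
        }
        return indexes;
    }

    public static void main(String[] args) {
        String[] expected = {"id", "amount", "note"};
        String[] inputHeaders = {"id", "note"}; // the "amount" column is missing
        String[] row = {"42", "hello"};

        int[] indexes = fieldIndexes(expected, inputHeaders);
        System.out.println(Arrays.toString(indexes)); // [0, -1, 1]

        try {
            for (int index : indexes) {
                System.out.println(row[index]); // prints "42", then fails on -1
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```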

The conversion process simply shouldn't do that. I'm not sure how or where this was supposed to be handled, but indexing an array with -1 is just bad practice... These values should either be removed or handled before conversions.applyConversions() is called.
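The two remedies suggested above (remove the -1 entries, or handle them before the conversions run) can be sketched in plain Java. MissingFieldGuards, presentIndexes, and valuesOrNull are hypothetical names, not part of univocity-parsers, and resolving a missing column to null is an assumption about what a nullable validation ought to receive.

```java
import java.util.Arrays;

// Hypothetical sketch of two ways to guard against -1 field indexes.
// Not univocity's API; names and behaviour are illustrative assumptions.
class MissingFieldGuards {

    // Option 1: drop missing fields (-1 indexes) before touching the row.
    static int[] presentIndexes(int[] indexes) {
        return Arrays.stream(indexes).filter(i -> i >= 0).toArray();
    }

    // Option 2: keep every field but resolve missing ones to null.
    // The length check also covers rows shorter than the header row.
    static String[] valuesOrNull(int[] indexes, String[] row) {
        String[] values = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            int index = indexes[i];
            values[i] = (index >= 0 && index < row.length) ? row[index] : null;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] indexes = {0, -1, 1};
        String[] row = {"42", "hello"};
        System.out.println(Arrays.toString(presentIndexes(indexes)));    // [0, 1]
        System.out.println(Arrays.toString(valuesOrNull(indexes, row))); // [42, null, hello]
    }
}
```

Dropping the entries is the smaller change but silently skips the field; resolving it to null keeps the field and leaves the decision to validation, which seems closer to what @Validate(nullable = true, allowBlanks = true) is meant to express.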

I also experimented with columnReorderingEnabled = false, but that caused a different side effect that I decided was not worth the time to investigate.

This is the part of the stack trace that pertains to the library:

    at com.univocity.parsers.common.Internal.throwDataProcessingException(Internal.java:62)
    at com.univocity.parsers.common.Internal.process(Internal.java:57)
    at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:716)
    at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:152)
    at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:759)
// my call to csvParser.parse(InputStream) and prior calls are here...
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at com.univocity.parsers.common.DefaultConversionProcessor.populateReverseFieldIndexes(DefaultConversionProcessor.java:151)
    at com.univocity.parsers.common.DefaultConversionProcessor.validateAllValues(DefaultConversionProcessor.java:164)
    at com.univocity.parsers.common.DefaultConversionProcessor.applyConversions(DefaultConversionProcessor.java:132)
    at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:663)
    at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:54)
    at com.univocity.parsers.common.Internal.process(Internal.java:30)
    ... 7 more
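The trace places the failure in DefaultConversionProcessor.populateReverseFieldIndexes(), reached through validateAllValues(), which would be consistent with the problem appearing only once @Validate is added. A reverse index maps each column position back to the field that reads it. The sketch below is a hypothetical reconstruction (not the library's code) showing why building one from a field-index array that contains -1 ends in ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Hypothetical reconstruction for illustration only: univocity's real
// populateReverseFieldIndexes() may work differently.
class ReverseIndexSketch {

    // reverse[column] = field that reads that column, or -1 if none does
    static int[] reverseIndexes(int[] fieldIndexes, int columnCount) {
        int[] reverse = new int[columnCount];
        Arrays.fill(reverse, -1);
        for (int field = 0; field < fieldIndexes.length; field++) {
            // a -1 entry makes this write to reverse[-1], which throws
            reverse[fieldIndexes[field]] = field;
        }
        return reverse;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reverseIndexes(new int[]{0, 2, 1}, 3))); // [0, 2, 1]
        try {
            reverseIndexes(new int[]{0, -1, 1}, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```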