uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
StringIndexOutOfBoundsException when quoted space/s are present in first row with parse option Ignore trailing whitespaces as true and Ignore leading whitespaces as false #542
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=null
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=true
Ignore trailing whitespaces in quotes=false
Input buffer size=1048576
Input reading on separate thread=true
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=-1
Line separator detection enabled=false
Maximum number of characters per column=4096
Maximum number of columns=512
Normalize escaped line separators=true
Null value=null
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=none
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=null
Format configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\n
Quote character="
Quote escape character="
Quote escape escape character=null
The key points of this scenario are that Ignore leading whitespaces=false, Ignore trailing whitespaces=true, and the input CSV file has quoted spaces in the first line. The problem is reproducible with a CSV file like the one below:
" "
Parsing this file fails with com.univocity.parsers.common.TextParsingException: java.lang.StringIndexOutOfBoundsException, thrown at https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/ArgumentUtils.java#L601, when the parser configuration listed above is used.
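The while loop at https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/ArgumentUtils.java#L601 decrements the index and reads the character at that position without a bounds check. When the value consists only of whitespace and leading whitespace is not trimmed, the index drops below zero, which causes the StringIndexOutOfBoundsException. Proposed fix: modify the while condition as below: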
while (right && end >= 0 && input.charAt(end) <= ' ')
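To illustrate the failure mode and the effect of the extra `end >= 0` check, here is a minimal, self-contained sketch. It is not the library's code: the class `TrimSketch` and the methods `trimUnchecked` and `trimChecked` are hypothetical stand-ins that only assume a trim routine with `left`/`right` flags and a right-hand loop shaped like the one described above.

```java
final class TrimSketch {

    // Hypothetical stand-in for the unchecked loop described in this issue.
    static String trimUnchecked(String input, boolean left, boolean right) {
        if (input.isEmpty() || (!left && !right)) {
            return input;
        }
        int begin = 0;
        while (left && begin < input.length() && input.charAt(begin) <= ' ') {
            begin++;
        }
        if (begin == input.length()) {
            return "";
        }
        int end = input.length() - 1;
        // No lower bound on end: for an all-whitespace input with left == false,
        // end reaches -1 and charAt(-1) throws StringIndexOutOfBoundsException.
        while (right && input.charAt(end) <= ' ') {
            end--;
        }
        return input.substring(begin, end + 1);
    }

    // Same routine with the condition proposed in this issue.
    static String trimChecked(String input, boolean left, boolean right) {
        if (input.isEmpty() || (!left && !right)) {
            return input;
        }
        int begin = 0;
        while (left && begin < input.length() && input.charAt(begin) <= ' ') {
            begin++;
        }
        if (begin == input.length()) {
            return "";
        }
        int end = input.length() - 1;
        // The end >= 0 guard stops the loop before the index leaves the string.
        while (right && end >= 0 && input.charAt(end) <= ' ') {
            end--;
        }
        return input.substring(begin, end + 1);
    }

    public static void main(String[] args) {
        // Mirrors the reported settings: leading whitespace kept, trailing trimmed.
        try {
            trimUnchecked(" ", false, true);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked loop failed: " + e);
        }
        System.out.println("checked result: [" + trimChecked(" ", false, true) + "]");
    }
}
```

With the guard in place, a value made up only of whitespace trims to an empty string instead of failing, and inputs that contain non-whitespace characters are trimmed the same way by both versions.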