uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

StringIndexOutOfBoundsException when quoted spaces are present in the first row with Ignore trailing whitespaces=true and Ignore leading whitespaces=false #542

Open vikashah opened 1 month ago

vikashah commented 1 month ago

com.univocity.parsers.common.TextParsingException: java.lang.StringIndexOutOfBoundsException is thrown at https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/ArgumentUtils.java#L601 with the parser configuration below:

Parser Configuration: CsvParserSettings:
	Auto configuration enabled=true
	Auto-closing enabled=true
	Autodetect column delimiter=false
	Autodetect quotes=false
	Column reordering enabled=true
	Delimiters for detection=null
	Empty value=null
	Escape unquoted values=false
	Header extraction enabled=null
	Headers=null
	Ignore leading whitespaces=false
	Ignore leading whitespaces in quotes=false
	Ignore trailing whitespaces=true
	Ignore trailing whitespaces in quotes=false
	Input buffer size=1048576
	Input reading on separate thread=true
	Keep escape sequences=false
	Keep quotes=false
	Length of content displayed on error=-1
	Line separator detection enabled=false
	Maximum number of characters per column=4096
	Maximum number of columns=512
	Normalize escaped line separators=true
	Null value=null
	Number of records to read=all
	Processor=none
	Restricting data in exceptions=false
	RowProcessor error handler=null
	Selected fields=none
	Skip bits as whitespace=true
	Skip empty lines=true
	Unescaped quote handling=null
Format configuration: CsvFormat:
	Comment character=#
	Field delimiter=,
	Line separator (normalized)=\n
	Line separator sequence=\n
	Quote character="
	Quote escape character="
	Quote escape escape character=null

The key points of the scenario are Ignore leading whitespaces=false and Ignore trailing whitespaces=true, with an input CSV file that has quoted spaces in the first line. The error is reproducible with a CSV file like the one below: " "

The while loop at https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/ArgumentUtils.java#L601 decrements the index and reads the character at it without a bounds check. When the value consists only of whitespace, the index drops below zero and the access throws a StringIndexOutOfBoundsException. Proposed fix: change the while condition as below: while (right && end >= 0 && input.charAt(end) <= ' ')
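To make the failure mode concrete, here is a minimal, self-contained sketch of a trailing-whitespace trim loop with and without the bounds check. The class TrimSketch and both method names are hypothetical and written only for illustration. This is a simplified stand-in, not the actual body of ArgumentUtils, so details may differ from the library.

```java
// Hypothetical illustration only: a simplified trim routine, NOT the
// library's actual ArgumentUtils code.
class TrimSketch {

    // Right-trim loop without a lower bound on 'end' (the pattern described in this issue).
    static String trimUnchecked(String input, boolean left, boolean right) {
        if (input.isEmpty() || (!left && !right)) {
            return input;
        }
        int begin = 0;
        while (left && begin < input.length() && input.charAt(begin) <= ' ') {
            begin++;
        }
        if (begin == input.length()) {
            return "";
        }
        int end = input.length() - 1;
        // For an all-whitespace input with left == false, 'end' reaches -1
        // and charAt(-1) throws StringIndexOutOfBoundsException.
        while (right && input.charAt(end) <= ' ') {
            end--;
        }
        return input.substring(begin, end + 1);
    }

    // Same routine with the 'end >= 0' guard proposed in this issue.
    static String trimChecked(String input, boolean left, boolean right) {
        if (input.isEmpty() || (!left && !right)) {
            return input;
        }
        int begin = 0;
        while (left && begin < input.length() && input.charAt(begin) <= ' ') {
            begin++;
        }
        if (begin == input.length()) {
            return "";
        }
        int end = input.length() - 1;
        // The guard stops the loop before the index goes negative.
        while (right && end >= 0 && input.charAt(end) <= ' ') {
            end--;
        }
        // If everything was whitespace, end == -1 and substring(0, 0) is "".
        return input.substring(begin, end + 1);
    }

    public static void main(String[] args) {
        // Content of the quoted field from the sample CSV: a single space.
        String value = " ";
        System.out.println("[" + trimChecked(value, false, true) + "]");
        try {
            trimUnchecked(value, false, true);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked loop threw " + e.getClass().getSimpleName());
        }
    }
}
```

In this sketch the guarded version returns an empty string for a whitespace-only value, while the unguarded version fails as soon as the index passes the start of the string. Only the combination left == false and right == true triggers it, because with left == true the early return on begin == input.length() already handles the all-whitespace case.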

UltraCharge99 commented 1 month ago

Please see https://github.com/uniVocity/univocity-parsers/issues/534