Closed: icassina closed this issue 3 years ago
The exception does not happen when setting "Ignore trailing whitespaces" to false, but then an extra column containing `^M` (`\r`) is parsed for every row.
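A minimal, library-free sketch of the symptom, using plain `String.split` as a stand-in for the parser (field values are illustrative): when a `\r\n`-terminated line is read with `\n` as the line separator, the carriage return stays attached to the content, and only whitespace trimming removes it.

```java
public class StrayCarriageReturn {
    public static void main(String[] args) {
        // A "\r\n"-terminated line read with "\n" as the line separator
        // keeps the carriage return at the end of the content.
        String line = "XX|YY\r";
        String[] fields = line.split("\\|");
        // With trailing-whitespace trimming off, the last field keeps the \r:
        System.out.println(fields[1].equals("YY\r")); // true
        // Trimming removes it, since \r counts as whitespace for trim():
        System.out.println(fields[1].trim()); // YY
    }
}
```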
@jbax, FYI this is the self-contained reproducer:

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

class Issue449 {
    public static void main(String[] args) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setDelimiter("|");
        settings.setIgnoreLeadingWhitespaces(false);
        settings.setInputBufferSize(128);
        CsvParser parser = new CsvParser(settings);
        String line = "XX |XXX-XXXX |XXXXXX " +
                "|XXXXXXXX|XXXXX |XXXXXX " +
                "|X|XXXXXXX|XXXXXXXX|XXXX|XXXXXXXXXXXXXXX |XXXXXXXXXXX" +
                "|XXXXXX |XXXXXXXXXXXXXXXXXXXXXX|XXXXXX " +
                "|XXXXXXXXXXXXXX|XXXXXX |XXXXXXXXXXXXXXXXXXXXXX" +
                "|XXXXXX |XXXXXXXXXXXXXXXXXXXXXX|XXXXXX " +
                "|XXXXXXXXX|XXXXXX |XXXXXXX| " +
                "|| || " +
                "|| ||XXXX-XX-XX 00:00:00.0000000" +
                "||XXXXX.XXXXXXXXXXXXXXX|XXXXX.XXXXXXXXXXXXXX" +
                "|XXXXX.XXXXXXXXXXXXXXX|X|XXXXXX |X";
        parser.parseLine(line);
    }
}
```
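Note the 128-character input buffer in the reproducer: the sample line is longer than one buffer, so the parser must refill its buffer mid-line, and the failure seems tied to whitespace skipping crossing a refill boundary. A rough, library-free illustration of the sizes involved (the line content below is a placeholder of comparable length, not the actual data):

```java
public class BufferMath {
    public static void main(String[] args) {
        // A placeholder line of roughly the reproducer's length (300 chars).
        String line = String.join("", java.util.Collections.nCopies(30, "XXXXXXXXX|"));
        int bufferSize = 128;
        // Number of buffer fills needed to read the whole line.
        int refills = (line.length() + bufferSize - 1) / bufferSize;
        // The line spans several internal buffers, so trimming trailing
        // whitespace at the end of a field can cross a refill boundary.
        System.out.println(line.length() + " chars -> " + refills + " buffers"); // 300 chars -> 3 buffers
    }
}
```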
Fixed, I'll release version 2.9.2 tomorrow with the adjustment.
Thanks guys!
When will version 2.9.2 be available for download? I can't see it in the download section yet: https://www.univocity.com/pages/univocity_parsers_download
After upgrading from Spark 2.4 to Spark 3.0.1, we experienced a regression in our tests. Reading the CSV file was fine before, but now it sometimes triggers an `ArrayIndexOutOfBoundsException` in `AbstractCharInputReader`.

Here's the configuration: the input file is a `\t`-separated CSV with `\r\n` newlines. The code was executed on a Linux machine.
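For input like this, one way to avoid stray carriage returns is to treat the full `\r\n` sequence as the line separator rather than `\n` alone. A library-free sketch of the difference (field values are illustrative):

```java
public class CrlfSplit {
    public static void main(String[] args) {
        String data = "a\tb\r\nc\td\r\n";
        // Splitting on "\n" alone leaves a trailing \r on each line...
        System.out.println(data.split("\n")[0].endsWith("\r")); // true
        // ...while splitting on the full "\r\n" separator yields clean fields.
        String[] fields = data.split("\r\n")[0].split("\t");
        System.out.println(fields[1]); // b
    }
}
```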