uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
Incorrect parsing of escape character and quotation mark in csv data #495
I've set my escape character to a backslash (\) and expect it to escape both other backslashes and quotation marks, but the parser behaves inconsistently. Let's look at this example. I intend for the output to match the source data, but this is what I get:
The number of quotation marks is incorrect, and the backslash (\) escape character seems to be ignored in some cases. Crucially, the columns in rows 5 and 6 are concatenated into a single column rather than the 2 columns that exist in the source data. The data is a little odd, but the real problem is that the behavior is not consistent.
Rows 5 and 6 should not have the same-shaped output; I expect row 6 to contain a backslash. It seems that parserSettings.getFormat().setCharToEscapeQuoteEscaping('\\'); doesn't work here.
Rows 3, 4, and 5 simply don't honor the escape character. Granted, I could understand some odd behavior here, since honoring the escape would result in mismatched quotation marks.
I've tried various CsvParserSettings options and found none that outputs 2 columns for rows 5 and 6. Could I please get an explanation or some help?
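To make the expectation concrete, here is a minimal sketch in plain Java of the escaping rule described above: inside a quoted value, a backslash escapes a following quotation mark or backslash. It does not use uniVocity-parsers and is not how the library works internally. The class name ExpectedEscaping, the helper splitLine, and the input row are all made up for illustration; the row is not one of the rows from this report.

```java
import java.util.ArrayList;
import java.util.List;

class ExpectedEscaping {

    // Hypothetical helper: splits one CSV line using '"' as the quote,
    // '\' as the escape character, and ',' as the delimiter.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < line.length()
                    && (line.charAt(i + 1) == '"' || line.charAt(i + 1) == '\\')) {
                // Escape sequence: keep the escaped character, drop the backslash.
                current.append(line.charAt(i + 1));
                i++;
            } else if (c == '"') {
                // An unescaped quote opens or closes a quoted value.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // A delimiter outside quotes ends the current column.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up raw input (assumption, not from the report): "a\\","b\"c"
        String line = "\"a\\\\\",\"b\\\"c\"";
        System.out.println(splitLine(line));
    }
}
```

Under this rule the made-up row splits into 2 columns, the first ending in a literal backslash and the second containing a literal quotation mark, which is the shape of output expected for rows like 5 and 6.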