Closed by Jibbow 3 years ago
"value1", "value2"
is not valid CSV per RFC 4180.
Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
The second field in the CSV you posted is not a valid quoted field: it begins with whitespace, so it is not enclosed by double quotes, yet it contains double quotes.
Technically there is no such thing as valid CSV, since RFC 4180 is merely a suggestion, but this parser targets mainly RFC 4180-ish CSV files. I don't see a trivial way to modify the parser to accommodate this deviation from RFC 4180.
As a general rule, I do not expect the parser to handle significant deviations from RFC 4180.
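To make the quoting rule cited above concrete, here is a minimal Python sketch of a per-field check. The function name is_valid_rfc4180_field is hypothetical and this is not how the parser discussed in this thread is implemented; it only assumes the field has already been split out on the delimiter.

```python
def is_valid_rfc4180_field(raw: str) -> bool:
    """Check one raw field (delimiters already removed) against RFC 4180 quoting."""
    if raw.startswith('"'):
        # Quoted field: it must also end with a quote, and every quote
        # between the enclosing pair must be doubled ("").
        if len(raw) < 2 or not raw.endswith('"'):
            return False
        inner = raw[1:-1]
        return '"' not in inner.replace('""', '')
    # Unquoted field: double quotes, commas, and line breaks are not allowed.
    return not any(ch in raw for ch in '",\r\n')


print(is_valid_rfc4180_field('"value1"'))   # quoted from the first character
print(is_valid_rfc4180_field(' "value2"'))  # leading space, so treated as unquoted
```

Under this reading, the leading space is what makes the second field unquoted, and the quotes that follow are then illegal content.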
Makes sense, thanks! :)
When there is a quoted field with leading whitespace characters, and the trim() option is enabled for whitespace, the opening quote of the quoted field is included as the content of the field. It's probably easier to demonstrate this with an example:
Assume we have the following format config:
And we want to parse the following CSV file:
The first value in column1 is parsed correctly as value1. However, the value in column2 is parsed as "value2 (note the opening quote).
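For comparison, the same input can be run through Python's standard csv module. This is a different parser from the one discussed in this issue, so the exact output differs (Python keeps both quotes rather than only the opening one), but it shows the same class of behavior: a quote that follows leading whitespace is not treated as the start of a quoted field unless the whitespace is skipped before quote detection.

```python
import csv
import io

data = '"value1", "value2"\n'

# Default dialect: a quote opens a quoted field only when it is the first
# character of the field, so the second field is read as unquoted text.
strict = next(csv.reader(io.StringIO(data)))

# Trimming after parsing removes the space but leaves the quotes in place.
trimmed = [field.strip() for field in strict]

# skipinitialspace=True drops whitespace right after the delimiter
# before quote detection, so the second field is read as quoted.
lenient = next(csv.reader(io.StringIO(data), skipinitialspace=True))

print(strict)   # ['value1', ' "value2"']
print(trimmed)  # ['value1', '"value2"']
print(lenient)  # ['value1', 'value2']
```

The design point is the order of operations: trimming has to happen before the parser decides whether a field is quoted, not after.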