Closed contentfree closed 5 years ago
To answer my last question, a quick workaround is to nilify all empty strings in the array given to generate_line, ala values.map{|v| v.blank? ? nil : v} (using the blank? core extension, of course).
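For anyone without ActiveSupport loaded, here is a minimal sketch of the same workaround using only the standard library. The helper name generate_psv_line is hypothetical. It compares against "" rather than calling blank?, because blank? also treats whitespace-only strings as blank and would nilify the space-only column as well. Fields that genuinely need quoting should still be quoted as usual, since only empty strings are touched before generate_line sees them.

```ruby
require "csv"

# Hypothetical helper (not part of the csv library): turn empty strings
# into nil so that generate_line writes them as bare empty fields.
def generate_psv_line(values)
  CSV.generate_line(values.map { |v| v == "" ? nil : v }, col_sep: "|")
end

puts generate_psv_line(["There is an empty", "", "column in this"])
# There is an empty||column in this

# A field containing a quote character is still quoted and escaped.
puts generate_psv_line(['say "hi"', ""])
# "say ""hi"""|
```

The trade-off is that the output no longer distinguishes nil from an empty string, so parsing it back yields nil for both.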
Why is ""|"" your problem? Can you show a problem case?
It's just unusual to see quoted values in PSV. Especially when unnecessary (as in the case with an empty string).
In the following examples, both the nil value and string-with-a-space generate expected PSV – no quotes in either. But the example that includes an empty string generates PSV with unnecessary quotes:
> puts CSV.generate_line(["There is a nil",nil,"column in this"], col_sep: '|')
There is a nil||column in this
> puts CSV.generate_line(["There is a space-only"," ","column in this"], col_sep: '|')
There is a space-only| |column in this
> puts CSV.generate_line(["There is an empty","","column in this"], col_sep: '|')
There is an empty|""|column in this
Why does the final example put quotes in the text? I assert that many systems that use PSV for data exchange would be surprised to find quotes in the PSV unless the quotes are meant to be part of the value. So, in the final (and what I consider surprising) example, many people would treat the quoted value as the literal value "".
If a quote_empty_strings: true option were introduced (defaulting to true for backwards compatibility), I'd be satisfied.
This is the expected behavior. Not returning "" would cause the field to be parsed as nil.
# with quotes
CSV.parse('There is an empty|""|column in this', col_sep: '|')
# => [["There is an empty", "", "column in this"]]
# without quotes
CSV.parse('There is an empty||column in this', col_sep: '|')
#=> [["There is an empty", nil, "column in this"]]
Ah, do you want to use the csv library to parse data that follows the PSV spec (https://github.com/jgis/psv-spec)? I wasn't familiar with PSV.
If you want to use the csv library to parse PSV spec data, we need to add a backslash escape feature rather than a feature for customizing how nil and empty strings are written: https://github.com/jgis/psv-spec#6-some-characters-are-escaped-by-a-leading-backslash
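To illustrate how backslash escaping differs from CSV-style quoting, here is a rough sketch of a writer that escapes with a leading backslash instead of wrapping fields in quotes. Everything in it is an assumption made for illustration: psv_escape and psv_line are hypothetical helpers, not part of the csv library, and the set of escaped characters (backslash, pipe, newline) is a guess, so check the linked spec for the actual list.

```ruby
# Hypothetical helpers, not part of the csv library.
# ASSUMPTION: backslash, pipe, and newline are the characters to escape;
# consult the PSV spec for the real list.
PSV_ESCAPES = { "\\" => "\\\\", "|" => "\\|", "\n" => "\\n" }.freeze

def psv_escape(value)
  # nil.to_s is "", so nil and "" both become a bare empty field.
  value.to_s.gsub(/[\\|\n]/) { |ch| PSV_ESCAPES.fetch(ch) }
end

def psv_line(values)
  values.map { |v| psv_escape(v) }.join("|")
end

puts psv_line(["There is an empty", "", "column in this"])
puts psv_line(["a|b", nil, "c\\d"])
```

With this style of escaping no field is ever quoted, so an empty string is written as nothing at all between two separators.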
I've added the quote_empty option.
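A short sketch of how the option might be used, assuming an installed csv version that ships quote_empty (older versions will not accept the keyword). It compares the default output with quote_empty: false and then parses the unquoted line back, which shows the trade-off mentioned earlier in the thread: the empty string comes back as nil.

```ruby
require "csv"

row = ["There is an empty", "", "column in this"]

# Default behavior: the empty string is written as "".
default_line = CSV.generate_line(row, col_sep: "|")

# With quote_empty: false the empty string is written as a bare empty field.
bare_line = CSV.generate_line(row, col_sep: "|", quote_empty: false)

puts default_line
puts bare_line

# Round trip: without the quotes, the parser cannot tell "" from nil.
p CSV.parse_line(bare_line, col_sep: "|")
```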
Given CSV.generate_line(["",""], col_sep: '|'), the resulting ""|"" is unexpected. The expected result is a simple |. It appears the issue is caused by https://github.com/ruby/csv/blob/ba560e407a152afffea589d832084c249471eeb6/lib/csv.rb#L1438. What's the reason for quoting empty values?
If the current behavior is on purpose (which seems to be the case), could an option be introduced to not quote empty values, such as quote_empty: true? (Also, does anyone know the smallest workaround for this that doesn't break legitimate double-double-quotes in CSV quoting?)