ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

CSV.generate_line (and others) generating unexpected results for pipe-delimited values #35

Closed contentfree closed 5 years ago

contentfree commented 6 years ago

Given CSV.generate_line(["",""], col_sep: '|') the resulting ""|"" is unexpected. The expected result is a simple |.

It appears the issue is caused by https://github.com/ruby/csv/blob/ba560e407a152afffea589d832084c249471eeb6/lib/csv.rb#L1438. What's the reason for quoting empty values?

If the current behavior is on purpose (which seems to be the case), could an option be introduced to control whether empty values are quoted, such as quote_empty (defaulting to true)?

(Also, does anyone know the smallest workaround for this that doesn't break legitimate double-double-quotes in CSV quoting?)

contentfree commented 6 years ago

To answer my last question, a quick workaround is to nilify all empty strings in the array given to generate_line, à la values.map{|v| v.blank? ? nil : v} (using Active Support's blank? core extension, of course).
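For anyone without Active Support, here is a minimal sketch of the same workaround in plain Ruby. The helper name nilify_empty is hypothetical, not part of the csv library. It compares against the empty string only, so unlike blank? it leaves whitespace-only values such as " " untouched.

```ruby
require "csv"

# Hypothetical helper (not part of the csv library): replace empty strings
# with nil so that generate_line writes them as bare empty fields.
def nilify_empty(values)
  values.map { |v| v == "" ? nil : v }
end

row = ["There is an empty", "", "column in this"]
line = CSV.generate_line(nilify_empty(row), col_sep: "|")
puts line
```

Values that genuinely need quoting (for example, ones containing a double quote or the separator) are still quoted as usual, because only exact empty strings are changed.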

kou commented 6 years ago

Why is ""|"" a problem for you? Can you show a case where it causes a problem?

contentfree commented 6 years ago

It's just unusual to see quoted values in PSV. Especially when unnecessary (as in the case with an empty string).

In the following examples, both the nil value and string-with-a-space generate expected PSV – no quotes in either. But the example that includes an empty string generates PSV with unnecessary quotes:

> puts CSV.generate_line(["There is a nil",nil,"column in this"], col_sep: '|')
There is a nil||column in this

> puts CSV.generate_line(["There is a space-only"," ","column in this"], col_sep: '|')
There is a space-only| |column in this

> puts CSV.generate_line(["There is an empty","","column in this"], col_sep: '|')
There is an empty|""|column in this

Why does the final example put quotes in the text? I assert that many systems that use PSV for data exchange would be surprised to find quotes in the PSV unless the quotes are meant to be part of the value. So, in the final (and what I consider surprising) example, many people would treat the quoted value as the literal value "".

contentfree commented 6 years ago

If a quote_empty_strings: true option were introduced (defaulting to true for backwards compatibility), I'd be satisfied.

stevendaniels commented 6 years ago

This is the expected behavior. Not returning "" would cause the field to be parsed as nil.


# with quotes
CSV.parse('There is an empty|""|column in this', col_sep: '|')
# => [["There is an empty", "", "column in this"]]

# without quotes
CSV.parse('There is an empty||column in this', col_sep: '|')
# => [["There is an empty", nil, "column in this"]]

kou commented 6 years ago

Ah, do you want to use the csv library to parse data that follows the PSV spec (https://github.com/jgis/psv-spec)? I didn't know about PSV.

If you want to use the csv library to parse PSV spec data, we need to add a backslash-escape feature rather than a feature for customizing how nil and empty strings are written: https://github.com/jgis/psv-spec#6-some-characters-are-escaped-by-a-leading-backslash

kou commented 5 years ago

I've added the quote_empty option.
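A short sketch of how the option might be used, assuming a csv release that includes quote_empty (it defaults to true, which keeps the old behavior). The row data is the example from earlier in this thread.

```ruby
require "csv"

row = ["There is an empty", "", "column in this"]

# Default behavior: the empty string is written as "" so it can be
# told apart from nil when the line is parsed again.
quoted = CSV.generate_line(row, col_sep: "|")

# With quote_empty: false the empty string is written as a bare empty field.
unquoted = CSV.generate_line(row, col_sep: "|", quote_empty: false)

puts quoted
puts unquoted

# Trade-off: parsed back, the bare empty field comes out as nil, not "".
p CSV.parse_line(unquoted, col_sep: "|")
```

The last line reflects the point raised above: without the quotes, a parser using the csv library's defaults cannot distinguish an empty string from nil.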