ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License
181 stars 114 forks source link

strip-option changes column value parsing logic and throws wacky TODO errors #156

Closed timolehto closed 4 years ago

timolehto commented 4 years ago

As we see here clearly one does not need to quote columns containing spaces:

CSV.parse <<~CSV
id field
   my id
CSV
=> [["id field"], ["   my id"]]

However, we don't want that annoying leading whitespace... maybe quoting the values would magically drop it.. not exactly.

CSV.parse <<~CSV
id field
   "my id"
CSV
CSV::MalformedCSVError: Illegal quoting in line 2.
from /Users/timok/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/csv-3.1.2/lib/csv/parser.rb:923:in `parse_quotable_robust'

Luckily we have this strip option.. which should do exactly what we want

CSV.parse <<~CSV, strip: true
id field
   my id
CSV
CSV::MalformedCSVError: TODO: Meaningful message in line 1.
from /Users/timok/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/csv-3.1.2/lib/csv/parser.rb:931:in `parse_quotable_robust'

Whoops, I guess I need to quote that value to be able to "strip" it. Also gotta <3 these meaningful error messages :)

CSV.parse <<~CSV, strip: true
id field
   "my id"
CSV
CSV::MalformedCSVError: TODO: Meaningful message in line 1.
from /Users/timok/.rbenv/versions/2.6.6/lib/ruby/gems/2.6.0/gems/csv-3.1.2/lib/csv/parser.rb:931:in `parse_quotable_robust'

Still no good. What could be the problem? It seems I can't have any fields with spaces in them if I intend to use this strip feature.. so I would need to provide the data like this:

CSV.parse <<~CSV, strip: true
"id field",foo   ,   bar
   "my id",   ugh,gah   
CSV
=> [["id field", "foo", "bar"], ["my id", "ugh", "gah"]]

Yet, even unquoted values get the whitespaces stripped, but somehow a whitespace inside the column value is not acceptable when using strip option, unless also quoting, which you normally wouldn't need to be doing.

In my opinion the expected and correct behaviour would be that this should work just fine:

CSV.parse <<~CSV, strip: true
id field,foo   ,   bar
   my id,   ugh,gah   
CSV

And also produce: [["id field", "foo", "bar"], ["my id", "ugh", "gah"]] It's should be simple. You respect the column separator and treat everything inside of them as column value and then invoke the stripping on it, whether or not it contains some spaces shouldn't matter one bit.

Now it's a different matter then if you need to have commas or line breaks inside your columns then of course you must resort to quoting

CSV.parse <<~CSV, strip: true
id field, "this, that or other", aaaa
   my id,   other,blah   
CSV
kou commented 4 years ago

Could you try the latest release?

timolehto commented 4 years ago

Hah, sorry for the unnecessary issue report. It seems this was already fixed in version 3.1.3