I've noticed that the csv gem produces a placeholder error message when attempting to parse a TSV with ambiguous parsing options.
If one were to parse the string
foo\t\tbar
when directed to strip whitespace (i.e., to treat whitespace as insignificant) and use tabs as field separators (i.e., to treat whitespace as significant), then should one parse it as ["foo\t\tbar"], ["foo", "bar"], or ["foo", "", "bar"]? There's no way to choose a parsing strategy that wouldn't cause a reasonable surprise to someone.
I think an error should be produced that tells the user the options conflict; at the moment, no such error is raised. With Ruby 3.0.2 and csv 3.2.1, parsing this string with those options instead produces the error
csv-3.2.1/lib/csv/parser.rb:935:in `parse_quotable_robust': TODO: Meaningful message in line 1. (CSV::MalformedCSVError)
I think this is misleading; the file is not really "malformed".
Do you agree with the above? If so, I'm happy to offer up a merge request around this (e.g. to check the options before beginning the parsing, to make sure that if strip is set to true then col_sep must not be whitespace).
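To make the proposal concrete, here is a rough sketch of the kind of up-front option check I have in mind. It is standalone Ruby rather than a patch against the gem: the method name `validate_strip_and_col_sep!`, the choice of `ArgumentError`, the message wording, and the use of `\s` to decide what counts as whitespace are all my own assumptions, not anything the csv gem currently defines.

```ruby
# Hypothetical pre-parse check; the method name and the message are
# illustrative and not part of the csv gem.
def validate_strip_and_col_sep!(strip:, col_sep:)
  # Only strip: true is considered here; other strip values are ignored.
  return unless strip == true
  # Assumption: "whitespace" means anything matched by \s.
  return unless col_sep.match?(/\A\s+\z/)

  raise ArgumentError,
        "strip: true conflicts with whitespace col_sep #{col_sep.inspect}"
end

# Compatible combinations pass silently.
validate_strip_and_col_sep!(strip: true, col_sep: ",")
validate_strip_and_col_sep!(strip: false, col_sep: "\t")

# The combination from this report is rejected before any parsing starts.
begin
  validate_strip_and_col_sep!(strip: true, col_sep: "\t")
rescue ArgumentError => e
  puts e.message
end
```

Running a check like this when the options are first read would let the user see the conflict directly, instead of a CSV::MalformedCSVError that points at line 1 of the input.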