While it is admittedly bad practice to use duplicate column headers in a CSV file, it does sometimes happen, and it is technically not even against the standards (RFC 4180, at least, does not prohibit it).
Currently, parsing a CSV file that has duplicate headers (or multiple empty headers) causes `by_col` mode to return the same data for different columns.
For example, with this input:

```
dupe,dupe,,,regular
1,2,3,4,5
11,12,13,14,15
```
After parsing that into a variable `csv`, `by_row` returns valid data, but `by_col` breaks, returning the same values for both `dupe` columns and for both empty-header columns.
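A minimal script to reproduce this, assuming the `csv` library bundled with Ruby and the sample input above. It only prints the results; what `by_col` prints for the repeated headers is the broken part, and the exact output may vary between versions of the `csv` gem.

```ruby
require "csv"

data = <<~CSV_TEXT
  dupe,dupe,,,regular
  1,2,3,4,5
  11,12,13,14,15
CSV_TEXT

csv = CSV.parse(data, headers: true)

# Row mode: each CSV::Row keeps all five fields, so nothing is lost.
csv.by_row.each { |row| p row.fields }

# Column mode: columns are looked up by header name, so a repeated
# header (or a repeated empty header) may resolve to the first match.
csv.by_col.each { |header, values| p [header, values] }
```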
In addition, when I tried to work around this with a custom header converter, `proc { |header, info| "#{header}_#{info.index}" }`, it turned out that converters are not called for columns with missing headers; in this example, the headers would be converted to `["dupe_0", "dupe_1", nil, nil, "regular_4"]`.
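One possible workaround, sketched under the assumption that renaming the headers is acceptable: parse the file as plain arrays, make the header names unique yourself, and build the `CSV::Table` from `CSV::Row` objects. Because the renaming happens outside of `header_converters`, empty headers get names too. The helpers `unique_headers` and `parse_with_unique_headers` and the `column_N` placeholder naming are hypothetical, not part of the `csv` library.

```ruby
require "csv"

# Hypothetical helper: suffix every header with its column index so that
# repeated names become distinct, and give empty (nil or "") headers a
# placeholder name. It works on the raw header array, so it does not rely
# on header converters, which are not called for nil headers.
def unique_headers(raw_headers)
  raw_headers.each_with_index.map do |header, index|
    name = (header.nil? || header.empty?) ? "column" : header
    "#{name}_#{index}"
  end
end

# Hypothetical helper: parse without header handling, then build the
# CSV::Table from CSV::Row objects that use the de-duplicated headers.
def parse_with_unique_headers(text)
  rows = CSV.parse(text) # arrays of fields; empty unquoted fields are nil
  headers = unique_headers(rows.shift || [])
  CSV::Table.new(rows.map { |fields| CSV::Row.new(headers, fields) })
end

data = <<~CSV_TEXT
  dupe,dupe,,,regular
  1,2,3,4,5
  11,12,13,14,15
CSV_TEXT

table = parse_with_unique_headers(data)
# Prints one line per column, starting with ["dupe_0", ["1", "11"]]
table.by_col.each { |header, values| p [header, values] }
```

The trade-off is that the header names no longer match the file exactly, so code that looks up columns by their original names has to use the suffixed ones instead.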
Is any of this intentional? Is there any chance these behaviors could be changed?
Thanks in advance!