ruby / csv

CSV Reading and Writing
https://ruby.github.io/csv/
BSD 2-Clause "Simplified" License

duplicate headers break by_col mode #206

Closed escogido closed 3 years ago

escogido commented 3 years ago

while it is admittedly bad practice to use duplicate column headers in a CSV, this sometimes happens and is technically not even against the standard (at least RFC 4180 doesn't specifically prohibit it).

currently, parsing a CSV file that has duplicate headers (or multiple empty headers) causes by_col mode to return the same data for different columns: every column that shares a header repeats the values of the first column with that header.

for example, with this input:

dupe,dupe,,,regular
1,2,3,4,5
11,12,13,14,15
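
A minimal sketch of getting that input into a variable. The issue does not show the parsing call, so `CSV.parse` with `headers: true` and the variable name `input` are assumptions here:

```ruby
require "csv"

input = <<~DATA
  dupe,dupe,,,regular
  1,2,3,4,5
  11,12,13,14,15
DATA

# headers: true treats the first line as the header row and returns a CSV::Table
csv = CSV.parse(input, headers: true)

p csv.headers        # the two empty header fields come back as nil
p csv.map(&:fields)  # row values in file order, independent of header names
```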

after parsing that into a variable csv, by_row returns the correct data but by_col does not:

3.0.0> csv.by_row.each { p _1 }
#<CSV::Row "dupe":"1" "dupe":"2" nil:"3" nil:"4" "regular":"5">
#<CSV::Row "dupe":"11" "dupe":"12" nil:"13" nil:"14" "regular":"15">
3.0.0> csv.by_col.each { p _1 }
["dupe", ["1", "11"]]
["dupe", ["1", "11"]]
[nil, ["3", "13"]]
[nil, ["3", "13"]]
["regular", ["5", "15"]]
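
The repeated values are consistent with each column being looked up by header name, since a name lookup on a row returns the first matching field. As a possible workaround on affected versions, the sketch below assembles the columns by position instead. The name `columns` is illustrative only and not part of the library:

```ruby
require "csv"

csv = CSV.parse("dupe,dupe,,,regular\n1,2,3,4,5\n11,12,13,14,15\n", headers: true)

# Looking a field up by name finds only the first "dupe" column
first_dupe = csv[0]["dupe"]

# CSV::Table#to_a returns the header row followed by each row's fields,
# so transposing the data rows yields one array per column, by position
header_row, *data = csv.to_a
columns = header_row.zip(data.transpose)

columns.each { |pair| p pair }
```

This never looks a value up by header, so duplicate and empty headers each keep their own data.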

in addition, I tried to work around this with a custom header converter, proc { |header, info| "#{header}_#{info.index}" }, but it turned out that converters are not called for columns with missing headers; in this example, the headers convert to ["dupe_0", "dupe_1", nil, nil, "regular_4"].
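
For reference, a sketch of how such a converter is wired in, followed by a version-independent alternative that makes the headers unique after a header-less parse. The names `converter`, `unique_headers` and `table` are illustrative, and what the converter does for empty headers depends on the installed csv version, so no output is asserted for it:

```ruby
require "csv"

input = "dupe,dupe,,,regular\n1,2,3,4,5\n11,12,13,14,15\n"

# The converter from the report; whether it is called for empty headers
# depends on the csv version in use
converter = proc { |header, info| "#{header}_#{info.index}" }
converted = CSV.parse(input, headers: true, header_converters: [converter])
p converted.headers

# Alternative: parse without header handling, then build unique headers by hand
raw_headers, *data = CSV.parse(input)
unique_headers = raw_headers.each_with_index.map { |header, index| "#{header}_#{index}" }
rows = data.map { |fields| CSV::Row.new(unique_headers, fields) }
table = CSV::Table.new(rows)

table.by_col.each { |pair| p pair }
```

Because every header is now distinct, name-based column access no longer collapses columns, at the cost of changing the header names.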

is any of this intentional? any chance to change these behaviors?

thanks in advance!

kou commented 3 years ago

Implemented.