mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License

When parsing a CSV that has headers with 2 or more matching names, data is lost #224

Open MichaelFoss opened 1 year ago

MichaelFoss commented 1 year ago

Expected Behavior

When two or more columns share the same header, the parsed values should be collected into an array of strings instead of a single string.

Actual Behavior

Only the last read value is maintained.

How Do We Reproduce?

  1. Create a .csv file with at least two columns sharing the same header name, and one row containing different data in each of those columns
  2. Parse the file: only the value from the last duplicated column survives
MichaelFoss commented 1 year ago

I recommend having an option that allows changing behavior in the event of duplicate headers. Something like useArraysForDuplicateHeaders as a boolean flag that defaults to false; this way it can maintain the existing behavior, with the use of arrays for duplicated headers being a feature.

Consider the simple file:

a,a,b
1,2,3

The row object passed to the data event handler looks like this:

{ a: '2', b: '3' }

What I'd like to see, when this flag is enabled in the options, is this:

{ a: [ '1', '2' ], b: '3' }

Line 185 of index.js reads as follows:

o[header] = cell

This change to the writeRow function on line 185 will allow it to work as expected:

if (Array.isArray(o[header])) {
  o[header].push(cell)
} else if (o.hasOwnProperty(header)) {
  o[header] = [o[header], cell]
} else {
  o[header] = cell
}
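The proposed logic can be exercised standalone. The sketch below extracts it into a helper (mergeCell is a name invented here for illustration; the o/header/cell names mirror the snippet above) and simulates parsing the sample file:

```javascript
// Proposed merge logic for duplicate headers, pulled out of writeRow
// so it can be tested in isolation.
function mergeCell (o, header, cell) {
  if (Array.isArray(o[header])) {
    // Third or later occurrence of this header: append to the array
    o[header].push(cell)
  } else if (Object.prototype.hasOwnProperty.call(o, header)) {
    // Second occurrence: promote the existing scalar to an array
    o[header] = [o[header], cell]
  } else {
    // First occurrence: keep the current single-value behavior
    o[header] = cell
  }
  return o
}

// Simulate parsing the sample file `a,a,b` / `1,2,3`
const headers = ['a', 'a', 'b']
const cells = ['1', '2', '3']
const row = {}
headers.forEach((h, i) => mergeCell(row, h, cells[i]))
// row is now { a: ['1', '2'], b: '3' }
```

Using `Object.prototype.hasOwnProperty.call` rather than `o.hasOwnProperty` also keeps the check safe if the row object is ever created without a prototype.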

The downside of this approach, of course, is the extra operations per cell, which may cause a performance hit.
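In the meantime, the data loss can be avoided without a code change by using csv-parser's existing mapHeaders option to rename duplicates so no key is overwritten. A sketch of such a callback (the `header_N` suffix scheme is just one possible convention):

```javascript
// Tracks how many times each header has been seen so duplicates
// can be renamed, e.g. a,a,b -> a, a_1, b
const seen = Object.create(null)
const mapHeaders = ({ header }) => {
  if (seen[header] === undefined) {
    seen[header] = 0
    return header
  }
  seen[header] += 1
  return `${header}_${seen[header]}`
}
```

This would be passed as `csv({ mapHeaders })`, yielding a row like `{ a: '1', a_1: '2', b: '3' }` for the sample file, at the cost of callers knowing about the suffixed keys.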