mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License

Any chance to supply a custom row mapper as a config option #149

Closed · pkese closed this 4 years ago

pkese commented 4 years ago

Feature Proposal

This proposal tries to improve the WriteRow function, which always creates anonymous objects and cannot be tinkered with: https://github.com/mafintosh/csv-parser/blob/264e15a2de73e9759a61ae278457e75f9804d3d7/index.js#L185

I propose adding an option to supply a custom object generator/constructor.

For example, when converting this CSV file:

width,height,color,x1,x2,x3,x4,x5
10.2,4.3,blue,1,2,3,4,5

One could supply a custom object constructor like this:

function customRowMapper(cells, headers) {
    let [w, h, color, ...xs] = cells;
    w = Number(w); h = Number(h); xs = xs.map(Number); // cells arrive as strings
    let area = w * h;
    return {w, h, area, xs}; // ignore color, but add area and xs
}

so the result would be

{w:10.2, h:4.3, area:43.86, xs:[1,2,3,4,5]}
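For illustration, the option could then be wired up along these lines (mapRow is just a placeholder name for the proposed config option; it does not exist in csv-parser today):

const fs = require('fs');
const csv = require('csv-parser');

fs.createReadStream('data.csv')             // the sample file above
    .pipe(csv({ mapRow: customRowMapper })) // hypothetical option
    .on('data', (row) => console.log(row)); // {w: 10.2, h: 4.3, area: 43.86, xs: [1, 2, 3, 4, 5]}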

Feature Use Case

The reason for this request is that I am dealing with ordered lists of items (columns known in advance), and it would make life much easier if there were an option to convert the array of cells directly into objects on the fly, rather than reconstructing them from generic objects later on.

Another use case: someone might prefer to interpret each row as an array of values and return arrays rather than objects, or instances of custom classes.

According to this proposal, the default object constructor (i.e. when not overridden) would be something along the lines of:

function defaultRowMapper(cells, headers) {
    let o = {};
    for (const [i, key] of headers.entries()) {
        o[key] = cells[i];
    }
    return o;
}

The headers argument comes second because it is not strictly required (see the first example).

The third parameter to the custom constructor could be the row number (e.g. so someone could add a row number to the output records even if there was no row-number column in the CSV file).
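For example (numberedRowMapper is just an illustrative name; rowIndex is the proposed third parameter):

function numberedRowMapper(cells, headers, rowIndex) {
    let row = defaultRowMapper(cells, headers);
    row.rowNumber = rowIndex; // synthetic column, not present in the CSV
    return row;
}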

Optionally, one could write custom code to deal with CSV rows that have fewer or more columns than the headers specify. Similar to the case above, all remaining columns could be turned into an array, as in the sketch below.
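A minimal sketch of such a mapper (flexibleRowMapper and the extra key are illustrative names), collecting any overflow cells into an array:

function flexibleRowMapper(cells, headers) {
    let o = defaultRowMapper(cells.slice(0, headers.length), headers);
    if (cells.length > headers.length) {
        o.extra = cells.slice(headers.length); // overflow cells as an array
    }
    return o;
}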

shellscape commented 4 years ago

Thanks for opening an issue and taking the time to explain your use case :beer:

I think what you're after here is a fork of the project, customized to your use case. I work with some pretty crazy data streams on a daily basis and we leverage the module fairly heavily. Every single time I start thinking about modifying the parser (this module, in some cases), I'm reminded that the parser is working against a spec. When the data source changes, it's not the parser's fault. It's also not the job of a parser to handle variations in what's supposed to be an otherwise standard format.

Alternatively, you could pipe the result to a transform stream, which would still be very performant and would allow you to manipulate the data as it's streamed.
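Something along these lines, for example (using the headers from your sample file; data.csv stands in for your input):

const fs = require('fs');
const csv = require('csv-parser');
const { Transform } = require('stream');

// Reshape each parsed row downstream of the parser instead of inside it.
const reshape = new Transform({
    objectMode: true,
    transform(row, _enc, callback) {
        const w = Number(row.width);
        const h = Number(row.height);
        const xs = [row.x1, row.x2, row.x3, row.x4, row.x5].map(Number);
        callback(null, { w, h, area: w * h, xs });
    }
});

fs.createReadStream('data.csv')
    .pipe(csv())
    .pipe(reshape)
    .on('data', (row) => console.log(row));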

Unfortunately we're going to pass on your modification proposal at the moment.