mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License

Header row is returned in the 'data' callback when custom headers are used #142

Closed. rblazeix closed this issue 4 years ago

rblazeix commented 4 years ago

Expected Behavior

The header row of the CSV file is excluded from the data sent to the 'data' callback.

Actual Behavior

The header row is passed to the 'data' callback as the first element.

How Do We Reproduce?

Use a CSV file with a custom separator (';' in this case). The csv() constructor is called with the separator and headers options: csv({ separator: ";", headers: [...] })

shellscape commented 4 years ago

@rblazeix that's not an adequate reproduction. Please check out this link https://git.io/fNzHA

We're going to need a sample of your data, with the exact code you're using to reproduce the error, otherwise we can't do anything for you.

rblazeix commented 4 years ago

Sorry about the lack of information. Just to be sure, am I expecting the correct behavior?

The sample data to reproduce: testData.csv.txt

This is a Node project:

const fs = <FSPromisified>Promise.promisifyAll(require("fs"));
const csv = require("csv-parser");
const _ = require("lodash");
let entries = [];
fs.createReadStream("sampleData.csv").pipe(csv({
                    separator: ";",
                    headers: fieldNames
                }).on("headers", (headers) => {
                    console.log(headers);
                }).on("data", (data) => {
                    entries.push(data);
                }).on("end", () => {
                    console.log(result);
                });

shellscape commented 4 years ago

Please read up on formatting with markdown: https://guides.github.com/features/mastering-markdown/. It will make your issues much easier to read when including code. (I have wrapped your code in your previous reply in code fences.)

The result variable is undefined in your example, fieldNames is undefined, there's a syntax error at the end of the block, it appears that you've included a TypeScript annotation, and you've got an unused dependency declaration in there. Can you please provide a functional reproduction in a separate repo or a gist?

rblazeix commented 4 years ago

You're right, sorry, I'm not used to creating issues in open source projects. I investigated further and created a gist with a running example (https://gist.github.com/rblazeix/ae71607817d00fcc5362882b7f639688). I have reframed the issue (which may be expected behavior) as follows: when custom headers are specified in the csv() initialization, the "headers" callback is not called, and the "data" callback is called with every line, including the header row.

shellscape commented 4 years ago

No worries, you worked through it and arrived at the actual cause. That's the triage process.

Typically the headers option is only used when the first line of a file doesn't include the headers. If the first line does include the headers, but some headers need to be transformed, the mapHeaders option is used.
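
For illustration, here is a minimal sketch of the mapHeaders case, where the file keeps its own header row and only the names are transformed. The rename table and the createParser helper are hypothetical and not taken from the issue; the { header, index } argument and the null-to-drop-a-column convention follow the csv-parser README.

```javascript
// Hypothetical rename table: header text as it appears in the file -> name wanted in code.
const renames = new Map([
  ["First Name", "firstName"],
  ["E-mail", "email"],
]);

// csv-parser calls mapHeaders once per column of the header row.
// The returned string becomes the property name; returning null drops the column.
function mapHeaders({ header }) {
  const trimmed = header.trim();
  if (trimmed === "") return null; // drop unnamed columns
  return renames.has(trimmed) ? renames.get(trimmed) : trimmed;
}

// Builds a parser for a ';'-separated file whose first line is a header row.
function createParser() {
  // Required lazily so the mapping function can be used without the dependency.
  const csv = require("csv-parser");
  return csv({ separator: ";", mapHeaders });
}
```

With this setup the header row is still read from the file, so no headers option is passed.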

What you've described is expected behavior. If you'd like to specify headers for a file that has headers already, and ignore that first line of headers, you can use skipLines: 1 to skip over the first line.
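
A minimal sketch of that suggestion, assuming a ';'-separated file and made-up column names (fieldNames, buildOptions and readEntries are hypothetical helpers, not part of csv-parser or of the original report):

```javascript
const fs = require("fs");

// Hypothetical replacement column names; the issue never shows the real ones.
const fieldNames = ["id", "name", "email"];

// Options for csv-parser: custom headers, plus skipLines: 1 when the file
// already contains a header row that should not reach the 'data' handler.
function buildOptions(fileHasHeaderRow) {
  return {
    separator: ";",
    headers: fieldNames,
    skipLines: fileHasHeaderRow ? 1 : 0,
  };
}

// Reads the file and resolves with the parsed rows.
function readEntries(path, fileHasHeaderRow) {
  // Required lazily so buildOptions can be used without the dependency.
  const csv = require("csv-parser");
  return new Promise((resolve, reject) => {
    const entries = [];
    fs.createReadStream(path)
      .on("error", reject) // file could not be opened or read
      .pipe(csv(buildOptions(fileHasHeaderRow)))
      .on("data", (row) => entries.push(row))
      .on("end", () => resolve(entries))
      .on("error", reject); // parser error
  });
}
```

Calling readEntries("sampleData.csv", true) should then resolve with data rows only, keyed by the names in fieldNames.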

rblazeix commented 4 years ago

Thank you for the explanation, and sorry for the bother.

shellscape commented 4 years ago

No bother at all. I'll add a note to the README to avoid confusion moving forward.