mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

skipFirstNLines should not take the first line as header #1045

Open jkruke opened 4 months ago

jkruke commented 4 months ago

I think this feature hasn't been implemented as requested in the referenced issue #738. The example CSV was:

This is a data file generated by some old software.
Next line will contain a headers of parameters.
Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219

Expected output should be (to be fair, it wasn't precisely specified):

Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219

But the actual output with skipFirstNLines=2 is (according to the test cases):

This is a data file generated by some old software.
23.0

This is analogous to the following test case in the code: https://github.com/mholt/PapaParse/pull/1021/files#diff-e0ce8cb4901057c1880bee545909a64f38c7383b4d41982d6f2db9a8ec81eac7R1588

{
    description: "Skip First N number of lines , with header and 3 rows",
    input: 'a,b,c,d\n1,2,3,4\n4,5,6,7',
    config: { header: true, skipFirstNLines: 1 },
    expected: {
        data: [{a: '4', b: '5', c: '6', d: '7'}],
        errors: []
    }
}

This test case is not realistic because it does not reflect a CSV with preamble lines to be skipped. It should rather be input: 'to-be-ignored\na,b,c,d\n1,2,3,4', with expected.data: [{a: '1', b: '2', c: '3', d: '4'}].
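To illustrate the behaviour I'd expect, here is a rough sketch that strips the preamble from the raw text before any header detection happens. `dropPreambleLines` is a hypothetical helper of my own, not part of PapaParse, and it assumes the preamble contains no quoted newlines:

```javascript
// Hypothetical helper (not a PapaParse API): remove the first n physical
// lines from the raw CSV text so that the header row comes first.
// Assumes lines end in '\n' (which also covers '\r\n') and that the
// preamble does not contain quoted fields spanning several lines.
function dropPreambleLines(text, n) {
    let pos = 0;
    for (let i = 0; i < n; i++) {
        const next = text.indexOf('\n', pos);
        if (next === -1) {
            return ''; // the input has n lines or fewer, nothing is left
        }
        pos = next + 1;
    }
    return text.slice(pos);
}

const rawInput = 'to-be-ignored\na,b,c,d\n1,2,3,4';
const cleanedInput = dropPreambleLines(rawInput, 1);
// cleanedInput now starts with the header row 'a,b,c,d' and could be
// passed to Papa.parse(cleanedInput, { header: true }).
```

With the preamble removed up front, the header option sees 'a,b,c,d' as the first line, which is what the proposed test case expects.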

If someone wanted to skip rows of the record set, it should not be done during the parsing phase, but later when working with the data sets by doing some simple postprocessing such as data = data.slice(1).
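A minimal sketch of that postprocessing step, using a hand-written `parsedRows` array as a stand-in for the `data` array a header-aware parse of 'a,b,c,d\n1,2,3,4\n4,5,6,7' would return (the variable names are only for illustration):

```javascript
// Stand-in for the records produced by parsing with header: true.
const parsedRows = [
    { a: '1', b: '2', c: '3', d: '4' },
    { a: '4', b: '5', c: '6', d: '7' }
];

// Skip the first record after parsing; slice() returns a new array
// and leaves parsedRows untouched.
const remainingRows = parsedRows.slice(1);
```

This keeps the header row intact and yields the same single record that the existing test case expects.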

@bhuvaneshwararaja, @pokoli could you please take a look at it and give me some feedback about my thoughts? :)

Originally posted by @jkruke in https://github.com/mholt/PapaParse/issues/1021#issuecomment-1980130733