Closed bhuvaneshwararaja closed 9 months ago
What you are proposing is already implemented using the preview
config option.
Hi @pokoli , I have gone through the documentation (https://www.papaparse.com/docs) for the preview options and got to know that will help to parse the expected number of rows in csv example: If we have 20 rows in csv, If the preview value is 10 . Out of 20 only 10 rows will be parsed .
But this PR will add a config to skip the N number of rows in the beginning of the csv. Example: If we have 20 rows in csv , If skipNLines value is 10 . Out of 20 , first 10 rows will be skipped .
Please check it once again , it is relevant to the issue #738 as well.
Ok, then we need to document this new flag to explain which is the ussage.
I think the flag will be better named as skipFirstLines
.
Could you have a look at the test suite? it seems you introduced some failing tests.
Hi @pokoli , Thanks Sure i will check those test suite , I will update the PR with doc
Hi @pokoli , I have updated the PR
Just a small note: This does not work for quoted fields...
1,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
2,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
3,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
4,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
5,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C
5 lines with 12 columns
Papa.parse('1,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n2,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n3,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n4,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C\n5,"B\nB",C,A,"B\nB",C,A,"B\nB",C,A,"B\nB",C',
{skipFirstNLines: 5})
skips the "real" first line (5 because \n after each line).
@janisdd It will be great if you can fill an issue/merge request to make this work with quotes also.
Well, no... I don't think this feature should be added to the library. Here are my thoughts:
However, here are some notes if someone wants to implement it anyway:
preview
is implementedrow
in https://github.com/mholt/PapaParse/blob/e2d570e77cc2720b2c256ec099f3e7e424c39e35/papaparse.js#L1742 as long as some skip counter is too smallI'm not sure if this would also work for streaming/chunking (or whatever, I just use the normal parsing). But as far as i know there is only one core logic/function for the actual parsing... which means this should also work for streaming
I think this feature hasn't been implemented as wanted in the referenced issue #738 ! The example CSV was:
This is a data file generated by some old software.
Next line will contain a headers of parameters.
Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219
Expected output should be (to be fair, it wasn't precisely specified):
Temperature, Humidity, Voltage
22.5, 45.5, 220
23.0, 44.0, 219
But the actual output with skipFirstNLines=2
is (according to the test cases):
This is a data file generated by some old software.
23.0
Analogue to the following test case in the code: https://github.com/mholt/PapaParse/pull/1021/files#diff-e0ce8cb4901057c1880bee545909a64f38c7383b4d41982d6f2db9a8ec81eac7R1588
{
description: "Skip First N number of lines , with header and 3 rows",
input: 'a,b,c,d\n1,2,3,4\n4,5,6,7',
config: { header: true, skipFirstNLines: 1 },
expected: {
data: [{a: '4', b: '5', c: '6', d: '7'}],
errors: []
}
}
This test case is not realistic because it does not reflect a CSV with some preamble lines to be skipped.
should be rather: input: 'to-be-ignored\na,b,c,d\n1,2,3,4',
and expected.data: [{a: '1', b: '2', c: '3', d: '4'}]
.
If someone wanted to skip rows of the record set, it should not be done during the parsing phase, but later when working with the data sets by doing some simple postprocessing such as data = data.slice(1)
.
@bhuvaneshwararaja, @pokoli could you please take a look at it and give me some feedback about my thoughts? :)
Issues No : #738
Description:
In this feature enhancement, Added the capability to skip a defined number of lines when converting a CSV (Comma-Separated Values) file to JSON format. This functionality offers users more flexibility when dealing with CSV files with header information or metadata at the beginning, allowing for a more efficient and precise conversion process.
Changes Made:
Test cases:
Included test cases to validate the functionality of the skipFirstNLines feature, ensuring that it behaves correctly under various scenarios.
Test 1: "Skip First N number of lines , with header",
Test 2: "Skip First N number of lines , without header",
Test 3: "Skip First N lines with value as 0 with header false",
Test 4: "Skip First N lines with value as 0 with header true",
Test 5: "Skip First N lines with value less then 0, with header true",
Test 6: "Without Skip First N Lines config",