mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

Errors and meta fields when streaming #754

Open facundolinlaud opened 4 years ago

facundolinlaud commented 4 years ago

Hello, how are you?

I've been using PapaParse in Node.js to parse big remote files with streams. The library is great, thank you very much for your dedication.

One thing I'm trying to achieve is to be able to retrieve the errors field of an output in PapaParse's stream. When you process the data in one go (that is, when not using a stream), the output you get is of the following structure:

{
   data: [...],
   errors: [...],
   meta: {...}
}

However, when piping PapaParse's read stream to a writable stream, I don't receive an object with errors and meta fields, but a parsed row only. I would like to know if there is any way of receiving this whole object for each parsed row as I'm very interested in the TooFewFields and TooManyFields errors.

This is important for me because I would like to maintain the same error checking interface when parsing with and without PapaParse's stream, even if it means having an impact in the parsing performance.
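One possible workaround, sketched here under stated assumptions rather than as anything PapaParse provides: pipe the row stream through a small object-mode Transform that rebuilds a { data, errors, meta } envelope for each row. The names checkRow and createEnvelopeStream are hypothetical. The sketch assumes the upstream stream emits one array per row and that the first row is the header; the error objects only imitate the FieldMismatch shape, and the message text is made up for illustration.

```javascript
const { Transform } = require('stream');

// Hypothetical helper (not part of PapaParse): builds a { data, errors, meta }
// envelope for one row by comparing its length with the header's field count.
function checkRow(row, fields, rowIndex) {
  const errors = [];
  if (row.length !== fields.length) {
    errors.push({
      type: 'FieldMismatch',
      code: row.length < fields.length ? 'TooFewFields' : 'TooManyFields',
      // Illustrative message; not necessarily the wording PapaParse uses.
      message: `Expected ${fields.length} fields but parsed ${row.length}`,
      row: rowIndex,
    });
  }
  return { data: row, errors, meta: { fields } };
}

// Hypothetical object-mode Transform: consumes array rows, treats the first
// one as the header, and emits one envelope per subsequent row.
function createEnvelopeStream() {
  let fields = null;
  let rowIndex = 0; // index among data rows, header excluded

  return new Transform({
    objectMode: true,
    transform(row, _encoding, callback) {
      if (fields === null) {
        fields = row; // the header row sets the expected field count
        callback();
        return;
      }
      const envelope = checkRow(row, fields, rowIndex);
      rowIndex += 1;
      callback(null, envelope);
    },
  });
}
```

Usage would look like parsedRowStream.pipe(createEnvelopeStream()).pipe(myWritable), where parsedRowStream and myWritable stand for the existing streams in the pipeline. This only recovers the field-count errors; other error types, such as quote errors, are detected inside the parser and cannot be reconstructed downstream this way.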

Happy holidays!

facundolinlaud commented 4 years ago

I just tried passing a callback to PapaParse's chunk option, and it does in fact receive an object of the form:

{
   data: [],
   errors: [],
   meta: {...}
}

However, the data and errors fields always arrive as empty arrays, even though the stream ends up parsing the CSV file as expected. The same behavior can be seen in the PapaParse demo: if you try to stream-parse something, you receive empty data and errors arrays no matter the input.

Is this expected behavior?

Thank you for your time!

pokoli commented 4 years ago

Hi @facundolinlaud,

IIRC this feature is not currently supported when using Node streams, due to their asynchronous nature.

Any suggestion/improvement will be very welcome.

80avin commented 1 year ago

@pokoli Can you describe how the asynchronous nature of Node streams affects this, and why it isn't as easy as changing data to results in this line? https://github.com/mholt/PapaParse/blob/841e1d420dd2758cccb60a2bc1f2fcffd327c251/papaparse.js#L928