Feature Proposal
Provide a new option, reduceValues, on the options object that lets users choose how the output object is modified for each cell in a row. Proposed documentation is provided below:
reduceValues
Type: Function
A function that can be used to modify the object that is emitted by the stream, similar to a reducer function. The return value replaces the existing object used for accumulating columns in the row. If null is returned, the rest of the row is skipped (similar to mapHeaders).
Parameters
memo (Object or any): The current object representing the values in a row.
header (String): The current column header.
index (Number): The current column index.
value (String or any): The current column value (or content).
If both mapValues and reduceValues functions are provided, mapValues is run first, and the output is provided in the value parameter of the reduceValues function.
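The ordering and the null behaviour described above can be sketched with a small stand-in for the parser's per-row loop. This is only an illustration of the proposed semantics, not csv-parser's actual code: buildRow is a hypothetical helper, and passing a single { memo, header, index, value } object to reduceValues is an assumption modelled on how mapValues receives its arguments.

```javascript
// Hypothetical stand-in for the parser's per-row loop (not csv-parser internals).
function buildRow(headers, cells, { mapValues, reduceValues } = {}) {
  let memo = {};
  for (let index = 0; index < cells.length; index++) {
    const header = headers[index];
    let value = cells[index];
    // mapValues runs first; its output becomes `value` for reduceValues.
    if (mapValues) value = mapValues({ header, index, value });
    if (reduceValues) {
      // The return value replaces the accumulator for the row.
      memo = reduceValues({ memo, header, index, value });
      // A null return abandons the rest of the row.
      if (memo === null) return null;
    } else {
      // Without a reducer, fall back to plain header => value assignment.
      memo[header] = value;
    }
  }
  return memo;
}

const row = buildRow(['a', 'b'], ['1', '2'], {
  mapValues: ({ value }) => Number(value),
  reduceValues: ({ memo, header, value }) => ({ ...memo, [header]: value * 10 }),
});
console.log(row); // { a: 10, b: 20 }
```

Because the reducer's return value replaces the accumulator, it may either mutate memo and return it or return a fresh object, as in the example.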
Feature Use Case
This feature gives the developer more control over constructing the object that is passed through the stream, or over choosing to skip a row based on the values encountered in it. Some specific example use cases:
If multiple columns have duplicate-named headers, instead of automatically taking the last one, a user could choose to take the one that does not have an empty value, or take the max of the values.
Skip records (rows) if the first column is blank
If headers are named like a JS object path, e.g. first[0].inner, the user can handle the logic of constructing nested objects
Rehydrating a row with a particular type of object, i.e. on index === 0, instantiate a new RowObject()
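A few of the use cases above could look roughly like the reducers below. This is a sketch under assumptions: reduceRow is a hypothetical driver that mimics the proposed semantics (a null return skips the row), the single-object argument shape is assumed, and the reducer names are illustrative only.

```javascript
// Hypothetical driver mimicking the proposed semantics (null => skip the row).
function reduceRow(headers, cells, reduceValues) {
  let memo = {};
  for (let index = 0; index < cells.length && memo !== null; index++) {
    memo = reduceValues({ memo, header: headers[index], index, value: cells[index] });
  }
  return memo;
}

// Duplicate headers: an empty value never overwrites an existing one,
// so the last non-empty value wins.
const keepNonEmpty = ({ memo, header, value }) => {
  if (value !== '' || !(header in memo)) memo[header] = value;
  return memo;
};

// Skip the whole row when the first column is blank.
const skipBlankFirst = ({ memo, header, index, value }) => {
  if (index === 0 && value === '') return null;
  memo[header] = value;
  return memo;
};

// Headers shaped like "first[0].inner" build nested objects and arrays.
const nested = ({ memo, header, value }) => {
  const keys = header.match(/[^.[\]]+/g);
  let node = memo;
  keys.forEach((key, i) => {
    if (i === keys.length - 1) {
      node[key] = value;
      return;
    }
    // Create an array when the next key is numeric, otherwise an object.
    if (node[key] === undefined) node[key] = /^\d+$/.test(keys[i + 1]) ? [] : {};
    node = node[key];
  });
  return memo;
};

console.log(reduceRow(['id', 'id'], ['7', ''], keepNonEmpty)); // { id: '7' }
console.log(reduceRow(['id', 'name'], ['', 'x'], skipBlankFirst)); // null
console.log(JSON.stringify(reduceRow(['first[0].inner'], ['v'], nested))); // {"first":[{"inner":"v"}]}
```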
The goal of this proposal is to give developers more freedom over deserialization without increasing the maintenance burden for the maintainers.
Why not just make the user transform the object downstream from the CSV parser? After all, this is only a CSV parser!
Moving the object reducing logic upstream can help maintain the fast speed this library is known for, as well as allow users to build their own workaround for issues like #150. If a user has more control over skipping a row, or can stop deserializing the rest of the row, there are performance advantages, especially in very wide data sets or ones with lots of rows to be skipped. In addition, some behaviors, like when there are duplicate header names, cannot be addressed downstream.
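Both points can be illustrated with a small, self-contained sketch. It assumes the same hypothetical argument shape for reduceValues; reduceRowCounting is an illustrative helper, not part of the library, and it only counts how many cells the reducer sees. Any real speed-up would depend on how the parser implements the early exit.

```javascript
// Last-one-wins assignment: by the time the object reaches downstream code,
// the first "score" value is already gone.
const lastWins = {};
[['score', '9'], ['score', '']].forEach(([header, value]) => {
  lastWins[header] = value;
});
console.log(lastWins); // { score: '' }

// Hypothetical driver that counts how many cells the reducer visits.
function reduceRowCounting(headers, cells, reduceValues) {
  let memo = {};
  let visited = 0;
  for (let index = 0; index < cells.length; index++) {
    visited++;
    memo = reduceValues({ memo, header: headers[index], index, value: cells[index] });
    if (memo === null) break; // stop deserializing the rest of the row
  }
  return { row: memo, visited };
}

// A wide row (1000 columns) whose first column is blank.
const wideHeaders = Array.from({ length: 1000 }, (_, i) => `col${i}`);
const wideCells = ['', ...Array(999).fill('x')];

const skipIfFirstBlank = ({ memo, header, index, value }) => {
  if (index === 0 && value === '') return null;
  memo[header] = value;
  return memo;
};

const result = reduceRowCounting(wideHeaders, wideCells, skipIfFirstBlank);
console.log(result.row, result.visited); // null 1
```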