mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License
1.41k stars 134 forks source link

Error on "reserved property names"? #228

Open LiamKarlMitchell opened 1 year ago

LiamKarlMitchell commented 1 year ago

Minor issue possibly?

Expected Behavior

An error should be able to be thrown if enabled in settings should a "reserved" or potentially problematic header name be specified in the csv contents.

Actual Behavior

Property names specified in csv headers can be set on row objects this could lead to unexpected errors or code problems when calling methods on the objects when they might be considered to be "reserved" or should avoid to be set.

How Do We Reproduce?

code.js

Readable.from(`toString
1`).pipe(csv({ columns: true,  strict: true, from_line: 1 }))
    .on('data', (data) => {
        console.log(data.toString(), data.toString() === '[object Object]')
    })

error TypeError: data.toString is not a function

This could be detected in the emitted error event but records imported could actually be used elsewhere outside of the stream processing.

See also: although it may not be every possible name that should not be used in a header. https://www.w3schools.com/js/js_reserved.asp

JavaScript Objects, Properties, and Methods You should also avoid using the name of JavaScript built-in objects, properties, and methods:

In a node.js repl for example, this would break toString on object.

// > o = {}
{}
// > o['toString'] = ""
''
// > o.toString()
Uncaught TypeError: o.toString is not a function

Which should return

'[object Object]'

I'm just wondering if there should be a way to protect against setting property names that could lead to unexpected code issues, devs are not always in control of user uploaded content. Or maybe there already is and I'm not aware?

Headers could be defined in the CSV, I suppose the mapHeaders or a validateHeaders option could be used to prevent this.

A way to specify exact header names and strip all other columns might be nice to have as well where the order of headers may come from the CSV content its self e.g. if a user reorders them legitimately in Excel then exports to CSV we should still be able to import the records into objects.

I'm not aware of real problems atm but prototype poisoning vulnerabilities do exist in other ways for JSON parsing when combined with deep copying, or if this library ever adds a feature like nested values it may be a larger problem https://medium.com/intrinsic-blog/javascript-prototype-poisoning-vulnerabilities-in-the-wild-7bc15347c96