mafintosh / csv-parser

Streaming csv parser inspired by binary-csv that aims to be faster than everyone else
MIT License

UTF-8 BOM Magic Bytes are read as part of the column header #196

Closed bashdx closed 3 years ago

bashdx commented 3 years ago

Expected Behavior

When reading a UTF-8 text file that starts with a BOM, the magic bytes "EF BB BF" should not become part of the column name.

Actual Behavior

The magic bytes "EF BB BF" become part of the first header, so the property for that column cannot be looked up by its visible name. For example, if the column name is "Date(UTC)", then

obj["Date(UTC)"] returns undefined.

Fetching the key via Object.getOwnPropertyNames and putting the string in a Node Buffer gives:

<Buffer ef bb bf 44 61 74 65 28 55 54 43 29>

vs. expected

<Buffer 44 61 74 65 28 55 54 43 29>
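The mismatch above can be illustrated without the parser at all. This is only a sketch: the `row` object and its value are hypothetical stand-ins for what the parser emits, built by hand with the BOM character (U+FEFF) prepended to the first key.

```javascript
// U+FEFF is the BOM code point; encoded as UTF-8 it becomes the bytes EF BB BF.
const BOM = '\uFEFF';

// Hypothetical row, shaped like the object described in this issue:
// the first key carries the invisible BOM prefix.
const row = { [BOM + 'Date(UTC)']: '2020-01-01' };

// Inspect the actual key bytes versus the bytes of the visible name.
const firstKey = Object.getOwnPropertyNames(row)[0];
const keyBytes = Buffer.from(firstKey, 'utf8');
const expectedBytes = Buffer.from('Date(UTC)', 'utf8');

console.log(keyBytes);      // starts with ef bb bf
console.log(expectedBytes); // starts with 44 61 74 65

// The lookup by visible name misses; only the BOM-prefixed key matches.
const plainLookup = row['Date(UTC)'];
const bomLookup = row[BOM + 'Date(UTC)'];
console.log(plainLookup, bomLookup);
```

The two keys print identically in most terminals, which is why comparing the bytes is the quickest way to spot the problem.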

How Do We Reproduce?

Create a CSV file and save it as UTF-8 with BOM, so that the first column header sits right at the beginning of the visible file contents, e.g.

Column1;Column2;Column3
Value1-1;Value1-2;Value1-3
Value2-1;Value2-2;Value2-3
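For anyone without an editor that can save "UTF-8 with BOM", the input file can also be produced from Node. This is a minimal sketch using only built-in modules. The file name `bom-sample.csv` and the temp-directory location are assumptions, and the final split on `;` is a naive stand-in for the parser, used only to show that the BOM survives UTF-8 decoding and lands in the first header.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// The three UTF-8 BOM bytes described in this issue.
const BOM_BYTES = Buffer.from([0xef, 0xbb, 0xbf]);

// Sample contents from the reproduction steps above.
const body = [
  'Column1;Column2;Column3',
  'Value1-1;Value1-2;Value1-3',
  'Value2-1;Value2-2;Value2-3',
].join('\n') + '\n';

// Hypothetical output location, chosen only for this sketch.
const filePath = path.join(os.tmpdir(), 'bom-sample.csv');
fs.writeFileSync(filePath, Buffer.concat([BOM_BYTES, Buffer.from(body, 'utf8')]));

// Confirm the file really starts with EF BB BF.
const raw = fs.readFileSync(filePath);
const startsWithBom = raw.subarray(0, 3).equals(BOM_BYTES);

// Decoding the buffer as UTF-8 keeps the BOM as U+FEFF, so a naive
// header split yields a first header that is one character too long.
const firstHeader = raw.toString('utf8').split('\n')[0].split(';')[0];

console.log(startsWithBom, JSON.stringify(firstHeader), firstHeader.length);
```

Since the sample uses `;` rather than `,`, the file would need to be parsed with csv-parser's `separator: ';'` option to reproduce the behavior described here.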