mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.47k stars 1.14k forks

Header columns in double quotes end up with an InvalidQuotes error when I don't think they should #1003

Open tturnerswdev33 opened 1 year ago

tturnerswdev33 commented 1 year ago

If you fetch this CSV file by its URL and then call response.text(), the entire CSV comes back wrapped in a pair of double quotes. If the column headers are already double-quoted (Excel tends to do that on "Save As" CSV), PapaParse returns no data and reports repeated InvalidQuotes errors.
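One possible explanation, offered only as an assumption since the attached file and server are not available here, is that the endpoint returns the CSV as a JSON-encoded string, which would account for the outer pair of quotes. Under that assumption, a minimal sketch of undoing the outer layer before parsing could look like this (unwrapQuotedBody is a hypothetical helper name, not part of PapaParse):

```javascript
// Hypothetical helper: undo an outer layer of JSON string quoting around a CSV payload.
// Assumption: the outer quotes come from JSON encoding; this is not confirmed by the issue.
function unwrapQuotedBody(text) {
  const trimmed = text.trim();
  // Only act when the whole payload starts and ends with a double quote.
  if (trimmed.length < 2 || !trimmed.startsWith('"') || !trimmed.endsWith('"')) {
    return text;
  }
  try {
    const decoded = JSON.parse(trimmed);
    // A JSON string literal decodes to a string; anything else is left alone.
    return typeof decoded === "string" ? decoded : text;
  } catch {
    // Not a single JSON string (for example, an ordinary CSV whose first and
    // last fields happen to be quoted), so return the input unchanged.
    return text;
  }
}
```

An ordinary CSV such as "a","b" is not a valid JSON string literal, so it passes through untouched. The unwrapped text can then be handed to the parser as usual.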

This format is very common, so I am surprised it does not work. It's as if PapaParse treats the opening quote of the string as a literal double quote in the data, and the closing quote as well.

If I preprocess the text sent to PapaParse to remove all the double quotes from the header, it breaks whenever a header contains a comma, since the columns are delimited by commas.
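The breakage described above can be illustrated with a made-up header line (the real headers in the attached file are not shown in this thread, so the sample and the function name are assumptions):

```javascript
// Hypothetical illustration: removing the quotes from a header line and then
// splitting on commas produces more columns than the header actually has.
function naiveHeaderColumns(headerLine) {
  const stripped = headerLine.replace(/"/g, ""); // drop every double quote
  return stripped.split(",");                    // split on every comma
}

// Made-up header with two columns, one of which contains a comma.
const sampleHeader = '"Name, Full","Age"';
const naiveColumns = naiveHeaderColumns(sampleHeader);
console.log(naiveColumns.length); // 3 pieces, although the header has 2 columns
```

The quotes are what tell a CSV parser that the comma in the first header is data rather than a delimiter, so stripping them loses that information.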

I have attached the file if you want to try it.

pparse-double-quote.csv

tturnerswdev33 commented 1 year ago

Here is a sandbox showing one solution. We had to convert the commas that act as CSV delimiters to pipes, then pass the pipe as the delimiter to PapaParse. That gets around Papa.parse treating a comma inside a header column as the start of a new column when it is just part of a string. https://codesandbox.io/s/stupefied-saha-fp2vh6
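The sandbox code is not reproduced in this thread, so the following is only a sketch of the same idea, not the code from the link. It assumes the data never contains a literal pipe character, and commasToPipes is a hypothetical name:

```javascript
// Hypothetical sketch: replace commas that act as delimiters (those outside
// double-quoted fields) with pipes, leaving commas inside quotes untouched.
// Assumption: the data contains no literal "|" characters.
function commasToPipes(csvText) {
  let inQuotes = false;
  let out = "";
  for (const ch of csvText) {
    if (ch === '"') {
      // Toggle on every quote; an escaped quote ("") toggles twice,
      // so the in-quotes state is unchanged after the pair.
      inQuotes = !inQuotes;
      out += ch;
    } else if (ch === "," && !inQuotes) {
      out += "|";
    } else {
      out += ch;
    }
  }
  return out;
}
```

The converted text could then be parsed with something like Papa.parse(commasToPipes(text), { delimiter: "|", header: true }), using PapaParse's delimiter and header config options.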