mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

How to handle parsing the words 'auto-detect' when 'auto-detect' is used in the data feed #798

Closed webdesignmx05 closed 4 years ago

webdesignmx05 commented 4 years ago

Hi,

I have found PapaParse to be the most flexible option for my parsing needs. It handles large .csv files within the browser and preserves double-quoted fields that contain commas when I use 'auto-detect' as the delimiter, which neatly avoids splitting on those inner commas. However, I'm unsure how to parse survey data feeds where a respondent might use the phrase 'auto-detect' in their feedback, since I am using 'auto-detect' as my delimiter. Respondents can enter anything, and PapaParse is being used to read the file and do cleanup. I would like it to parse 'auto-detect' successfully without manually modifying the file, possibly through a config parameter that escapes 'auto-detect'. Maybe such a parameter exists in the documentation (something like .gitignore for Git), but I didn't see it.

Try parsing the 'entries-auto-detect.csv' file and notice how it produces the __parsed_extra field. Compare this with how the 'entries.csv' file is parsed. I'd like it to ignore the literal presence of 'auto-detect' in the CSV data.

[
  {
    "first name,lname,email,comments": "Bill, Jones, bjones@aol.com, greetings, this is comment 1"
  },
  {
    "first name,lname,email,comments": "Mark, Jones, mjones@aol.com, \"greetings, this is comment 2. I want to ",
    "__parsed_extra": [
      " my order notifications\""
    ]
  },
  {
    "first name,lname,email,comments": "James, Jones, jjones@aol.com, \"greetings, this is comment 3\""
  }
]
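The output above is consistent with the string 'auto-detect' being treated as a literal multi-character delimiter: each row comes back as a single field with the commas intact, and only the row whose comment contains the phrase is split in two. (PapaParse's documentation describes leaving `delimiter` empty as the way to request auto-detection.) The sketch below approximates that split with plain `String.prototype.split`; it is not PapaParse's actual implementation, and the row text is an assumed reconstruction of the second data row in 'entries-auto-detect.csv'.

```javascript
// Approximation only: String.split stands in for the parser's handling
// of a literal multi-character delimiter.
const literalDelimiter = "auto-detect";

// Assumed reconstruction of the header and the second data row.
const header = "first name,lname,email,comments";
const row =
  'Mark, Jones, mjones@aol.com, "greetings, this is comment 2. ' +
  'I want to auto-detect my order notifications"';

// The header contains no occurrence of the delimiter, so it is one field.
const headerFields = header.split(literalDelimiter);

// The row contains the delimiter once, so it becomes two fields.
const rowFields = row.split(literalDelimiter);

// Fields beyond the header count are what would land in __parsed_extra.
const extraFields = rowFields.slice(headerFields.length);

console.log(headerFields.length); // 1
console.log(extraFields);         // [ ' my order notifications"' ]
```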

entries-auto-detect.zip

webdesignmx05 commented 4 years ago

I will close this issue. Apparently I can get around this by appending a random GUID string to the delimiter so that it differs from auto or auto-detect. I hard-coded a value like auto-detect-9a162db9-efd1-4bac-a02e-e0ebf8cc28ab for the delimiter, and also used a random GUID generator function to append to it, and that solved my issue.
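A minimal sketch of that workaround, assuming Node's built-in `crypto.randomUUID()` as the GUID generator (browsers expose the same function on the global `crypto` object in secure contexts). The helper name `makeUniqueDelimiter` is hypothetical and not part of PapaParse; the resulting string would be passed as the `delimiter` config option.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper: builds a delimiter string that respondents are
// extremely unlikely to type, by appending a random GUID to a prefix.
function makeUniqueDelimiter(prefix = "auto-detect") {
  return `${prefix}-${randomUUID()}`;
}

const uniqueDelimiter = makeUniqueDelimiter();

// A comment containing the bare phrase no longer matches the delimiter.
const comment = "I want to auto-detect my order notifications";
console.log(comment.includes(uniqueDelimiter)); // false

// Assumed usage with PapaParse (not executed here):
// Papa.parse(csvText, { header: true, delimiter: uniqueDelimiter });
```

Because the delimiter never occurs in the file, each row would still come back as a single field, as in the output shown earlier in the thread.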