mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.57k stars 1.15k forks

Option to Disable Auto-Escaping During Parsing #1039

Open VDumitrak opened 9 months ago

VDumitrak commented 9 months ago

Summary: Many users of PapaParse may encounter CSV files that contain pre-escaped characters. In the current implementation, PapaParse automatically adds escaping to these characters, which results in double-escaped characters in the output. This behavior can be problematic for CSVs that are expected to contain escape characters as part of the data.

Issue: When parsing a CSV with pre-escaped quotes (either with a backslash or double quotes), PapaParse's parser automatically escapes these characters, leading to an unexpected doubling of escape characters in the output.

For example, an input CSV line like:
"Test \"Test string\" Test","Definitely \"real\" cash"
gets parsed to:
["Test \\\"Test string\\\" Test", "Definitely \\\"real\\\" cash"]
instead of the expected:
["Test \"Test string\" Test", "Definitely \"real\" cash"]

Similarly, a value enclosed in triple quotes to signify an internal quote like:
"""Test \"Test string\" Test"""
results in:
["\"Test \\\"Test string\\\" Test\""]
which should ideally remain:
["""Test \"Test string\" Test"""]

Feature Request: It would be beneficial to have an option to disable auto-escaping entirely when parsing CSV files. This would allow users to work with CSV data that already includes the necessary escaping and expects it to be preserved as-is.
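
When comparing outputs like the ones above, it may help to separate the in-memory string from its serialized display. The sketch below uses only standard JavaScript and assumes, purely for illustration, a parsed value that still contains the literal backslash-quote pairs. It shows that JSON.stringify alone renders each such pair with three backslashes. It does not by itself establish what PapaParse returns for this input.

```javascript
// Assumed for illustration: a field value that kept each backslash-quote pair literally.
// In source code the backslash must itself be escaped, so the value holds ONE backslash per pair.
const value = 'Test \\"Test string\\" Test';

// JSON.stringify escapes every backslash and every double quote for display,
// so each \" in the value is shown as \\\" in the serialized form.
const shown = JSON.stringify([value]);

console.log(value); // the raw value
console.log(shown); // the JSON display of the same value

// Round-tripping returns the original value unchanged.
console.log(JSON.parse(shown)[0] === value);
```

Logging the raw string next to its JSON form is a quick way to check whether extra backslashes are in the data or only in the display.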

pokoli commented 9 months ago

I'm wondering whether, instead of adding an option, it would be better to detect if the quotes are already escaped and always parse the right string. What do you think?

VDumitrak commented 9 months ago

That sounds like a great approach! If PapaParse could intelligently detect pre-escaped quotes and parse them correctly without additional configuration, it would seamlessly handle various CSV formats and make the parsing process much more intuitive.

pokoli commented 9 months ago

Maybe we just need to check whether the next character is the same character, which would mean the quote is already escaped. But this is just a quick thought, and it may be more complex than that.
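
The look-ahead idea can be sketched roughly as follows. This is a minimal standalone illustration, not PapaParse's actual parser: `readQuotedField` is a hypothetical helper, and treating a backslash before a quote as an escape is an assumption taken from the examples in this issue.

```javascript
// Hypothetical helper, not part of PapaParse. Reads one quoted field beginning at
// `start` (which must point at the opening quote) and returns the decoded value
// together with the index just past the closing quote.
function readQuotedField(input, start, quoteChar = '"') {
  if (input[start] !== quoteChar) {
    throw new Error('Field does not start with the quote character');
  }
  let value = '';
  let i = start + 1;
  while (i < input.length) {
    const ch = input[i];
    const next = input[i + 1];
    if (ch === '\\' && next === quoteChar) {
      // Assumption: backslash followed by a quote is a pre-escaped quote.
      value += quoteChar;
      i += 2;
    } else if (ch === quoteChar && next === quoteChar) {
      // The next character is the same quote character, so the quote is escaped.
      value += quoteChar;
      i += 2;
    } else if (ch === quoteChar) {
      // A lone quote closes the field.
      return { value, end: i + 1 };
    } else {
      value += ch;
      i += 1;
    }
  }
  throw new Error('Unterminated quoted field');
}

const line = '"Test \\"Test string\\" Test","Definitely \\"real\\" cash"';
const first = readQuotedField(line, 0);
const second = readQuotedField(line, first.end + 1); // skip the comma
console.log(first.value);
console.log(second.value);
```

One case where this heuristic is "more complex" than it looks: a field whose data legitimately ends in a backslash, such as "C:\", would have its closing quote misread as an escaped quote, so the two escaping styles cannot always be told apart without a hint from the caller.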