mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.57k stars 1.15k forks source link

unparse: EscapeChar is wrongly set when quoteChar is changed #1068

Open janisdd opened 1 month ago

janisdd commented 1 month ago

While looking into https://github.com/mholt/PapaParse/issues/1035 I noticed that the escapeChar is wrongly set when the quoteChar is changed. This is only the case for Papap.unparse.

Example:

Papa.unparse([['a', 'x+y']], {quoteChar: '+'})

which gives currently

a,+x""y+

As you can see, the escapeChar is still set to "" and not ++.

The issue was introduced in this commit: https://github.com/mholt/PapaParse/commit/a627547d87b6c155ccb46d61ee100c7a20581cf8#diff-fbb8d0b19dc4586fadd80f52a7b49415d2d41a5fdccaeaa728f7fca51cfbbb29R277

This can be easily be fixed when changing line https://github.com/mholt/PapaParse/blob/4af6882e0ea91b0baad30345f190b38d50e7535e/papaparse.js#L364C4-L364C29

if (typeof _config.quoteChar === 'string')
  _quoteChar = _config.quoteChar;

into

if (typeof _config.quoteChar === 'string') {
  _quoteChar = _config.quoteChar;
  _escapedQuote = _quoteChar + _quoteChar;
}

And here is a test-case for the UNPARSE_TESTS

{
  description: "Other quote char but without setting escape char (should be 2x the quote char)",
  input: [['a', 'x+y']],
  config: { quoteChar: "+"},
  expected: 'a,+x++y+'
}

However, fixing this will result in

a,x++y

The correct result would be

a,+x++y+

The other part of the issue is that in line https://github.com/mholt/PapaParse/blob/4af6882e0ea91b0baad30345f190b38d50e7535e/papaparse.js#L472

needsQuotes = needsQuotes
            || _quotes === true
            || (typeof _quotes === 'function' && _quotes(str, col))
            || (Array.isArray(_quotes) && _quotes[col])
            || hasAny(escapedQuoteStr, Papa.BAD_DELIMITERS)
            || escapedQuoteStr.indexOf(_delimiter) > -1
            || escapedQuoteStr.charAt(0) === ' '
            || escapedQuoteStr.charAt(escapedQuoteStr.length - 1) === ' ';

Will always set needsQuotes=true if the field contains a " because Papa.BAD_DELIMITERS is defined as ['\r', '\n', '"', Papa.BYTE_ORDER_MARK].

There are different ways to fix this, here is one

These changed will also fix issue https://github.com/mholt/PapaParse/issues/1035

At this point, a PR would probably be easier... However, this can be implemented in different ways, so I'm hesitant.