mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.55k stars 1.15k forks source link

Multi-character delimiter and quoted fields #866

Closed janisdd closed 3 years ago

janisdd commented 3 years ago

Recently a user reported an issue to me that multi-character delimiter does not work with quoted fields, e.g.

a, b, "c, e", d

delimiter: , (comma and a single whitespace)

I tried this on the current demo version (on the website):

Papa.parse('a, b, "c, e", d', {delimiter: ', '})

which gives:

'a', 'b', 'c, e", d'

expected

'a', 'b', 'c, e', 'd'

I also noticed that in the documentation there is always written delimiting character but actually multi-character strings are supported (from what I can see in the code and tests).

I think the issue is somewhere around this line: https://github.com/mholt/PapaParse/blob/master/papaparse.js#L1558

-if (input[quoteSearch + 1 + spacesBetweenQuoteAndDelimiter] === delim)
+if (input.substr(quoteSearch + 1 + spacesBetweenQuoteAndDelimiter, delimLen) === delim)

A test could look like this:

{
  description: "Multi-character delimiter (length 2) with quoted field",
  input: 'a, b, "c, e", d',
  config: { delimiter: ", " },
  notes: "The quotes must be immediately adjacent to the delimiter to indicate a quoted field",
  expected: {
    data: [['a', 'b', 'c, e', 'd']],
    errors: []
  }
}
christopherfanning commented 3 years ago

I want this too. Were you able to get this working?

janisdd commented 3 years ago

With the proposed changes it works as it should.

pokoli commented 3 years ago

Could you please fill a PR with the tests and the proposed change?

I will be happy to merge it if that solves the problem.

janisdd commented 3 years ago

I think the documentation needs to be changed as well. Maybe I could get to this this weekend and create a PR.

pokoli commented 3 years ago

@janisdd, Yes a documentation update will be good also. Waiting for your PR!

pokoli commented 3 years ago

Should be fixed with #879