mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

Arabic support #828

Closed: BENLKAMLENourdine closed this issue 4 years ago

BENLKAMLENourdine commented 4 years ago

PapaParse doesn't parse CSV files with columns containing Arabic words. It returns some errors along with the data, but the data doesn't contain all the columns and the encoding is wrong.

pokoli commented 4 years ago

Hi @BENLKAMLENourdine,

Which encoding are you using?

Could you provide an example to reproduce the issue?

BENLKAMLENourdine commented 4 years ago

Hi @pokoli,

You will find the file in the attachments; it contains 10,000 rows. Link to the file: Bulkupload100sar (10).xlsx

The encoding is UTF-32. Here is my code:

this.config = {
  delimiter: "", // auto-detect
  newline: "", // auto-detect
  quoteChar: '"',
  escapeChar: '"',
  header: false,
  transformHeader: undefined,
  dynamicTyping: false,
  preview: 0,
  encoding: "",
  worker: false,
  comments: false,
  step: undefined,
  complete: function(results) {
    console.log("Parsing complete:", results);
  },
  error: function(error) {
    console.log("Parsing error:", error);
  },
  download: false,
  downloadRequestHeaders: undefined,
  downloadRequestBody: undefined,
  skipEmptyLines: false,
  chunk: undefined,
  chunkSize: undefined,
  fastMode: undefined,
  beforeFirstChunk: undefined,
  withCredentials: undefined,
  transform: undefined,
  delimitersToGuess: [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]
};

Papa.parse(this.file, this.config)

The result in the console:

[Three screenshots of the console output: Screenshot 2020-09-10 at 14 36 51, Screenshot 2020-09-10 at 14 37 08, Screenshot 2020-09-10 at 14 37 20]

pokoli commented 4 years ago

Why don't you pass "UTF32" as the encoding parameter? That should fix the issue.

BENLKAMLENourdine commented 4 years ago

I tried it, but it doesn't work; I get the same result.

pokoli commented 4 years ago

The problem is that you are importing an Excel file, but PapaParse only imports CSV files.

If you import a CSV file with the right encoding, PapaParse will import it correctly.
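
For anyone hitting the same symptom, here is a minimal sketch of a guard that catches this mistake before parsing. It relies on the fact that an .xlsx workbook is a ZIP container, so its first bytes are normally the ZIP signature "PK\x03\x04". The helper names `looksLikeZipContainer` and `parseCsvFile` are hypothetical and not part of PapaParse, and `Papa` is assumed to be loaded globally. Other ZIP-based formats (for example .docx) would trip the same check.

```javascript
// Hypothetical helper (not part of PapaParse): reports whether the given
// bytes start with the ZIP local-file signature "PK\x03\x04".
function looksLikeZipContainer(bytes) {
  const signature = [0x50, 0x4b, 0x03, 0x04];
  return (
    bytes.length >= signature.length &&
    signature.every((value, index) => bytes[index] === value)
  );
}

// Hypothetical wrapper: reads the first four bytes of a browser File/Blob
// and refuses to hand a ZIP-based workbook to the CSV parser.
async function parseCsvFile(file, config) {
  const head = new Uint8Array(await file.slice(0, 4).arrayBuffer());
  if (looksLikeZipContainer(head)) {
    throw new Error("This looks like an .xlsx workbook; export it as CSV first.");
  }
  // Assumption: Papa is available as a global, as in the snippet above.
  return Papa.parse(file, config);
}
```

With this in place, the original call would become something like `parseCsvFile(this.file, this.config)`, and the attached workbook would be rejected with a clear message instead of producing garbled rows.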