mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

Demo page appears to be failing with errors (large file, streaming and/or with headers) #1037

Open wu-lee opened 5 months ago

wu-lee commented 5 months ago

I'm investigating problems with PapaParse failing on truncated remote files. These manifest as parse errors, but it seems that the file is being chunked and the parser only sees the first chunk, which is invalid because its last line is cut in two. (I'm parsing with headers.)
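To make the suspected failure mode concrete, here is a minimal sketch in plain JavaScript. It is not PapaParse's code: `splitRows`, `toChunks`, `parseChunksNaively` and `parseChunksWithCarry` are hypothetical helpers, and the CSV handling is simplified to unquoted fields with `\n` line endings. It shows how a parser that only sees the first chunk ends up with a short final row, and how carrying the partial line over to the next chunk avoids that.

```javascript
// Simplified illustration only: unquoted fields, "\n" line endings.
// None of these helpers are PapaParse APIs.
function splitRows(text) {
  return text.split("\n").map((line) => line.split(","));
}

// Cut the input into fixed-size chunks, roughly as a ranged download would.
function toChunks(text, chunkSize) {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Parse each chunk independently: rows that straddle a boundary get split.
function parseChunksNaively(chunks) {
  return chunks.flatMap((chunk) => splitRows(chunk));
}

// Hold back the trailing partial line and prepend it to the next chunk.
function parseChunksWithCarry(chunks) {
  const rows = [];
  let carry = "";
  for (const chunk of chunks) {
    const text = carry + chunk;
    const lastNewline = text.lastIndexOf("\n");
    if (lastNewline === -1) {
      carry = text; // no complete line yet, keep accumulating
      continue;
    }
    rows.push(...splitRows(text.slice(0, lastNewline)));
    carry = text.slice(lastNewline + 1);
  }
  if (carry !== "") rows.push(...splitRows(carry));
  return rows;
}

const csv = "a,b,c\n1,2,3\n4,5,6";
const chunks = toChunks(csv, 8);

// Only the first chunk: the second row is cut in two and has too few fields.
const firstChunkOnly = splitRows(chunks[0]);

// All chunks, with the partial line carried across boundaries.
const carried = parseChunksWithCarry(chunks);
```

With an 8-character chunk size the first chunk ends in the middle of the second row, so `firstChunkOnly` contains a row with fewer fields than the header, while `carried` contains three complete rows.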

Anyway, to test that out independent of my personal case, I tried the demo here: https://www.papaparse.com/demo

And that seems to have problems of its own.

Platform:

To replicate:

Expected results

Parsing the large file should succeed with no errors, whatever options are selected.

Actual results

Output in console:

Synchronous results: undefined [demo.js:157:12](https://www.papaparse.com/resources/js/demo.js)
Running... [demo.js:159:13](https://www.papaparse.com/resources/js/demo.js)
XHRGET
https://www.papaparse.com/resources/files/big.csv
[HTTP/2 206  1872ms]

XHRGET
https://www.papaparse.com/resources/files/big.csv

ERROR: Error: [object ProgressEvent]
    _chunkError https://unpkg.com/papaparse@latest/papaparse.min.js:7
    v https://unpkg.com/papaparse@latest/papaparse.min.js:7
    _readChunk https://unpkg.com/papaparse@latest/papaparse.min.js:7
    _nextChunk https://unpkg.com/papaparse@latest/papaparse.min.js:7
    parseChunk https://unpkg.com/papaparse@latest/papaparse.min.js:7
    _chunkLoaded https://unpkg.com/papaparse@latest/papaparse.min.js:7
    v https://unpkg.com/papaparse@latest/papaparse.min.js:7
    _readChunk https://unpkg.com/papaparse@latest/papaparse.min.js:7
    _nextChunk https://unpkg.com/papaparse@latest/papaparse.min.js:7
    stream https://unpkg.com/papaparse@latest/papaparse.min.js:7
    parse https://unpkg.com/papaparse@latest/papaparse.min.js:7
    <anonymous> https://www.papaparse.com/resources/js/demo.js:156
    jQuery 2
 undefined

Variations

A similar output appears when run with only "Stream" checked.
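For trying this outside the demo page, a rough sketch of an equivalent call follows. It assumes the demo's checkboxes correspond to the documented `header` option and a `step` callback (which switches PapaParse to row-by-row streaming), together with `download: true` for a remote URL; I have not confirmed that this is exactly what demo.js does. `parseRemoteCsv` is a hypothetical wrapper, and `Papa` is passed in so the snippet makes no assumption about how the library is loaded.

```javascript
// Sketch of a streamed, header-aware remote parse.
// `parseRemoteCsv` is a hypothetical helper, not part of PapaParse.
function parseRemoteCsv(Papa, url, options = {}) {
  return new Promise((resolve, reject) => {
    let rowCount = 0;
    const errors = [];
    Papa.parse(url, {
      download: true, // treat `url` as something to fetch, not as CSV text
      header: options.header === true,
      // Supplying `step` makes the parser deliver rows one at a time.
      step: (results) => {
        rowCount += 1;
        errors.push(...results.errors);
      },
      complete: () => resolve({ rowCount, errors }),
      // Called when the download or read itself fails.
      error: (err) => reject(err),
    });
  });
}
```

Called as `parseRemoteCsv(Papa, "https://www.papaparse.com/resources/files/big.csv", { header: true })`, the returned promise should either resolve with the row count and collected errors, or reject with the same kind of error object shown in the console output above.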

With only "Header row" checked, a different error appears, which looks to me like the file is being truncated:

Running... [demo.js:159:13](https://www.papaparse.com/resources/js/demo.js)
XHRGET
https://www.papaparse.com/resources/files/big.csv
[HTTP/2 200  1555ms]

Parse complete [demo.js:175:11](https://www.papaparse.com/resources/js/demo.js)
       Time: 4583.424999999988 ms [demo.js:176:10](https://www.papaparse.com/resources/js/demo.js)
  Row count: 1094353 [demo.js:177:10](https://www.papaparse.com/resources/js/demo.js)
     Errors: 1 [demo.js:180:10](https://www.papaparse.com/resources/js/demo.js)
First error: 
Object { type: "FieldMismatch", code: "TooFewFields", message: "Too few fields: expected 6 fields but parsed 1", row: 1094352 }
[demo.js:182:11](https://www.papaparse.com/resources/js/demo.js)
    Results: 
Object { data: (1094353) […], errors: (1) […], meta: {…} }
[demo.js:236:10](https://www.papaparse.com/resources/js/demo.js)
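Note that the reported `row` (1094352) is the last index for a row count of 1094353, so the error sits on the final row. A small helper along these lines makes that check explicit for any result object of the shape shown above. `summariseErrors` is a hypothetical name, and an error on the last row does not by itself prove truncation, since an empty final line might produce the same pattern.

```javascript
// Report, for each parse error, whether it falls on the final parsed row.
// `results` is assumed to have the { data, errors, meta } shape logged above.
function summariseErrors(results) {
  const lastRow = results.data.length - 1;
  return results.errors.map((e) => ({
    code: e.code,
    row: e.row,
    isLastRow: e.row === lastRow,
  }));
}
```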

With neither checked, parsing succeeds with no errors.

