apmcodes opened this issue 5 years ago
Hi, can you show your configuration?
Minified PapaParse version: 4.6.0

After some testing I've found that setting `worker: true` makes the local file be read with the default chunk size (10 MiB), even when `Papa.LocalChunkSize` is set to another value.

To test this, I used the "large file" (~49 MiB) provided in the demo and the following configuration (from the documentation):
```js
Papa.parse(file, {
  delimiter: "", // auto-detect
  newline: "", // auto-detect
  quoteChar: '"',
  escapeChar: '"',
  header: false,
  transformHeader: undefined,
  dynamicTyping: false,
  preview: 0,
  encoding: "",
  worker: false,
  comments: false,
  step: undefined,
  complete: parseComplete,
  error: undefined,
  download: false,
  skipEmptyLines: false,
  chunk: chunkComplete,
  fastMode: undefined,
  beforeFirstChunk: undefined,
  withCredentials: undefined,
  transform: undefined,
  delimitersToGuess: [',', '\t', '|', ';', Papa.RECORD_SEP, Papa.UNIT_SEP]
});

var nbChunks = 0;

function parseComplete(results, file) {
  console.info("parseComplete");
  console.log(nbChunks);
  nbChunks = 0;
}

function chunkComplete(results, parser) {
  nbChunks++;
}
```
Let's play with that while altering `worker` and `Papa.LocalChunkSize`:
| `worker` | `Papa.LocalChunkSize` | `nbChunks` |
| --- | --- | --- |
| `false` | default (`10 * 2**20`) | 5 ✔️ |
| `true` | default (`10 * 2**20`) | 5 ✔️ |
| `false` | `2**20` | 48 ✔️ |
| `true` | `2**20` | 5 ❌ |
As a workaround, I set `worker: false` and a function in `chunk`. Seems to work so far.
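A minimal sketch of that workaround (the 1 MiB value is only an example):

```js
// Workaround sketch: parse on the main thread, where Papa.LocalChunkSize is honored.
Papa.LocalChunkSize = 1024 * 1024; // 1 MiB per chunk (example value)

Papa.parse(file, {
  worker: false, // workers ignore Papa.LocalChunkSize
  chunk: function(results, parser) {
    // process results.data for this chunk
  },
  complete: function(results, file) {
    // all chunks done
  }
});
```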
@apmcodes, hope that helped you.
Forgot to say that when you set `worker` to `false`, it won't launch any workers, of course.
I checked the old issues. Workers do use the given chunk size when it is passed through the (undocumented) `chunkSize` configuration property. Avoid using `Papa.LocalChunkSize` with workers.
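A sketch of that approach, reusing the callbacks from above (the 1 MiB value is only an example):

```js
// Pass the chunk size per call via the undocumented chunkSize property;
// the worker picks this up, unlike the global Papa.LocalChunkSize.
Papa.parse(file, {
  worker: true,
  chunkSize: 1024 * 1024, // 1 MiB (example value)
  chunk: chunkComplete,
  complete: parseComplete
});
```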
@Serrulien Thank you very much for the detailed explanation. Sorry for the late reply.
Please note: I'm using PapaParse in an Express app with the multer middleware to upload the file as multipart.

Since I'm using a cloud service (S3) as the remote file location and the aws-sdk S3 streaming API, `chunkSize` does not seem to have any effect (not sure if streaming is causing this issue). The chunk size received seems to hover around 15 KB (~300 rows with a few columns).

NOTE: Even when streaming the CSV file from the browser directly (no cloud storage) to PapaParse in the Express app, I observed the same `chunkSize` behaviour.
Config

```js
header: false,
skipEmptyLines: true,
chunk: this.importDB.bind(this),
beforeFirstChunk: this.importModel.bind(this),
complete: this.importFinish.bind(this, this.cb),
error: this.importError.bind(this),
encoding: "utf8",
preview: 0,
chunkSize: 40000
// chunkSize: 1024*1024*10, // No effect
```
Info fetched from the PapaParse cursor object:

```
results count 687
receivedSize  47657
```
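For reference, a minimal sketch of the streaming setup described above (assuming aws-sdk v2 and that PapaParse receives a Node readable stream; bucket and key names are hypothetical):

```js
const AWS = require('aws-sdk');
const Papa = require('papaparse');

const s3 = new AWS.S3();
// Hypothetical bucket/key, for illustration only.
const stream = s3.getObject({ Bucket: 'my-bucket', Key: 'data.csv' }).createReadStream();

Papa.parse(stream, {
  header: false,
  skipEmptyLines: true,
  chunkSize: 1024 * 1024 * 10, // reported above to have no effect with streams
  chunk: function(results) {
    console.log('rows in this chunk:', results.data.length);
  },
  complete: function() {
    console.log('done');
  }
});
```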
Any updates on this?
I had this issue using `fs.createReadStream` to create the file stream. It appears that there is a buffer inside the stream that's about 10 MB, so it's not PapaParse's fault.

If this is your issue, you can pass options to `fs.createReadStream` to let it buffer more. Something like this snippet should get you started...
```js
const fs = require('fs');

// Widen the stream's internal buffer to match the parser's chunk size.
Papa.LocalChunkSize = Papa.LocalChunkSize * 10;
const file = fs.createReadStream(dataPath, { highWaterMark: Papa.LocalChunkSize });
```
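A possible continuation, to show the widened stream in use (assuming the papaparse package in Node and the `file` stream from above):

```js
const Papa = require('papaparse');

// Feed the stream to Papa.parse; chunks should now arrive in
// highWaterMark-sized pieces instead of the stream's default.
Papa.parse(file, {
  chunk: function(results, parser) {
    console.log('rows in this chunk:', results.data.length);
  },
  complete: function() {
    console.log('done');
  }
});
```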
Trying to set `chunkSize` to 50 KB, but no matter what I set, it seems to read around a 65 KB chunk. I have tried all three settings individually, but none has any effect on the chunk size (the number of lines read from the CSV on each `chunk` callback remains the same).

Even after setting `options.chunkSize = null`, Papa parses in multiple chunks.

Please help ...