mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License

Add chunkHandler callback for use inside worker thread. #265

Open SamMousa opened 9 years ago

SamMousa commented 9 years ago

I'm trying to create an interactive CSV mapper / parser / uploader.

Assumptions:

Possible solutions (also based on other issues mentioning pausing for web workers in general).

Note that while #130 discusses the downsides of allowing an asynchronous callback to pause workers, it does not explore other solutions.

For example, the parse object (Papa.parse(File, config)) could expose the pause, abort and resume functions and post messages to the worker. This would allow for asynchronous pausing (i.e., pausing happens whenever the worker thread next checks its messages), but that is acceptable for most use cases. It's unclear to me why resuming would be an issue. The only problem I currently see is that we cannot get a reference to the worker from outside the parse object, but that could easily be fixed.
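
To make that concrete, here is a minimal sketch of the worker side, assuming PapaParse is loaded into a hand-rolled worker with importScripts (rather than worker: true, which hides the parser handle). The PARSE/PAUSE/RESUME/ABORT message names are invented for illustration, and pausing from onmessage carries exactly the "whenever the worker next checks its messages" semantics described above; it is a sketch of the proposal, not a guarantee of the current API:

```js
// worker.js -- illustrative sketch only; the message protocol is invented.
importScripts('papaparse.min.js');

let handle = null; // parser handle captured from the chunk callback

self.onmessage = function (e) {
  switch (e.data.type) {
    case 'PARSE':
      Papa.parse(e.data.file, {
        chunk: function (results, parser) {
          handle = parser; // expose the handle to later messages
          self.postMessage({ type: 'CHUNK', rows: results.data });
        },
        complete: function () {
          self.postMessage({ type: 'DONE' });
        },
      });
      break;
    // These take effect between chunks, the next time the worker's
    // event loop runs -- the asynchronous pausing described above.
    case 'PAUSE':  if (handle) handle.pause();  break;
    case 'RESUME': if (handle) handle.resume(); break;
    case 'ABORT':  if (handle) handle.abort();  break;
  }
};
```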

SamMousa commented 9 years ago

Just noticed that when using workers, Papa.parse has no return value at all. It would be easy to return a management object that holds a reference to the worker and exposes pause and resume functions that post messages to it, right?
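
As a sketch of that management object, pairing with the worker sketch above (again, the message protocol is invented for illustration and is not part of PapaParse):

```js
// main.js -- illustrative sketch of the proposed management object.
function parseInWorker(file, callbacks) {
  const worker = new Worker('worker.js');

  worker.onmessage = function (e) {
    if (e.data.type === 'CHUNK') {
      callbacks.onChunk(e.data.rows);
    } else if (e.data.type === 'DONE') {
      callbacks.onComplete();
      worker.terminate();
    }
  };

  worker.postMessage({ type: 'PARSE', file: file });

  // What Papa.parse could return when worker: true is used:
  return {
    pause:  function () { worker.postMessage({ type: 'PAUSE'  }); },
    resume: function () { worker.postMessage({ type: 'RESUME' }); },
    abort:  function () { worker.postMessage({ type: 'ABORT'  }); },
  };
}
```

A caller could then do `const job = parseInWorker(file, { onChunk: upload, onComplete: done })` and call `job.pause()` / `job.resume()` from the main thread.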

adamreisnz commented 3 weeks ago

This issue has been open since 2015, so I don't have high hopes of it being addressed.

But I too have exactly the same scenario as described above. I think it's a pretty significant shortcoming not to be able to pause processing in workers.

Yes, it will slow down processing, as the FAQ warns, but in my opinion that should not be a reason to omit this feature entirely. The user has to wait for server-side processing to complete anyway, so it doesn't matter if parsing is slower.

Currently we have no way of staggering the uploads: the parser just dumps chunk after chunk, which overloads the server.

The only way to solve this, as far as I can see, would be to buffer the output chunks in memory, but that would defeat the purpose of streaming the file in the first place.
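
For comparison, here is a sketch of the staggering pattern as it works on the main thread today, without worker: true (assuming the chunk callback receives the parser handle as its second argument, as the step callback does; /upload is a placeholder endpoint). Each chunk is uploaded before the next one is parsed, so nothing piles up in memory and the server sees one request at a time. The complaint in this issue is precisely that this handle is unavailable once parsing moves into a worker:

```js
Papa.parse(file, {
  header: true,
  chunk: function (results, parser) {
    parser.pause(); // stop parsing while this chunk is in flight
    fetch('/upload', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(results.data),
    })
      .then(function () { parser.resume(); }) // next chunk only after the server is done
      .catch(function (err) { parser.abort(); console.error(err); });
  },
  complete: function () { console.log('all chunks uploaded'); },
});
```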