dobesv opened 5 years ago
Wow, this sounds like a very good addition to Papaparse.
Is this supported on all major browsers? I could not find the feature on caniuse.com.
I'm wondering if it will be possible to activate this behaviour through a configuration parameter, so that if the parameter is set, Papaparse returns the iterator instead of the current array of results.
AsyncIterator can be supported in browsers using Babel; I don't think it is natively supported.
I think providing a different function in Papaparse, like Papa.asyncIterable({...options...}), would make more sense because the API is quite different in this case. It is not great design to have radically different return types for a single function.
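A minimal sketch of what such a separate function could look like, assuming it is built on the existing step/complete/error callbacks of Papa.parse and that each step() call delivers one row in results.data. The name parseAsyncIterable and its parse parameter (any function with Papa.parse's call signature, passed in so the adapter can be exercised without Papaparse installed) are hypothetical and not part of Papaparse's API.

```javascript
// Hypothetical adapter, not part of Papaparse: turns a callback-based
// parse(input, config) function (assumed to behave like Papa.parse with a
// step callback) into an async iterable of rows.
function parseAsyncIterable(parse, input, config = {}) {
  return {
    [Symbol.asyncIterator]() {
      const queue = [];    // rows parsed but not yet consumed
      let pending = null;  // { resolve, reject } of a next() call waiting for data
      let finished = false;
      let failure = null;
      let handle = null;   // parser handle passed to step(), used for abort()

      parse(input, {
        ...config,
        step(results, parser) {
          handle = parser;
          if (pending) {
            const { resolve } = pending;
            pending = null;
            resolve({ value: results.data, done: false });
          } else {
            queue.push(results.data);
          }
        },
        complete() {
          finished = true;
          if (pending) {
            const { resolve } = pending;
            pending = null;
            resolve({ value: undefined, done: true });
          }
        },
        error(err) {
          failure = err;
          if (pending) {
            const { reject } = pending;
            pending = null;
            reject(err);
          }
        },
      });

      return {
        next() {
          if (queue.length > 0) {
            return Promise.resolve({ value: queue.shift(), done: false });
          }
          if (failure) return Promise.reject(failure);
          if (finished) return Promise.resolve({ value: undefined, done: true });
          return new Promise((resolve, reject) => {
            pending = { resolve, reject };
          });
        },
        // Called when the consumer stops early (e.g. `break` inside for await).
        return() {
          finished = true;
          if (handle) handle.abort();
          return Promise.resolve({ value: undefined, done: true });
        },
      };
    },
  };
}
```

With Papaparse installed, usage would look like for await (const row of parseAsyncIterable((i, c) => Papa.parse(i, c), file, { header: true })) { ... }. This sketch buffers rows without limit; a fuller version might call the parser handle's pause() and resume() to apply backpressure.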
The async iterable protocol only requires Symbol.asyncIterator, which isn't necessarily available in the browser if you don't have a polyfill (e.g. @babel/polyfill). The for await syntax requires a transpiler for most browsers.
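To make that distinction concrete: the protocol itself is just an object whose Symbol.asyncIterator method returns an iterator with a promise-returning next(). The sketch below (countdown and collect are illustrative names) builds one by hand and consumes it with plain promise chaining, so only Symbol.asyncIterator has to exist at runtime; no for await syntax, and therefore no transpiler, is involved.

```javascript
// A hand-written async iterable: no async generator or for await syntax needed.
function countdown(from) {
  return {
    [Symbol.asyncIterator]() {
      let current = from;
      return {
        // next() must return a promise of { value, done }
        next() {
          if (current <= 0) return Promise.resolve({ value: undefined, done: true });
          return Promise.resolve({ value: current--, done: false });
        },
      };
    },
  };
}

// Consume it with plain promise chaining instead of for await.
function collect(iterable) {
  const iterator = iterable[Symbol.asyncIterator]();
  const values = [];
  const step = () =>
    iterator.next().then(({ value, done }) => {
      if (done) return values;
      values.push(value);
      return step();
    });
  return step();
}

collect(countdown(3)).then((values) => console.log(values)); // [ 3, 2, 1 ]
```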
Maybe we can add a new method named Papa.ParseIterable, which would behave like the current parse but return an iterable instead.
Will this feature require adding Babel as a dependency? We should be careful when adding new dependencies. Although I don't think Babel would be a big issue, if it is required we should add it as an optional dependency.
You shouldn't add babel as a dependency.
I doubt many people will use ES2018 features unless they are using Babel or running in an ES2018 environment. It should be fine if this particular function just throws an error when the feature is missing, for example:
if (typeof Symbol === 'undefined' || !Symbol.asyncIterator) throw new Error('This feature requires ES2018');
If you do want it to work even without Babel / ES2018, you can use iterall and use require('iterall').$$asyncIterator in place of Symbol.asyncIterator when defining the iterable:
https://github.com/leebyron/iterall/blob/master/index.mjs#L39 https://github.com/leebyron/iterall/blob/master/index.mjs#L84 https://github.com/leebyron/iterall/blob/master/index.mjs#L420
Then people who use iterall-compatible libraries / code can still use the async iterable even if they do not have ES2018 Symbol.asyncIterator.
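A self-contained sketch of that fallback idea, mirroring what iterall's $$asyncIterator provides (the native symbol when present, otherwise the string key '@@asyncIterator') without depending on iterall itself. The rowsIterable and toArray names are hypothetical.

```javascript
// Fallback key in the spirit of iterall's $$asyncIterator: use the real
// symbol when the runtime has it, otherwise a well-known string key.
const $$asyncIterator =
  (typeof Symbol === 'function' && Symbol.asyncIterator) || '@@asyncIterator';

// Hypothetical rows source defined with the fallback key.
function rowsIterable(rows) {
  return {
    [$$asyncIterator]() {
      let i = 0;
      return {
        next() {
          return Promise.resolve(
            i < rows.length
              ? { value: rows[i++], done: false }
              : { value: undefined, done: true }
          );
        },
      };
    },
  };
}

// Consumers look the method up through the same key, so this works with or
// without a native Symbol.asyncIterator (no for await syntax used).
async function toArray(iterable) {
  const iterator = iterable[$$asyncIterator]();
  const out = [];
  for (let r = await iterator.next(); !r.done; r = await iterator.next()) {
    out.push(r.value);
  }
  return out;
}
```

When the runtime does have Symbol.asyncIterator, the key is the real symbol, so the same object also works directly with for await.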
Hiya 👋, IxJS maintainer here. I'm in the middle of implementing a streaming CSV -> apache-arrow transform, and having this in PapaParse would be fantastic for compatibility with both node and whatwg streams.
We should be able to use Ix's fromNodeStream() method in node to transform PapaParse's ReadableStreamStreamer into an AsyncIterable, or Ix's toNodeStream() method to pipe an AsyncIterable to PapaParse's DuplexStreamStreamer:
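```javascript
import fs from 'fs';
import Papa from 'papaparse';
import { fromNodeStream, map } from 'ix/asynciterable';

fromNodeStream(fs
  .createReadStream('cols.csv')
  // header: true yields objects keyed by column name (needed to destructure colA/colB);
  // dynamicTyping: true parses numeric fields so colA + colB adds instead of concatenating
  .pipe(Papa.parse(Papa.NODE_STREAM_INPUT, { header: true, dynamicTyping: true })))
  // maybe do an element-wise transform
  .pipe(map(({ colA, colB }) => `${colA + colB}\n`))
  // implicitly calls toNodeStream() when piping to a node writable stream
  .pipe(fs.createWriteStream('sums.txt'));
```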
If PapaParse had an AsyncIterable implementation, we could also use Ix to convert into whatwg streams in the browser (via AsyncIterable#toDOMStream() and AsyncIterable.fromDOMStream()).
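For comparison, a hand-rolled conversion from an async iterable to a WHATWG ReadableStream is small. This is a sketch using only the standard ReadableStream constructor, not Ix's implementation of toDOMStream(); it assumes a runtime with a global ReadableStream (modern browsers, Node 18+), and toReadableStream and sampleRows are hypothetical names.

```javascript
// Wrap any async iterable in a WHATWG ReadableStream, pulling one item per read.
function toReadableStream(iterable) {
  const iterator = iterable[Symbol.asyncIterator]();
  return new ReadableStream({
    // pull() runs whenever the stream's internal queue wants more data,
    // so items are only requested from the iterator on demand.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(value);
    },
    // Propagate cancellation back to the iterator, if it supports it.
    async cancel(reason) {
      if (typeof iterator.return === 'function') await iterator.return(reason);
    },
  });
}

// Example source: an async generator standing in for parsed CSV rows.
async function* sampleRows() {
  yield ['a', '1'];
  yield ['b', '2'];
}

async function main() {
  const reader = toReadableStream(sampleRows()).getReader();
  for (let r = await reader.read(); !r.done; r = await reader.read()) {
    console.log(r.value);
  }
}

main();
```

Because pull() is demand-driven, the iterator is only advanced as fast as the stream's reader consumes, which keeps memory bounded.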
About Symbol.asyncIterator, http://kangax.github.io/compat-table/es2016plus/#test-Asynchronous_Iterators indicates AsyncIterable is now supported everywhere except Edge. We've been shipping Ix and Arrow without the polyfill for the last 2-3 years and haven't heard any complaints. Typically the client's Babel or Closure compilation step will include it if necessary for their target envs.
There is a simple way that makes adding an async iterator to Papaparse seem unnecessary:
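```javascript
const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const Papa = require("papaparse");

pipeline(
  fs.createReadStream('1.csv'),
  Papa.parse(Papa.NODE_STREAM_INPUT),
  // The final stage receives the parsed rows as an async iterable
  async (rows) => {
    for await (const row of rows) {
      console.log(row);
    }
  }
).catch(console.error); // pipeline() returns a promise; surface stream errors
```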
It's working, but it is slow.
ECMAScript 2018 introduces the async iteration protocol (AsyncIterator and for await...of), which is useful for looping over large inputs that arrive asynchronously. It's a good fit for CSV parsing. Papaparse could potentially provide a function that returns an async iterator instead of an array.
I created an example function that returns an async iterator wrapping Papaparse, which you can see here:
https://gist.github.com/dobesv/e637893adb0588a768db70e2c2e7ba29
Using the standard AsyncIterator has some advantages:
- It works with libraries like ixjs, for filtering/transforming the results
- ES2018 code can use for await (const row of ...) syntax to loop over the results while still getting the asynchronous benefits (you don't have to load everything into memory at once)

Feel free to adapt my example code for inclusion in Papaparse. Or, if you feel this would be better as a separate package, let me know.
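As an illustration of both advantages, async generator functions can filter and transform rows lazily, one at a time, in the spirit of what ixjs offers. The filterRows and mapRows helpers below are hypothetical hand-rolled stand-ins, not ixjs's operators, and the row source is simulated rather than coming from Papaparse.

```javascript
// Lazily keep only rows matching a predicate.
async function* filterRows(rows, predicate) {
  for await (const row of rows) {
    if (predicate(row)) yield row;
  }
}

// Lazily transform each row.
async function* mapRows(rows, fn) {
  for await (const row of rows) {
    yield fn(row);
  }
}

// Simulated parser output; in practice this would be an async iterable of
// rows produced by a Papaparse wrapper.
async function* simulatedRows() {
  yield { name: 'a', amount: '10' };
  yield { name: 'b', amount: '-3' };
  yield { name: 'c', amount: '7' };
}

async function main() {
  const positive = filterRows(simulatedRows(), (row) => Number(row.amount) > 0);
  const names = mapRows(positive, (row) => row.name.toUpperCase());
  // Rows are pulled through the chain one at a time; nothing is collected
  // into an array unless the consumer chooses to.
  for await (const name of names) console.log(name); // prints A, then C
}

main();
```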