New class-based API with `AsyncParquetFile` (for https://github.com/kylebarron/parquet-wasm/issues/215). I think this is cleaner and easier to use, and on the Rust side the only data stored in the class is the file metadata, Arrow schema, and reqwest client. So if the user forgets to call `.free()`, not a ton of memory will leak.
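Since forgetting `.free()` is cheap but still worth avoiding, one pattern is a `try/finally` wrapper on the JS side. This is only a sketch: the `FreeableFile` shape (`read_row_group`, `free`) is assumed from the snippets in this PR, not the actual exported types.

```typescript
// Hypothetical shape of an AsyncParquetFile handle, assumed from this PR.
interface FreeableFile {
  read_row_group(i: number): Promise<unknown>;
  free(): void;
}

// Run `fn` with the file and guarantee `.free()` is called afterwards,
// even if `fn` throws or rejects.
async function withParquetFile<T>(
  file: FreeableFile,
  fn: (f: FreeableFile) => Promise<T>,
): Promise<T> {
  try {
    return await fn(file);
  } finally {
    file.free(); // released on both success and error paths
  }
}
```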
Todo:
- [ ] Use this API automatically under the hood for streaming
- [ ] Concurrent row group fetches? Or should I leave that to the user to call `Promise.all` from the JS side? It's probably non-trivial to implement support for this in wasm.
- [ ] Ability to turn off Arrow's re-chunking; creating batches of 1024 rows by default is way too small imo.
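For the concurrency question above, the user-side `Promise.all` route can be sketched without any wasm changes. The `read_row_group` name comes from this PR's snippets; `numRowGroups` and the overall shape are assumptions for illustration.

```typescript
// Sketch: issue row-group reads concurrently from JS with Promise.all.
// Each call returns a pending promise immediately; wasm just sees
// independent async calls, and the fetches overlap on the network.
async function readAllRowGroups(
  parquetFile: { read_row_group(i: number): Promise<unknown> },
  numRowGroups: number,
): Promise<unknown[]> {
  const requests: Promise<unknown>[] = [];
  for (let i = 0; i < numRowGroups; i++) {
    requests.push(parquetFile.read_row_group(i)); // fire, don't await yet
  }
  return Promise.all(requests); // resolve once every row group arrives
}
```

The upside of leaving this to the caller is that they control the concurrency (e.g. batching with a semaphore), which is awkward to express inside wasm.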
Improvements
Previously, fetching the metadata took two requests: first `Range: bytes=-8`, and then secondly `Range: bytes=-9772` for the full metadata (testing in this notebook). Now `table = await parquetFile.read_row_group(0)` made just a single request!
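The initial `bytes=-8` request works because a Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic, so the suffix alone tells you how much more to fetch. A self-contained sketch (the synthetic footer below encodes a length of 9764, which plus the 8-byte suffix would line up with a `bytes=-9772` follow-up request; that arithmetic is my assumption, not stated in the PR):

```typescript
// Parse the final 8 bytes of a Parquet file:
// bytes 0..4 = metadata length (uint32, little-endian), bytes 4..8 = "PAR1".
function metadataLength(footer: Uint8Array): number {
  if (footer.length !== 8) throw new Error("expected the final 8 bytes");
  const magic = new TextDecoder().decode(footer.subarray(4));
  if (magic !== "PAR1") throw new Error("not a Parquet file footer");
  return new DataView(footer.buffer, footer.byteOffset, 4).getUint32(0, true);
}

// Synthetic footer: 0x00002624 = 9764 bytes of metadata, then "PAR1".
const footer = new Uint8Array([0x24, 0x26, 0x00, 0x00, 0x50, 0x41, 0x52, 0x31]);
```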
cc @H-Plus-Time