Closed kylebarron closed 2 months ago
Asset | Uncompressed Size | Compressed Size |
---|---|---|
async_full/parquet_wasm_bg.wasm | 5.44MB $\color{green}\textbf{-18.6KB -0\%}$ | 1.27MB $\color{green}\textbf{-5.32KB -0\%}$ |
slim/parquet_wasm_bg.wasm | 3.46MB $\color{red}\textbf{+1.57KB +0\%}$ | 548KB $\color{red}\textbf{+565B +0\%}$ |
sync/parquet_wasm_bg.wasm | 4.74MB $\color{red}\textbf{+1.57KB +0\%}$ | 1.04MB $\color{green}\textbf{-79B -0\%}$ |
Yep, I'll give the read_stream row groups bit a crack. I've seen some quirks relating to the package exports too, will check a few import scenarios too.
> I've seen some quirks relating to the package exports too, will check a few import scenarios too.
Would love bug reports if you have any!
Change list
- Rename `AsyncParquetFile` to `ParquetFile`.
- `ParquetFile` API.
- Remove the `SharedIO` trait, since now we only have a single struct.
- Remove `with_batch_size` and `select_columns`. I don't see a need for the `ParquetFile` struct itself to maintain any reader state. That information is only used in the read phase, not the constructor phase, so I think it's fine to pass those options into `read` or `stream`.
- Remove `readRowGroup` in favor of `read`.
- `read` now has options, including a `rowGroups` parameter that takes a list of integers.
- Add a `ReaderOptions` struct for read options, used by both sync and async readers.

I couldn't get the row group selection to work in
`stream()` here: https://github.com/kylebarron/parquet-wasm/pull/510/files#diff-e1a77beecd2634c6c0489c20cc3cae036ed6668d62c4d47f00760ab60b0d404eR188-R192
I was hitting lifetime errors with having a `Vec<usize>` there that wouldn't live long enough for the stream.

@H-Plus-Time I'd like to get a release out in the next day or two, because otherwise I'll forget about it again and it'll never get released. I've already done a lot of other cleanup, so I think it's just this and a few more README updates, and then I'm ready to publish 0.6. I don't want to spend a lot more time on this, but I wanted to give you a heads-up in case you want to make any more edits before the release!
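
For reference, here is a minimal std-only sketch of one common way around the `Vec<usize>` lifetime error described above: take the selection by value and move it into the lazy reader, so nothing borrowed has to outlive the returned value. This is not the code from the PR. `ReaderOptions` here is a cut-down hypothetical with a single `row_groups` field, and `Batch` and `stream_batches` are made-up names. A plain `Iterator` stands in for the async stream, on the assumption that the same ownership pattern carries over.

```rust
/// Hypothetical, cut-down read options; the real struct may differ.
#[derive(Default)]
struct ReaderOptions {
    /// Row groups to read; `None` means "all row groups".
    row_groups: Option<Vec<usize>>,
}

/// Hypothetical stand-in for a decoded record batch.
#[derive(Debug, PartialEq)]
struct Batch {
    row_group: usize,
}

/// Returns a lazy, `'static` iterator over the selected row groups.
///
/// `options` is taken by value, so the `Vec<usize>` inside it is moved
/// into the iterator. Borrowing it instead (e.g. `&[usize]`) would tie
/// the return value to the caller's borrow, which is what produces a
/// "does not live long enough" error once the reader must outlive it.
fn stream_batches(
    num_row_groups: usize,
    options: ReaderOptions,
) -> impl Iterator<Item = Batch> + 'static {
    // Resolve the selection into an owned Vec up front.
    let selected: Vec<usize> = options
        .row_groups
        .unwrap_or_else(|| (0..num_row_groups).collect());

    selected
        .into_iter()
        // `move` copies `num_row_groups` into the closure; out-of-range
        // indices are skipped rather than treated as errors in this sketch.
        .filter(move |&i| i < num_row_groups)
        .map(|row_group| Batch { row_group })
}
```

Calling `stream_batches(3, ReaderOptions { row_groups: Some(vec![2, 0, 7]) })` yields batches for row groups 2 and 0, in that order, and the returned iterator can be handed off after the caller's locals are gone. With a real async stream the equivalent would be an `async move` block or a `move` closure that owns the vector.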