kylebarron / parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data
https://kylebarron.dev/parquet-wasm/
Apache License 2.0

Fully empty file does not load #511

Closed isaacbrodsky closed 2 months ago

isaacbrodsky commented 2 months ago
import pandas as pd
pd.DataFrame().to_parquet('empty.parquet')

I tried to load this file in parquet-wasm and got "unreachable executed". A file with zero rows but some columns did load.

kylebarron commented 2 months ago

On the latest main I can't reproduce this. Using the test case added in https://github.com/kylebarron/parquet-wasm/pull/512 (which uses the same line of pandas code), I can see the table schema:

stdout | tests/js/index.test.ts > reads empty file
empty table schema Schema {
  fields: [],
  metadata: Map(1) {
    'pandas' => '{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 0, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "int64", "numpy_type": "int64", "metadata": null}], "columns": [], "creator": {"library": "pyarrow", "version": "13.0.0"}, "pandas_version": "2.1.1"}'
  },
  dictionaries: Map(0) {},
  metadataVersion: 4
}

Maybe there's a difference in pandas/pyarrow versions? My pandas version is 2.1.1 and my pyarrow version here is 13.0.0.
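
To compare versions, note that the 'pandas' metadata entry in the schema above is a JSON string that records which library versions wrote the file. Here is a minimal sketch of pulling those out, assuming you already have that string (for example, copied from the schema output). The helper name `writer_versions` is hypothetical and is not part of parquet-wasm, pandas, or pyarrow.

```python
import json

# The 'pandas' metadata value, copied verbatim from the schema printed above.
PANDAS_METADATA = (
    '{"index_columns": [{"kind": "range", "name": null, "start": 0, '
    '"stop": 0, "step": 1}], "column_indexes": [{"name": null, '
    '"field_name": null, "pandas_type": "int64", "numpy_type": "int64", '
    '"metadata": null}], "columns": [], "creator": {"library": "pyarrow", '
    '"version": "13.0.0"}, "pandas_version": "2.1.1"}'
)


def writer_versions(pandas_metadata: str) -> dict:
    """Summarize writer versions found in a 'pandas' metadata JSON string.

    Hypothetical helper for illustration only.
    """
    meta = json.loads(pandas_metadata)
    creator = meta.get("creator") or {}
    return {
        "pandas_version": meta.get("pandas_version"),
        "creator_library": creator.get("library"),
        "creator_version": creator.get("version"),
        # An empty "columns" list corresponds to a file with no columns.
        "num_columns": len(meta.get("columns", [])),
    }


if __name__ == "__main__":
    print(writer_versions(PANDAS_METADATA))
```

Running the same helper on the metadata from a file that fails to load would show whether it was written by a different pandas or pyarrow version.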

kylebarron commented 2 months ago

Based on https://github.com/kylebarron/parquet-wasm/pull/512 I'll close this, but feel free to reopen if you have another reproducible example.

isaacbrodsky commented 2 months ago

Interesting! I'll keep a lookout for it. Thanks for adding the test.

isaacbrodsky commented 2 months ago

I tested again with the latest parquet-wasm and it doesn't seem to be an issue anymore.

kylebarron commented 2 months ago

Great to hear