kylebarron / parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data
https://kylebarron.dev/parquet-wasm/
Apache License 2.0
481 stars, 19 forks

Crash on .free() #522

Open drewbitt opened 2 months ago

drewbitt commented 2 months ago

Reproduction

```ts
import { tableFromJSON, tableToIPC } from "apache-arrow";
import * as Parquet from "parquet-wasm";

// Sample data
const testData = [
  { id: 1, name: "John" },
  { id: 2, name: "Jane" },
];

// Create an Arrow table from the test data
const arrowTable = tableFromJSON(testData);
console.log(arrowTable);

// Create a Parquet Table from the Arrow table
const wasmTable = Parquet.Table.fromIPCStream(tableToIPC(arrowTable, "stream"));
console.log(wasmTable);

// Write the Parquet table to a buffer
const writerProperties = new Parquet.WriterPropertiesBuilder().build();
const parquetData = Parquet.writeParquet(wasmTable, writerProperties);

// Attempt to free the Parquet Table
wasmTable.free();
```
Output

```
tsx json-parquet-2.ts
Table {
  schema: Schema {
    fields: [ [Field], [Field] ],
    metadata: Map(0) {},
    dictionaries: Map(1) { 0 => [Utf8] },
    metadataVersion: 4
  },
  batches: [ RecordBatch { schema: [Schema], data: [Data] } ],
  _offsets: Uint32Array(2) [ 0, 2 ]
}
Table { __wbg_ptr: 2369000 }
/Users/drewbitt/Repos/Pantomath/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359
    throw new Error(getStringFromWasm0(arg0, arg1));
    ^

Error: null pointer passed to rust
    at module.exports.__wbindgen_throw (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359:11)
    at wasm://wasm/014c002a:wasm-function[6573]:0x405d03
    at wasm://wasm/014c002a:wasm-function[6574]:0x405d10
    at wasm://wasm/014c002a:wasm-function[3297]:0x3a06de
    at wasm://wasm/014c002a:wasm-function[4074]:0x3c8bd1
    at Table.free (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:2095:14)
    at <anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:11)
    at Object.<anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:16)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Object.S (/Users/drewbitt/.local/share/mise/installs/npm-tsx/4.7.1/lib/node_modules/tsx/dist/cjs/index.cjs:1:1292)

Node.js v20.11.0
```

I'm not very familiar with this space, so let me know if this is expected for some reason. Thanks!

kylebarron commented 2 months ago

Yeah... this part can be confusing. The tl;dr is that writeParquet frees the table itself. We should probably clarify this in the function's docstring.

Functions exported from Rust through wasm-bindgen can take their inputs either by reference or by value; taking an input by value consumes the input object. Here, writeParquet takes the input table by value, so it consumes the table's data.
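To make the by-value behavior concrete, here is a minimal TypeScript model of what the generated JavaScript wrapper objects do. It is a simplified sketch, not the actual wasm-bindgen output: `ModelTable`, `borrowByRef`, `consumeByValue`, `dropOnRustSide`, and the `__destroy_into_raw` method are hypothetical names for illustration, a `Set` stands in for Rust-owned memory, and other details of the real glue code are omitted. Only `__wbg_ptr`, `free()`, and the error message come from the thread.

```typescript
// Simplified model of wasm-bindgen ownership semantics (hypothetical names).
// A Set stands in for Rust-owned allocations on the wasm heap.
const liveAllocations = new Set<number>();
let nextPtr = 8;

class ModelTable {
  // Mirrors the __wbg_ptr field on wrapper objects; 0 means "no data owned".
  __wbg_ptr: number;

  constructor() {
    this.__wbg_ptr = nextPtr;
    liveAllocations.add(nextPtr);
    nextPtr += 8;
  }

  // Take the pointer out of the wrapper, leaving a null pointer behind.
  __destroy_into_raw(): number {
    const ptr = this.__wbg_ptr;
    this.__wbg_ptr = 0;
    return ptr;
  }

  // Like the generated free(): hand the pointer to the Rust-side destructor.
  free(): void {
    dropOnRustSide(this.__destroy_into_raw());
  }
}

// Stand-in for the Rust-side destructor, which rejects a null pointer.
function dropOnRustSide(ptr: number): void {
  if (ptr === 0) {
    throw new Error("null pointer passed to rust");
  }
  liveAllocations.delete(ptr);
}

// By-reference call: reads the data, and the wrapper keeps ownership.
function borrowByRef(table: ModelTable): number {
  if (table.__wbg_ptr === 0) throw new Error("null pointer passed to rust");
  return table.__wbg_ptr;
}

// By-value call (the writeParquet case): ownership moves into "Rust",
// which drops the data once the call is done with it.
function consumeByValue(table: ModelTable): Uint8Array {
  dropOnRustSide(table.__destroy_into_raw());
  return new Uint8Array([0x50, 0x41, 0x52, 0x31]); // placeholder "PAR1" bytes
}

const table = new ModelTable();
borrowByRef(table);
console.log(table.__wbg_ptr !== 0); // true: a borrow leaves the table alive
consumeByValue(table);
console.log(table.__wbg_ptr); // 0: the by-value call consumed the table
try {
  table.free();
} catch (e) {
  console.log((e as Error).message); // "null pointer passed to rust"
}
```

The point of the model is the asymmetry: after a by-reference call the wrapper still owns its pointer, while after a by-value call it holds 0, so a later `free()` passes a null pointer to Rust and throws, the same failure seen in the stack trace above.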

You can always inspect the __wbg_ptr property of a wasm object to see whether its data has been freed. If the pointer is 0, it's a null pointer and the data has already been freed.
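That check can be wrapped into a small guard for cleanup code. A minimal sketch, assuming the wrapper exposes `__wbg_ptr` at runtime as in the session below; since it is an internal wasm-bindgen field that the TypeScript declarations may not include, it is typed as optional here. `WasmHandle` and `freeIfLive` are hypothetical names, not part of parquet-wasm.

```typescript
// Anything with a free() method, e.g. a parquet-wasm Table or WriterProperties.
// __wbg_ptr is an internal wasm-bindgen field, so it is declared optional here.
interface WasmHandle {
  free(): void;
  __wbg_ptr?: number;
}

// Free the object only if it still owns its wasm-side data.
// Returns true if free() was called, false if the data was already gone.
function freeIfLive(obj: WasmHandle, label = "object"): boolean {
  if (obj.__wbg_ptr === 0) {
    console.warn(`${label} was already freed or consumed; skipping free()`);
    return false;
  }
  obj.free();
  return true;
}
```

In the reproduction, calling `freeIfLive(wasmTable, "wasmTable")` instead of `wasmTable.free()` would log a warning rather than throw, because the by-value `writeParquet` call has already left the pointer at 0. If `__wbg_ptr` is missing entirely (for example, if a future version of the bindings renames it), the guard falls back to calling `free()`.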

```
> let wasm = require('parquet-wasm/node')
> let properties = new wasm.WriterPropertiesBuilder().build()
undefined
> properties.__wbg_ptr
2621480
> properties.free()
undefined
> properties.__wbg_ptr
0
```
drewbitt commented 2 months ago

Thank you! That was helpful.

I think adding that to the docstring and not erroring when this happens - stopping all execution - would be nice to have. A console.warn would be more suitable.

kylebarron commented 2 months ago

> not erroring when this happens - stopping all execution - would be nice to have

That's not something I can control. That's part of the bindings auto-generated by Rust's wasm-bindgen.

kylebarron commented 2 months ago

Let's keep this open as a reminder to improve the documentation here