vaguue closed this issue 3 months ago.
Seems related to #412.
Can you say what you tried? Usually an error like `Cannot read properties of undefined (reading '__wbindgen_add_to_stack_pointer')` means that you didn't initialize the Wasm bundle. If you're using the `esm` endpoint, you need to `await` the default export; otherwise the Wasm bundle will never get initialized.
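For context on why the message names `__wbindgen_add_to_stack_pointer`: wasm-bindgen's browser-style ("web" target) glue typically keeps the instantiated module in a module-level `wasm` variable that stays `undefined` until the async init function assigns it, and the generated bindings read from that variable directly. The toy model below is a hypothetical sketch of that structure, not parquet-wasm's actual code; `init`, the stand-in `readParquet`, and the mocked export are all illustrative.

```javascript
// Hypothetical, heavily simplified model of wasm-bindgen "web"-target glue.
// This is NOT parquet-wasm's source; it only mimics its structure.
let wasm; // module-level handle, undefined until init() runs

async function init() {
  // Real glue fetches and instantiates the .wasm file here. This mock only
  // fakes the single export named in the error message.
  wasm = { __wbindgen_add_to_stack_pointer: (delta) => delta };
}

// Stand-in for an exported binding such as readParquet(). Generated bindings
// touch `wasm` directly, so they fail if init() has not finished.
function readParquet(bytes) {
  wasm.__wbindgen_add_to_stack_pointer(-16);
  return bytes.length;
}

let errorBeforeInit = null;
try {
  readParquet(new Uint8Array(4)); // called without awaiting init()
} catch (err) {
  errorBeforeInit = err;
}
console.log(errorBeforeInit.message);
// Cannot read properties of undefined (reading '__wbindgen_add_to_stack_pointer')

// Awaiting init() first makes the same call succeed.
const afterInit = init().then(() => readParquet(new Uint8Array(4)));
afterInit.then((n) => console.log(n)); // 4
```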
Tried to run it with Node v21.5.0 in ESM mode (with `"type": "module"` in package.json).
In ESM mode you always have to `await` the default export, or you'll get errors like the one above, where the wasm wasn't instantiated.
@kylebarron Would you accept a PR that updates the documentation? I also ran into this exact issue when integrating parquet-wasm into an ESM web worker (Next.js). I think this would be very helpful, given that Vite, for example, also defaults to ESM as of v5.
Isn't ESM async out of the box? I always thought the whole point of ESM was being able to export something asynchronously, yet I have to `await somethingImported`? Kinda counterintuitive.
> Would you accept a PR that updates the documentation?

Yes, of course! PRs are always welcome.
> Isn't esm async out-of-box?

It is, but wasm initialization is a separate async step from just loading the code itself.
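One way to see the two steps separately: by the time any statement in a module runs, its JS has already been loaded, but compiling and instantiating a wasm binary is a further call, and the standard `WebAssembly.instantiate` API returns a Promise. A minimal sketch, using the smallest valid wasm binary (its 8-byte header) as a stand-in for a real bundle:

```javascript
// Smallest valid WebAssembly binary: magic "\0asm" followed by version 1.
// It stands in for a real .wasm file such as parquet-wasm's bundle.
const EMPTY_WASM = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// This script is already loaded and running; turning the bytes into a live
// instance is a separate step, and WebAssembly.instantiate is asynchronous.
const pending = WebAssembly.instantiate(EMPTY_WASM);
console.log(pending instanceof Promise); // true

pending.then(({ module, instance }) => {
  console.log(module instanceof WebAssembly.Module); // true
  console.log(Object.keys(instance.exports).length); // 0 (the module exports nothing)
});
```

Synchronous constructors (`new WebAssembly.Module` / `new WebAssembly.Instance`) do exist, but some browsers (notably Chrome) restrict them to small modules on the main thread, which is one reason browser-oriented glue exposes an async init.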
Ideally we can fix https://github.com/kylebarron/parquet-wasm/pull/414 and then publish an 0.6 release sometime soon, but I haven't had time to test that.
Well, I can't wait for this to happen. For now I've had to use Apache Arrow + node-addon-api, so it would be nice to have a stable API for working with Parquet files. What are we going to do with this issue?
Well, can we just create an export in which we await this default export and re-export the actual module? Then we could just import the module and be ready to go. In general, this `await init();` thing seems kinda dubious to me.
> Well, can we just create an export in which we await this default export and re-export the actual module?

No, as far as I can tell that's not possible. And even if it were, I'd have to somehow modify the default JS binding that wasm-bindgen emits, which sounds horrible.
> this `await init();` thing is kinda dubious for me

How is this dubious?
```javascript
import initWasm, {readParquet} from 'parquet-wasm/esm/arrow1.js';
await initWasm();
readParquet(...);
```
FWIW, sql.js has the same behavior, with an init function they call `initSqlJs`, so I'm not alone.
A PR is welcome to improve the docs! But otherwise I'm going to close this because it's expected behavior.
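What about doing something like this in `myexport.js`:
```javascript
import initWasm from 'parquet-wasm/esm/arrow1.js';
// Top-level await: modules importing myexport.js wait until init finishes.
await initWasm();
// `export *` takes a module specifier string, not a namespace binding.
export * from 'parquet-wasm/esm/arrow1.js';
```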
What I mean is: why not just create a wrapper around the default wasm-bindgen intricacies to make usage simpler? :)
I don't know how the wasm-bindgen folks see things, but in my opinion this goes against the whole nature of ESM. It's not like I can see a case where someone imports the module but doesn't await this `init` thing.
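A minimal sketch of the wrapper idea, assuming a runtime with top-level await (ES2022 modules, e.g. recent Node). A tiny hand-assembled wasm module exporting `add` stands in for parquet-wasm, and the wrapper is inlined as a `data:` URL only so the example runs as one file; all names here are hypothetical. Note the tradeoff: the wrapper instantiates the wasm as soon as it is imported, so importers lose any way to defer or redirect that load.

```javascript
// Bytes of a tiny hand-assembled module exporting add(i32, i32) -> i32.
// It stands in for parquet-wasm's .wasm file (illustrative only).
const ADD_WASM = [
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: a + b
];

// Hypothetical wrapper module in the spirit of myexport.js. Its top-level
// await finishes instantiation before importers can see its exports.
const wrapperSource = `
  const { instance } = await WebAssembly.instantiate(
    new Uint8Array(${JSON.stringify(ADD_WASM)})
  );
  export const add = (a, b) => instance.exports.add(a, b);
`;
const wrapperUrl = "data:text/javascript," + encodeURIComponent(wrapperSource);

// The importer never calls an init function: import() settles only after
// the wrapper's top-level await has completed.
const sumPromise = import(wrapperUrl).then((wrapper) => wrapper.add(2, 3));
sumPromise.then((sum) => console.log(sum)); // 5
```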
The wasm bundle is not fetched until the `initWasm` call. Therefore, separating it gives users a lot more power. For example, you might only rarely fetch Parquet files from your app, and therefore wish to defer loading the wasm until the end user needs the functionality.
Additionally, you can pass a URL into `initWasm` to fetch the wasm from your own server, which can be necessary in some situations.
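The deferral described above can be sketched as a memoized loader: nothing is instantiated when the script loads, the first caller starts instantiation, and every later or concurrent caller awaits the same promise. This is a generic pattern, not parquet-wasm's API; `getWasmExports`, `onUserAction`, and the inline `add` module are hypothetical stand-ins (in a browser the bytes would typically come from fetching the bundle's URL).

```javascript
// Tiny hand-assembled module exporting add(i32, i32) -> i32; it stands in
// for the Parquet wasm bundle (illustrative only).
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

let exportsPromise = null; // nothing is compiled or instantiated up front
let instantiateCalls = 0; // counts how often the expensive step actually runs

// Hypothetical helper: start instantiation on first use, then hand every
// later (or concurrent) caller the same promise.
function getWasmExports() {
  if (exportsPromise === null) {
    instantiateCalls += 1;
    exportsPromise = WebAssembly.instantiate(ADD_WASM).then(
      ({ instance }) => instance.exports,
    );
  }
  return exportsPromise;
}

// Hypothetical feature that needs the wasm only when the user triggers it.
async function onUserAction(a, b) {
  const wasmExports = await getWasmExports();
  return wasmExports.add(a, b);
}

console.log(instantiateCalls); // 0: loading this script did not touch the wasm
const results = Promise.all([onUserAction(1, 2), onUserAction(3, 4)]);
// Logs the sums 3 and 7, and an instantiateCalls count of 1.
results.then((sums) => console.log(sums, instantiateCalls));
```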
Correct me if I'm wrong, but in that case one can just import the whole module asynchronously, i.e. `await import(...)`. So you have this "power" even without the init step, and the init step overcomplicates Node.js usage.
```javascript
import * as arrow from "apache-arrow";
import init, * as parquet from "parquet-wasm";

await init();

// Create Arrow Table in JS
const LENGTH = 2000;
const rainAmounts = Float32Array.from({ length: LENGTH }, () =>
  Number((Math.random() * 20).toFixed(1))
);
const rainDates = Array.from(
  { length: LENGTH },
  (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i)
);
const rainfall = arrow.tableFromArrays({
  precipitation: rainAmounts,
  date: rainDates,
});

// Write Arrow Table to Parquet
// wasmTable is an Arrow table in WebAssembly memory
const wasmTable = parquet.Table.fromIPCStream(arrow.tableToIPC(rainfall, "stream"));
const writerProperties = new parquet.WriterPropertiesBuilder()
  .setCompression(parquet.Compression.ZSTD)
  .build();
const parquetUint8Array = parquet.writeParquet(wasmTable, writerProperties);

// Read Parquet buffer back to Arrow Table
// arrowWasmTable is an Arrow table in WebAssembly memory
const arrowWasmTable = parquet.readParquet(parquetUint8Array);

// table is now an Arrow table in JS memory
const table = arrow.tableFromIPC(arrowWasmTable.intoIPCStream());
console.log(table.schema.toString());
// Schema<{ 0: precipitation: Float32, 1: date: Date64<MILLISECOND> }>
```
```
node:internal/deps/undici/undici:12442
Error.captureStackTrace(err, this);
^
TypeError: fetch failed
at node:internal/deps/undici/undici:12442:11
at async __wbg_init (file:///Users/seva/seva/node_modules/parquet-wasm/esm/parquet_wasm.js:5238:51)
at async file:///Users/seva/seva/boosters/check.js:4:1 {
cause: Error: not implemented... yet...
at makeNetworkError (node:internal/deps/undici/undici:5675:35)
at schemeFetch (node:internal/deps/undici/undici:10563:34)
at node:internal/deps/undici/undici:10440:26
at mainFetch (node:internal/deps/undici/undici:10459:11)
at fetching (node:internal/deps/undici/undici:10407:7)
at fetch (node:internal/deps/undici/undici:10271:20)
at Object.fetch (node:internal/deps/undici/undici:12441:10)
at fetch (node:internal/process/pre_execution:336:27)
at __wbg_init (file:///Users/seva/seva/node_modules/parquet-wasm/esm/parquet_wasm.js:5233:17)
at file:///Users/seva/seva/boosters/check.js:4:7
}
Node.js v21.5.0
```
This is just terrible
If you're in Node, use the `node` export.
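The trace shows `__wbg_init` calling `fetch`, and Node's built-in fetch (undici) failing in `schemeFetch` with `not implemented... yet...`; presumably the wasm file's URL, resolved next to the glue on disk, uses the `file:` scheme, which Node's fetch does not support. wasm-bindgen's Node.js-oriented output typically reads the file from disk instead. A rough sketch of the difference, assuming Node 18+ (global `fetch`) and using the 8-byte empty wasm module as a hypothetical stand-in for the real bundle:

```javascript
// Smallest valid wasm binary; stands in for the real bundle file.
const EMPTY_WASM = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

async function main() {
  const fs = await import("node:fs/promises");
  const os = await import("node:os");
  const path = await import("node:path");
  const { pathToFileURL } = await import("node:url");

  // Write the wasm file to a temp dir, as if it shipped next to the JS glue.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "wasm-demo-"));
  const wasmPath = path.join(dir, "module_bg.wasm");
  await fs.writeFile(wasmPath, EMPTY_WASM);

  try {
    // Browser-style loading: fetch the file by URL. Node's fetch rejects
    // file: URLs, which is the failure shown in the trace above.
    let fetchError = null;
    try {
      await fetch(pathToFileURL(wasmPath));
    } catch (err) {
      fetchError = err;
    }

    // Node-style loading: read the bytes from disk, then instantiate them.
    const bytes = await fs.readFile(wasmPath);
    const { instance } = await WebAssembly.instantiate(bytes);
    return { fetchError, instance };
  } finally {
    await fs.rm(dir, { recursive: true, force: true });
  }
}

const outcome = main();
outcome.then(({ fetchError, instance }) => {
  console.log(fetchError && fetchError.message); // e.g. "fetch failed", as in the trace
  console.log(instance instanceof WebAssembly.Instance); // true
});
```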
I'm following the steps from README.md and getting this error