Open timspro opened 2 months ago
If you compile with the --debug flag turned on, you can see the actual Rust error instead of just RuntimeError: unreachable.
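As a rough sketch of how calling code could flag this situation: the helper names below (isOpaqueWasmTrap, withTrapHint) are hypothetical and not part of parquet-wasm. It assumes the trap surfaces as an Error whose name is "RuntimeError" and whose message contains "unreachable", as in the report.

```typescript
// Hypothetical helper: true when the error looks like the opaque trap from the report.
// Assumes the error name is "RuntimeError" and the message contains "unreachable".
function isOpaqueWasmTrap(err: unknown): boolean {
  return (
    err instanceof Error &&
    err.name === "RuntimeError" &&
    err.message.includes("unreachable")
  )
}

// Hypothetical wrapper: runs a callback and logs a hint before rethrowing the trap.
function withTrapHint<T>(run: () => T): T {
  try {
    return run()
  } catch (err) {
    if (isOpaqueWasmTrap(err)) {
      console.error("WebAssembly trap: compile with --debug to see the underlying Rust panic")
    }
    // Always rethrow so the caller still sees the original error.
    throw err
  }
}
```

Usage would look something like withTrapHint(() => Table.fromIPCStream(ipc)). The original error is still rethrown, so existing error handling is unaffected.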
With the test in https://github.com/kylebarron/parquet-wasm/pull/607, the error is:
stderr | tests/js/index.test.ts > should read IPC stream correctly
panicked at /Users/kyle/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-ipc-53.0.0/src/convert.rs:98:30:
called `Option::unwrap()` on a `None` value
So the Rust code is panicking on this line: https://github.com/apache/arrow-rs/blob/5414f1d7c0683c64d69cf721a83c17d677c78a71/arrow-ipc/src/convert.rs#L98
If we load this data in pyarrow, we see:
In [1]: import pyarrow as pa
In [3]: pa.ipc.open_stream("data.arrows").read_all()
Out[3]:
pyarrow.Table
column: list<: double>
child 0, : double
----
column: [[[1,2],[3,4]]]
So the list's inner field does not have a name set. I'm not sure if that's allowed by the spec (it's rare at least). Either the JS IPC writer or the Rust IPC reader is incorrect.
I checked with @jorisvandenbossche and saw that the IPC spec doesn't require a name to be set, so this is an issue on the Rust side. (Though there should be a default name set)
Created https://github.com/apache/arrow-rs/issues/6415. Otherwise, you can work around this by manually setting a field name for any inner lists.
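To find which inner fields would need a name, something like the following could be run against a schema before writing IPC. This is only a sketch: findUnnamedFields and the FieldLike interface are hypothetical, and it assumes the field objects expose name and type.children the way apache-arrow's Field and DataType appear to.

```typescript
// Minimal structural types, assumed to mirror the `name` and `type.children`
// properties of apache-arrow's Field and DataType. Hypothetical, not the library's own types.
interface FieldLike {
  name: string
  type: { children?: FieldLike[] | null }
}

// Hypothetical helper: returns the path of every field, at any depth, whose name is empty.
function findUnnamedFields(fields: FieldLike[], parent = ""): string[] {
  const unnamed: string[] = []
  fields.forEach((field, index) => {
    // Use the positional index in the path when the name is missing.
    const label = field.name === "" ? `<child ${index}>` : field.name
    const path = parent === "" ? label : `${parent}.${label}`
    if (field.name === "") unnamed.push(path)
    // Recurse into nested children (list items, struct members).
    unnamed.push(...findUnnamedFields(field.type.children ?? [], path))
  })
  return unnamed
}
```

Under those assumptions, calling findUnnamedFields(table.schema.fields) before tableToIPC() would list the inner fields that still need a name.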
Thanks for the commentary. The type inference done by tableFromArrays() is passing the empty name: https://github.com/apache/arrow/blob/main/js/src/factories.ts#L153.
I was then able to get around the issue by passing in the List type directly:
import { Field, Int32, List, tableFromArrays, tableToIPC, vectorFromArray } from "apache-arrow"
import { Table } from "parquet-wasm"

const table = tableFromArrays({
  column: vectorFromArray(
    [[1, 2], [3, 4]],
    new List(new Field("_", new Int32())) // fails if "" passed instead
  ),
})
const ipc = tableToIPC(table, "stream")
Table.fromIPCStream(ipc)
This is a fine workaround for me.
I'm expecting the following code to work but am getting an error "RuntimeError: unreachable" when running in Node.js v20.17.0, thrown by fromIPCStream(). I tried changing "stream" to "file", but that didn't work either and failed with the error "Io error: failed to fill whole buffer".
I was able to get other examples working locally that didn't have a list (for example, column: [1, 2] and column: [{a: 1}, {a: 2}]). It does work if using typed arrays: column: [new Int32Array([1, 2]), new Int32Array([3, 4])]. So, I do have a workaround. However, I originally wanted to write a list of structs with Int32 values and now will have to do a struct of typed arrays. Perhaps that is what is intended.