Closed: mbostock closed this issue 6 months ago
This is actually an upstream Arrow JS bug. Here's a repro case independent of parquet:

```js
const arrow = require('apache-arrow');
const {writeFileSync} = require('fs');

const table = arrow.tableFromArrays({
  test: [new Date("2012-01-01T12:34:56.789Z")],
});

const buffer = arrow.tableToIPC(table, 'file');
writeFileSync('table.arrow', buffer);
```
and then in Python:

```python
import pyarrow.feather as feather

table = feather.read_table('table.arrow')
table.schema
# test: date64[ms] not null
table.to_pandas()
#          test
# 0  2012-01-01
```
If you look at the field info in JS before exporting to Python, you'll also see that it's defined as a DateMillisecond type, which doesn't store any time-of-day information:

```js
> table.schema.fields[0]
Field {
  name: 'test',
  type: DateMillisecond [Date] { unit: 1 },
  nullable: false,
  metadata: Map(0) {}
}
```
Closing, as I don't think this is related to parquet-wasm, but happy to discuss further.
I think this is the same bug as https://github.com/duckdb/duckdb-wasm/issues/1231…
Consider this test case:
The resulting `test` column erroneously contains the value `2012-01-01` instead of `2012-01-01T12:34:56.789Z`, dropping the associated time information.