Generating a file with a single map column using the Python snippet below causes parquet_read to fail on the latest version of the main branch (653d57ad3):
import pyarrow as pa
import pyarrow.parquet
print(f"pyarrow {pa.__version__}")
# pyarrow 11.0.0
table = pa.Table.from_pydict({"my_column": pa.array([{"foo": 123}, {"foo": 321}], pa.map_(pa.string(), pa.uint64()))})
with open("sample.parquet", "wb") as f:
    pa.parquet.write_table(table=table, where=f, version="2.6", data_page_version="2.0", compression="SNAPPY")
Attempting to read it yields:
$ RUST_BACKTRACE=1 cargo run --features io_parquet,io_parquet_compression --example parquet_read sample.parquet
Compiling arrow2 v0.16.0 (/home/kjschiroo/Desktop/arrow2)
Finished dev [unoptimized + debuginfo] target(s) in 15.78s
Running `target/debug/examples/parquet_read sample.parquet`
Statistics {
null_count: MapArray[[{key: 0, value: 0}]],
distinct_count: MapArray[[{key: None, value: None}]],
min_value: MapArray[[{key: foo, value: 123}]],
max_value: MapArray[[{key: foo, value: 321}]],
}
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/io/parquet/read/row_group.rs:69:37
stack backtrace:
0: rust_begin_unwind
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/std/src/panicking.rs:575:5
1: core::panicking::panic_fmt
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/panicking.rs:64:14
2: core::panicking::panic
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/panicking.rs:111:5
3: core::option::Option<T>::unwrap
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/option.rs:778:21
4: <arrow2::io::parquet::read::row_group::RowGroupDeserializer as core::iter::traits::iterator::Iterator>::next::{{closure}}
at ./src/io/parquet/read/row_group.rs:69:25
5: core::iter::adapters::map::map_try_fold::{{closure}}
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/adapters/map.rs:91:28
6: core::iter::traits::iterator::Iterator::try_fold
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/traits/iterator.rs:2238:21
7: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/adapters/map.rs:117:9
8: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/adapters/mod.rs:195:9
9: core::iter::traits::iterator::Iterator::try_for_each
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/traits/iterator.rs:2299:9
10: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/adapters/mod.rs:178:9
11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/alloc/src/vec/spec_from_iter_nested.rs:26:32
12: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/alloc/src/vec/spec_from_iter.rs:33:9
13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/alloc/src/vec/mod.rs:2748:9
14: core::iter::traits::iterator::Iterator::collect
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/traits/iterator.rs:1836:9
15: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/result.rs:2075:49
16: core::iter::adapters::try_process
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/adapters/mod.rs:164:17
17: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/result.rs:2075:9
18: core::iter::traits::iterator::Iterator::collect
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/iter/traits/iterator.rs:1836:9
19: <arrow2::io::parquet::read::row_group::RowGroupDeserializer as core::iter::traits::iterator::Iterator>::next
at ./src/io/parquet/read/row_group.rs:66:21
20: <arrow2::io::parquet::read::file::FileReader<R> as core::iter::traits::iterator::Iterator>::next
at ./src/io/parquet/read/file.rs:77:19
21: parquet_read::main
at ./examples/parquet_read.rs:42:24
22: core::ops::function::FnOnce::call_once
at /rustc/53e4b9dd74c29cc9308b8d0f10facac70bb101a7/library/core/src/ops/function.rs:507:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
As far as I have been able to debug, the length of the map array as determined here (https://github.com/jorgecarleitao/arrow2/blob/main/src/array/map/mod.rs#L157) comes back as 1, whereas both expectations and a debug-level print of the map array indicate it should have a length of 2. Beyond that, we get into the Offsets object, which I'm not yet certain how to conceptualize.
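For reference, here is a minimal plain-Python sketch of how offsets work for map (and list) arrays in the Arrow columnar format, as I understand it. It is not arrow2's code: `map_offsets` and `map_len` are hypothetical helper names, and the sketch assumes the array length is derived as the number of offsets minus one.

```python
def map_offsets(rows):
    """Build Arrow-style offsets for map rows given as Python dicts.

    offsets[i] is where row i starts in the flattened key/value entries,
    and offsets[i + 1] is where it ends, so there is one more offset
    than there are rows.
    """
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return offsets


def map_len(offsets):
    # Assumption: the number of rows is the number of offsets minus one.
    return len(offsets) - 1


# Same data as the sample file: two rows, one key/value entry each.
rows = [{"foo": 123}, {"foo": 321}]
offsets = map_offsets(rows)
print(offsets)           # [0, 1, 2]
print(map_len(offsets))  # 2
```

If arrow2 derives the length the same way, a reported length of 1 would suggest that the offsets hold only two entries at the point where the length is computed, rather than the three expected for two rows.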