jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0
1.06k stars 223 forks source link

Fixed nested boolean offset #1404

Closed ritchie46 closed 1 year ago

ritchie46 commented 1 year ago

Simply ensure we do the same as on primitive arrays. This fixed this panic:

df = pl.Series([[None, True], [False, False], [True, True]]).slice(2, 2).to_frame()
f = io.BytesIO()
df.write_parquet(f)
thread '<unnamed>' panicked at 'the offset of the new Buffer cannot exceed the existing length', /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/array/boolean/mod.rs:174:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/panicking.rs:64:14
   2: arrow2::array::boolean::BooleanArray::sliced
             at /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/array/mod.rs:410:13
   3: arrow2::io::parquet::write::boolean::nested::array_to_page
             at /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/io/parquet/write/boolean/nested.rs:27:17
   4: arrow2::io::parquet::write::array_to_page_nested
             at /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/io/parquet/write/mod.rs:541:13
   5: arrow2::io::parquet::write::array_to_page
             at /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/io/parquet/write/mod.rs:293:5
   6: arrow2::io::parquet::write::array_to_pages::{{closure}}
             at /home/ritchie46/.cargo/git/checkouts/arrow2-945af624853845da/f510012/src/io/parquet/write/mod.rs:269:13
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/ops/function.rs:310:13
   8: core::option::Option<T>::map
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/option.rs:970:29
   9: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/iter/adapters/map.rs:103:9
  10: <alloc::boxed::Box<I,A> as core::iter::traits::iterator::Iterator>::next
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/alloc/src/boxed.rs:1922:9
  11: <parquet2::write::dyn_iter::DynIter<V> as core::iter::traits::iterator::Iterator>::next
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet2-0.17.1/src/write/dyn_iter.rs:13:9
  12: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/iter/adapters/map.rs:103:9
  13: <parquet2::write::compression::Compressor<I> as fallible_streaming_iterator::FallibleStreamingIterator>::advance
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet2-0.17.1/src/write/compression.rs:149:20
  14: <fallible_streaming_iterator::MapErr<I,F> as fallible_streaming_iterator::FallibleStreamingIterator>::advance
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/fallible-streaming-iterator-0.1.9/src/lib.rs:684:9
  15: <parquet2::write::dyn_iter::DynStreamingIterator<V,E> as fallible_streaming_iterator::FallibleStreamingIterator>::advance
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet2-0.17.1/src/write/dyn_iter.rs:43:9
  16: fallible_streaming_iterator::FallibleStreamingIterator::next
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/fallible-streaming-iterator-0.1.9/src/lib.rs:52:9
  17: parquet2::write::column_chunk::write_column_chunk
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet2-0.17.1/src/write/column_chunk.rs:45:39
  18: parquet2::write::row_group::write_row_group::{{closure}}
             at /home/ritchie46/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet2-0.17.1/src/write/row_group.rs:100:17
  19: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/iter/adapters/map.rs:91:28
  20: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/333ee6c466972185973d5097f8b5fb0f9fb13fa5/library/core/src/iter/traits/iterator.rs:2262:21
  21: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try

So I assume we slice_nested_leaf already on dyn Array?

However I am not sure. So feel free to dismiss this as I can be entirely wrong here.

codecov[bot] commented 1 year ago

Codecov Report

Base: 83.63% // Head: 83.59% // Decreases project coverage by -0.04% :warning:

Coverage data is based on head (eae6698) compared to base (3a8da98). Patch coverage: 78.76% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1404 +/- ## ========================================== - Coverage 83.63% 83.59% -0.04% ========================================== Files 373 373 Lines 40284 40278 -6 ========================================== - Hits 33692 33672 -20 - Misses 6592 6606 +14 ``` | [Impacted Files](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao) | Coverage Δ | | |---|---|---| | [src/io/parquet/read/statistics/mod.rs](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL3BhcnF1ZXQvcmVhZC9zdGF0aXN0aWNzL21vZC5ycw==) | `87.31% <14.28%> (-1.30%)` | :arrow_down: | | [src/io/parquet/read/deserialize/simple.rs](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL3BhcnF1ZXQvcmVhZC9kZXNlcmlhbGl6ZS9zaW1wbGUucnM=) | `83.03% <82.35%> (-2.07%)` | :arrow_down: | | [src/io/parquet/write/boolean/nested.rs](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL3BhcnF1ZXQvd3JpdGUvYm9vbGVhbi9uZXN0ZWQucnM=) | `96.42% <100.00%> (+1.55%)` | :arrow_up: | | [src/io/parquet/write/mod.rs](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL3BhcnF1ZXQvd3JpdGUvbW9kLnJz) | `87.83% <0.00%> (-0.25%)` | :arrow_down: | | [src/io/ipc/read/file.rs](https://codecov.io/gh/jorgecarleitao/arrow2/pull/1404?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao#diff-c3JjL2lvL2lwYy9yZWFkL2ZpbGUucnM=) | `96.87% <0.00%> (+0.44%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Jorge+Leitao)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

jorgecarleitao commented 1 year ago

Oops, yes, that should not be there - leftover from the previous iteration. Thanks!