[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl
from tqdm import tqdm

# file_list, date_directory_name, OLD_SCHEMA, chunksize, frames, output_dir and
# chunk_idx are defined earlier in the script.
for idx, file in tqdm(enumerate(file_list), total=len(file_list)):
    csv_df = pl.scan_csv(
        file,
        has_header=False,
        truncate_ragged_lines=True,
        schema=OLD_SCHEMA if 'march' in date_directory_name.lower() else None,
        ignore_errors=True,
    )
    frames.append(csv_df)
    # Flush a chunk once `chunksize` files have been accumulated, or at the last file.
    if ((idx > 0) and (idx % chunksize == 0)) or (idx == (len(file_list) - 1)):
        combined_df = pl.concat(frames, how='vertical', parallel=True)
        columns = ['dt_col1', 'dt_col2', 'dt_col3']
        # Convert epoch-millisecond Int64 columns to datetimes.
        combined_df = combined_df.with_columns(
            pl.from_epoch(pl.col(col_name), time_unit='ms').alias(col_name)
            for col_name in columns
            if combined_df.collect_schema().get(col_name) == pl.Int64()
        )
        output_filename = output_dir / f'chunk_{chunk_idx}.parquet'
        combined_df.sink_parquet(output_filename)
Log output
thread 'polars-7' panicked at crates/polars-parquet/src/arrow/read/deserialize/binary/utils.rs:121:45:
mid > len
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: polars_parquet::arrow::read::deserialize::binary::decoders::deserialize_plain
3: <polars_parquet::arrow::read::deserialize::binview::BinViewDecoder as polars_parquet::arrow::read::deserialize::utils::Decoder>::deserialize_dict
4: polars_parquet::arrow::read::deserialize::simple::page_iter_to_array
5: polars_io::parquet::read::read_impl::column_idx_to_series
6: <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next
7: rayon::iter::plumbing::bridge_producer_consumer::helper
8: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
9: rayon_core::registry::WorkerThread::wait_until_cold
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
33%|█████████████████████████████████████ | 2/6 [21:30<43:00, 645.16s/it]
Traceback (most recent call last):
File "/Developer/ING/PSS_hardware_monitoring/pss/etl.py", line 374, in <module>
Fire(ETL)
File "/.pyenv/versions/3.12.5/envs/pss/lib/python3.12/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.pyenv/versions/3.12.5/envs/pss/lib/python3.12/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/.pyenv/versions/3.12.5/envs/pss/lib/python3.12/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/Developer/PSS_hardware_monitoring/pss/etl.py", line 295, in raw_parquet_to_processed_parquet
df.sink_parquet(out_file)
File "/.pyenv/versions/3.12.5/envs/pss/lib/python3.12/site-packages/polars/_utils/unstable.py", line 58, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.pyenv/versions/3.12.5/envs/pss/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2351, in sink_parquet
return lf.sink_parquet(
^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: mid > len
Issue description
It's unclear to me what this error means. I'm processing a total of 5 parquet files, and the failure occurs on the third one. I can read all of the input files fine with Polars normally, so the problem does not appear to be the files themselves.
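For reference, the check that the inputs are readable on their own was essentially an eager read per file; this is a minimal sketch of it, where parquet_files is a placeholder for the actual list of input paths rather than a name from the pipeline above:

import polars as pl

# Hypothetical sanity check: eagerly read each input parquet file on its own.
# Every file reads without error outside the sink pipeline.
for path in parquet_files:
    df = pl.read_parquet(path)
    print(path, df.shape)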
Expected behavior
The data is processed successfully and written to the sink location.
Installed versions