Closed: theelderbeever closed this 1 month ago
Additionally, the following structure can be toggled between two separate errors. With the "has_more": False, line present:

ComputeError: parquet: File out of specification: The max_value of statistics MUST be plain encoded

With that line commented out:

PanicException: the offset of the new Buffer cannot exceed the existing length
import polars as pl

pl.DataFrame(
[
{
"items": {
"data": [
{
"plan": {
"tiers": [
{
"up_to": None,
}
],
"tiers_mode": "volume",
},
},
{
"plan": {
"tiers": [
{
"up_to": None,
}
],
"tiers_mode": "volume",
},
},
],
"has_more": False, # comment this line to get a buffer size error
}
}
]
).write_parquet("items.parquet")
Having the same error here, with reproduction:
import polars as pl

print(pl.__version__)
df = pl.DataFrame([
{
'a': {
'b': [{'c': 'x'}],
'd': 10
}
}
])
print(df.dtypes)
df.write_parquet('/tmp/a.parquet')
Checks
Reproducible example
Log output
Issue description
Polars' default parquet engine fails with a metadata statistics error that does not occur with use_pyarrow=True.

Expected behavior

Both of Polars' parquet writers should be able to write the same dataframe.
Installed versions