Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import os
import polars as pl
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]}).write_parquet(f"{tmpdir}/0.parquet")
    pl.DataFrame({"b": [1, 2, 3], "a": [3, 2, 1]}).write_parquet(f"{tmpdir}/1.parquet")
    ldf = pl.scan_parquet(f"{tmpdir}/*.parquet")

    os.environ["POLARS_FORCE_ASYNC"] = "0"
    ldf.collect()  # This works

    os.environ["POLARS_FORCE_ASYNC"] = "1"
    ldf.collect()  # This fails
Log output
polars.exceptions.ComputeError: schema of all files in a single scan_parquet must be equal
Expected: Schema:
name: a, data type: Int64
name: b, data type: Int64
Got: Schema:
name: b, data type: Int64
name: a, data type: Int64
Issue description
scan_parquet is sensitive to column ordering in a cloud (async reader) context, but works fine when reading locally.
I found that this is a regression from 0.19.11 -> 0.19.12, and it has failed ever since.
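The error message above suggests the async reader compares the file schemas as ordered sequences of (name, dtype) pairs, while the local reader treats them as name-to-dtype mappings. A minimal sketch of that difference in plain Python (hypothetical schema dicts, not Polars internals):

```python
# Two schemas with identical columns and dtypes, stored in different orders,
# mirroring 0.parquet and 1.parquet in the reproducible example above.
schema_a = {"a": "Int64", "b": "Int64"}
schema_b = {"b": "Int64", "a": "Int64"}

# Order-sensitive comparison: what the ComputeError implies the async path does.
ordered_equal = list(schema_a.items()) == list(schema_b.items())

# Order-insensitive comparison: dict equality ignores insertion order,
# which matches the behavior of the local (non-async) reader.
unordered_equal = schema_a == schema_b

print(ordered_equal, unordered_equal)  # False True
```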
Expected behavior
It should work regardless of column ordering, as long as the schemas match.
Installed versions