ShahBinoy opened this issue 1 week ago
I don't necessarily see this as a bug. Are there partitioned writers that mix up the columns like this?
This does not involve partitions at all. The failure happens directly on the columns of the Parquet files, and I am not using partitioning in the scan_parquet call. It also fails when the path is local and I read it as ~/polars-issue-schema-mismatch/*.parquet.
Reading columnar records should not depend on the positional order of the columns.
In Polars, schemas must align. This isn't a bug, but we are investigating support for unaligned reads.
If schemas must align, how is schema evolution handled? In general, this is a huge limitation in my case as well.
I agree: schema evolution should be handled. The sooner, the better.
Checks
Reproducible example
The same files are read correctly by DuckDB.
Tried with the attached files:
polars-issue-schema-mismatch.zip
Log output
Issue description
Some files in a folder have the schema payload, date, vibId. Another set of files has the schema vibId, payload, date.
Wildcard matching does not match columns by name, only by position. Columnar records should be matched and fetched by column name, not just by positional index.
Expected behavior
The same records are processed correctly when read programmatically with DuckDB.
Installed versions