Open niccolopetti opened 9 months ago
The query is not supported in FULL streaming mode. Only partially.
Thanks for the answer. Are there any workarounds to accomplish it? Is it planned to be supported in full streaming mode?
For my real use case, I was thinking of computing this outside streaming mode, with a select containing only that value and the primary key, then processing the other columns in streaming fashion, and finally joining the two .parquet files in streaming mode. Are there any better solutions?
Update: the solution I had thought of, which was based on precomputing column b outside streaming mode and then doing
pl.concat([
    pl.LazyFrame({'a': [1, 3, 8]}),
    pl.LazyFrame({'b': [2, 4, 9]})], how="horizontal"
).sink_parquet("a.parquet")
doesn't work either; it always raises
InvalidOperationError: sink_Parquet(ParquetWriteOptions { compression: Zstd(None), statistics: false, row_group_size: None, data_pagesize_limit: None, maintain_order: true }) not yet supported in standard engine. Use 'collect().write_parquet()'
Any ideas how to do this? @ritchie46
Checks
Reproducible example
Log output
No response
Issue description
pl.col().diff() causes an error when calling .sink_parquet(), even though the query is supported in streaming mode. Below is the execution graph: this issue resembles #9337 and #9740 but can't be solved via
Expected behavior
Running the same query in streaming mode by calling collect() works, so I would expect sink_parquet to work too:
This produces the expected output:
Installed versions