AroneyS opened 1 year ago
And if you replace `collect` with `sink_parquet`?
I still get memory allocation errors with `sink_parquet`, but it looks like there are now two requests that fail: one for 937,500 bytes and one for 1,093,750 bytes.

Actually, I get those memory requests with a streaming `collect` as well now. I guess it's machine-state dependent?
```
memory allocation of memory allocation of memory allocation of memory allocation of memory allocation of 937500 bytes failed
1093750 bytes failed
memory allocation of 1093750 bytes failed
1093750 bytes failed
9375001093750 bytes failed
bytes failed
memory allocation of 1093750 bytes failed
Aborted (core dumped)
```
Polars version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Issue description
```
memory allocation of 15277786 bytes failed
```

when running a lazy streaming pipeline, even though the machine has far more RAM available (1 TB total) than was requested.

I came across this bug while trying to group by a column and aggregate a `list[str]` column. It also failed with a similar error when I did groupby/agg with `pl.col("b").flatten()`. If instead I aggregate with plain `pl.col("b")`, it works fine but produces a `list[list[str]]` column. I am not sure how to flatten that to `list[str]`, except by the approach below, which also gives the memory error.

Reproducible example
Expected behavior
Memory allocation succeeds. The above example works on my machine with 10**6 rows.
Installed versions