Open kszlim opened 6 days ago
1) Are you using over expressions? Do any of them involve a column that holds just a single value? 2) Are you using concat somewhere? (A lazy query after concat can behave like this.) 3) Are you using filter with some aggregation functions?
I ask because I had the same issues in those cases.
1) Yes, I definitely have some window functions 2) Don't believe so 3) Don't believe I have any aggregations
1) Not sure about this 2) yeah, I think that's likely happening 3) happens even without a collect_all
So 1 and 2 are probably the points you can build an example on.
Just check your pipeline/data for these two kinds of usage and try to reduce the query to only them.
@ritchie46 I still haven't been able to create a repro, but in my very complicated query it seems to manifest when I have more than N columns, rather than any particular column causing the issue. It also seems related to having some struct-based columns/expressions in the query plan: if I remove all of those, it no longer seems to manifest.
So when running my queries, I get this weirdness:
ldf.select((cs.categorical() | cs.string())).head().collect() # This works fine
ldf.select(~(cs.categorical() | cs.string())).head().collect() # This works fine
ldf.select(cs.string()).head().collect() # This works fine
ldf.select(cs.float()).head().collect() # This works fine
ldf.select((cs.string() | cs.float())).head().collect() # This fails
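The manual selector experiments above could be automated. Here is a minimal sketch of a greedy column reducer, not something from the original query: `minimize_failing_columns` and the `fails` callback are hypothetical names. It assumes `fails` wraps something like `ldf.select(cols).head().collect()` and reports whether that call panicked.

```python
from typing import Callable, Sequence


def minimize_failing_columns(
    columns: Sequence[str],
    fails: Callable[[list[str]], bool],
) -> list[str]:
    """Greedily drop columns while the selection still fails.

    `fails` is a hypothetical callback: it receives a list of column
    names and returns True if selecting them reproduces the panic.
    """
    current = list(columns)
    if not fails(current):
        raise ValueError("the full column set does not fail")

    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, because `current` shrinks as we go.
        for col in list(current):
            candidate = [c for c in current if c != col]
            if candidate and fails(candidate):
                current = candidate
                changed = True
    return current


# Hypothetical usage against a LazyFrame named `ldf` (assumption):
#
# def fails(cols):
#     try:
#         ldf.select(cols).head().collect()
#     except BaseException:  # Rust panics do not derive from Exception
#         return True
#     return False
#
# print(minimize_failing_columns(ldf.collect_schema().names(), fails))
```

The result is a set from which no single column can be removed without the failure disappearing. It is not guaranteed to be the smallest possible set, but it is usually small enough to build an MRE from.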
I'll try to bisect the offending commit between 1.4.1 and 1.5.0. Sounds like some sort of overflow issue.
Did a git bisect, and it looks like the regression occurs here: https://github.com/pola-rs/polars/pull/18156
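A bisect like this can be driven by `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below is an assumption about how such a check script might look, not the script actually used here: `exit_code_for` and `run_query` are hypothetical names, and it assumes Polars is rebuilt at each commit before the script runs.

```python
def exit_code_for(run_query) -> int:
    """Map the outcome of `run_query` to a `git bisect run` exit code.

    `run_query` is a hypothetical zero-argument callable that builds
    and collects the failing query.
    """
    try:
        run_query()
    except ImportError:
        # Assumption: a commit whose build is unusable shows up as an
        # import failure, so ask git bisect to skip it.
        return 125
    except BaseException:
        # Panics raised from Rust through PyO3 derive from
        # BaseException, so `except Exception` would miss them.
        # This also catches KeyboardInterrupt, which is acceptable
        # for a throwaway bisect script.
        return 1
    return 0


# Hypothetical entry point for the check script (assumption):
#
# import sys
#
# def run_query():
#     import polars as pl
#     ...  # build the LazyFrame and call .collect()
#
# sys.exit(exit_code_for(run_query))
```

Invoked as `git bisect run python check.py` (with `check.py` being whatever the script is saved as), git then marks each commit good or bad without manual input.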
Checks
Reproducible example
panic.txt
Log output
Issue description
After upgrading from 1.4.1 to 1.5.0, I get this panic when running a moderately complex query. I've also tested on 1.6.0 and main, and both exhibit this behavior. I will try to produce an MRE.
Expected behavior
No panic
Installed versions