ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

fix(snowflake): make semantics of array filtering match everything else #10469

Closed cpcloud closed 1 week ago

cpcloud commented 1 week ago

Fixes Snowflake failures on main, caused by slightly different handling of NULLs in array filtering.

This PR ensures that behavior is the same for all backends that support array filtering with an index.

Snowflake is a bit more complex because its higher-order functions accept only a single argument. Before compiling, we therefore run a rewrite rule (the use of a rewrite is not new in this PR) to extract the field being referenced in the body of the function.

The changes here are only relevant for the case of array filtering with an index. If an index isn't used, then the behavior is the same as before.

The changes here effectively make the Snowflake backend implement array filtering with an index the way Trino and PySpark do: we construct a struct containing a flag for whether to keep a value, along with the value itself.

For example, this preserves NULL values in the array when the index is used for filtering.
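
To illustrate the struct-based approach described above, here is a minimal pure-Python sketch. It is not the ibis or Snowflake implementation: the function names `filter_with_index` and `naive_filter` are hypothetical, Python `None` stands in for SQL NULL, and the "naive" variant is included only as an assumed contrast case, not as a description of what any backend previously did.

```python
from typing import Any, Callable, List, Optional


def filter_with_index(
    values: List[Optional[Any]],
    predicate: Callable[[Optional[Any], int], Any],
) -> List[Optional[Any]]:
    """Filter by (value, index), carrying the keep decision next to the value."""
    # Step 1: build a "struct" per element holding the keep flag and the value.
    # Assumption: a predicate result of None is treated as "do not keep".
    tagged = [
        {"keep": bool(predicate(v, i)), "value": v}
        for i, v in enumerate(values)
    ]
    # Step 2: filter on the flag only, so a None value is never mistaken
    # for "this element was dropped".
    kept = [t for t in tagged if t["keep"]]
    # Step 3: unwrap the surviving values.
    return [t["value"] for t in kept]


def naive_filter(
    values: List[Optional[Any]],
    predicate: Callable[[Optional[Any], int], Any],
) -> List[Optional[Any]]:
    """Hypothetical contrast: uses None as the drop sentinel, losing real Nones."""
    marked = [v if predicate(v, i) else None for i, v in enumerate(values)]
    return [v for v in marked if v is not None]


data = [1, None, 3, None]
keep_after_first = lambda v, i: i > 0  # the predicate uses only the index

print(filter_with_index(data, keep_after_first))
print(naive_filter(data, keep_after_first))
```

In the sketch, the struct-based version keeps the `None` elements at indices 1 and 3 because the keep decision lives in a separate field, whereas the sentinel-based version cannot tell a dropped element from a genuine NULL.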

cpcloud commented 1 week ago

Snowflake is passing:

…/ibis on  null-handling-array-filter-clouds is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️  impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
.x..........................xx..................................s.......................................................... [  6%]
....................................................x.......................x...........................x...x........x..... [ 12%]
.....x......................x..............................................x...x....x.x.x.......x..x...x....x..xx..x....x.. [ 18%]
...............x...................x.............x.......x.......x.................x....x.................................. [ 24%]
..xx...x...............x........x......x...................x...x.........................x................................. [ 30%]
....................................................x.............x.......................x................................ [ 36%]
....................................x.x..xx..x..xxxx...xx.xx....xx.xx....x....xxx....x..x.x.xx...x.x.x..x.xx.....x......x.x [ 42%]
..xxx..xxx..xxx..xxxx.xxx.x...xxx..x...xxx.x......x.......xx.xx.........x....x....x..x...............x.......x...........x. [ 48%]
.............x.....x................x..............x...x............x............xx.......x...............x................ [ 54%]
......................................x...xx....x.....s.......................x..x.xxxxx.xxxx.xx.....xx....x..x.xx.......x. [ 60%]
.x..x....xx....xx..x....x...x...x........xx...x......xx..............x.x.x...x..xx....x........x...xx.x............xxx..... [ 66%]
.......xx..x.....x........x.........xxxxxxxxxxx..s........x.......................................s..x..............s...... [ 72%]
........................................x.......x..x....x....x....x..x......................................x.............. [ 78%]
x..x.........x.....x..x..x...............xx...x...x.....x...x....x.xx..............x...................x............x..x.x. [ 84%]
.........s........................ss....x.s.............x.................................................................. [ 90%]
..........x..............................................s................................................................. [ 97%]
............................................................                                                                                                                                  [100%]
1792 passed, 10 skipped, 226 xfailed in 251.20s (0:04:11)

cpcloud commented 1 week ago

Snowflake passing after most recent force push:

…/ibis on  null-handling-array-filter-clouds is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️  impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
.....................................................x......x....................x....xx......x..................x...........x........x....x....x....................x........x.............. [  9%]
..x.......x.......x...........x.x....x..x....x......xx..x...x.........x.x.xx.xxx..x.....x..xx......xx.x.x.xxx...xxx.x.x.xxx...xxx...x.xx....x.x.x..x..x...xxx.x.x.xx.x...xx.xx..xxx.xx..xxxx. [ 18%]
....xxx....x.xx......xxx.....x....x.x...xx.....x............xx..........................................................................x.......x...................x..............x......... [ 27%]
.....x..............................s...........................s.............................................x.....x...s.........x..........x....x.x.....x........x.....x...x..x.......xx... [ 37%]
x.x.......x.....................xxx........x.....x.xx.........x..x.x...xx.x.............x.......x....xx.x..........x.........................xxxx...........x..x................x...x.x...... [ 46%]
.......x............x.........x........x..s......x.................x...x......xxxxxxx..xx.....xx..xx.......x...x...........x.........x............x....x........s.......................x.... [ 55%]
..........x.........................x...x...................x....x...............x....x..............x.....x......x...........x.......x........................x..................x.......xxx [ 65%]
.....................................................................x....s..x............................................................................................................... [ 74%]
...........x.x.........x........................................................................x...xx..............x..................x..x..........................x......x................ [ 83%]
.........x.....x..................x....xxxx....xx..xx..x.xxx.x.x.......x........x..............................................s..s..............................................s........... [ 93%]
...............................................................s....x.....................................................................                                                    [100%]
1792 passed, 10 skipped, 226 xfailed in 261.65s (0:04:21)

cpcloud commented 1 week ago

Merging to get CI green again.