guilherme-wisdom opened this issue 2 weeks ago
Pinging @alexander-beedie here.
@alexander-beedie, here is a potentially simpler repro / root cause. I know it is not the same error, but I kept trying to reduce the issue and ended up here. If you think it is a separate issue, I will open one.
import polars as pl
pl.__version__ # 1.0.0
users = pl.DataFrame({"id": "1"})
user_groups = pl.DataFrame({"user_id": "1"})
pl.sql("""
WITH user_by_email AS (SELECT id FROM users)
SELECT user_groups.user_id AS input_user_id
FROM user_groups
INNER JOIN user_by_email ON user_groups.user_id = user_by_email.id
-- users <--- if you remove this line, you get an error
-- It is commented out, so this presumably has to do with how frames are discovered from global variables
"""
).collect()
# Raises SQLInterfaceError: relation 'users' was not found (when the `-- users` line is removed)
EDIT: this is a separate issue; I will raise one. Please ignore.
Tidier repro
# removing the email field fixes it
users = pl.DataFrame({"id": "1", "email": "abc"})
user_groups = pl.DataFrame({"id": "1"})
group_group = pl.DataFrame({"id": "1"})
deals = pl.DataFrame({"id": "1"})
ctx = pl.SQLContext({
"users": users,
"user_groups": user_groups,
"group_group": group_group,
"deals": deals,
})
ctx.execute("""
WITH user_by_email AS (SELECT id FROM users),
user_child AS (
SELECT group_group.id
FROM user_groups
-- removing this join fixes it
INNER JOIN user_by_email ON user_groups.id = user_by_email.id
INNER JOIN group_group ON user_groups.id = group_group.id
)
SELECT *
FROM deals
WHERE (
deals.id IN (
SELECT id
FROM users
-- removing the below line changes the error from
-- `PanicException: internal error: entered unreachable code` to
-- `PanicException: called `Option::unwrap()` on a `None` value`
WHERE id = '1'
)
OR deals.id IN (
SELECT DISTINCT user_groups.id
FROM user_groups AS left
-- removing this join also fixes it
INNER JOIN user_by_email ON user_groups.id = user_by_email.id
)
OR deals.id IN (
SELECT DISTINCT user_groups.id
FROM user_groups
-- removing this join also fixes it
INNER JOIN user_child ON user_groups.id = user_child.id
)
)
""",
eager=True
)
# PanicException: internal error: entered unreachable code
And I have still been able to get a panic (though not the same "entered unreachable code" error) with an even smaller repro:
a = pl.DataFrame({"id": "1"})
ctx = pl.SQLContext({"a": a})
ctx.execute("""
-- changing this from `SELECT a.id` to `SELECT id` fixes it
-- even though this CTE is never used
-- removing the CTE also fixes the issue
WITH c AS (SELECT a.id FROM a)
SELECT *
FROM a
WHERE id IN (
SELECT id
FROM a
-- removing this join fixes the issue
INNER JOIN a AS a2 ON a.id = a2.id
)
""",
eager=True
)
# PanicException: called `Option::unwrap()` on a `None` value
I did reproduce the "entered unreachable code" error, but it is raised from a different place, so I'm not entirely sure it is the same underlying issue.
users = groups = deals = pl.DataFrame({"id": [1]})
pl.sql("""
with
A as ( select id from groups join users using (id) )
select * from deals
where
id in ( select id from A join A as B using (id) )
""").collect()
# thread '<unnamed>' panicked at crates/polars-plan/src/plans/ir/schema.rs:106:24:
# internal error: entered unreachable code
@alexander-beedie I see that the example above by @cmdlineluser creates an external context. I strongly suspect this is the culprit. Could we use a horizontal concat here instead?
I'll find out (though I'm on vacation at the moment, so I can't dig in properly yet; I'm only able to sneak in time for small PRs right now 😉)
Oh yeah... enjoy!
Reproducible example
Log output
Issue description
My application fails on the example provided. I could not pinpoint exactly which combination is causing the issue. The outer query has three OR operands, and if I remove the last OR component it works.
So the problem is related to that last component, or to how that component interacts with everything else.
I also ran this against the newest Polars release (1.0.0), and it failed there too.
Expected behavior
The correct output is:
1 2 3 4 5 6
Installed versions