Open tokoko opened 2 months ago
/take
Thank you for reporting this bug @tokoko, and for the start of a repro! I am going to investigate how the ibis over-ride escapes this.
@cpcloud What would be the expected behaviour? Would it be to error?
I put together a thorough repro:
import ibis
import pyarrow as pa
orders_data = pa.Table.from_pydict({
"order_id": [1,2,3,4],
"fk_store_id": [1,2,1,3],
"amount": [100,200,150,300]
})
stores_data = pa.Table.from_pydict({
"store_id": [1,2,3],
"name": ["Store A", "Store B", "Store C"]
})
con = ibis.connect("duckdb://")
orders = con.create_table("orders", orders_data)
stores = con.create_table("stores", stores_data)
print("\nCase 1: Direct comparison")
expr = orders.select(
orders["fk_store_id"],
comparison_result=(orders["fk_store_id"] == stores.select("store_id").limit(1))
)
print("\nSQL generated:")
print(ibis.to_sql(expr))
print("\nResult:")
print(expr.execute())
print("\nCase 2: Correct usage with as_scalar()")
expr_correct = orders.select(
orders["fk_store_id"],
comparison_result=(orders["fk_store_id"] == stores.select("store_id").limit(1).as_scalar())
)
print("\nSQL generated:")
print(ibis.to_sql(expr_correct))
print("\nResult:")
print(expr_correct.execute())
# Case 3: What happens with different comparison types?
print("\nCase 3: Different comparison types")
# Column to literal
print("Column to literal:", type(orders["fk_store_id"] == 1))
# Column to Column
print("Column to Column:", type(orders["fk_store_id"] == orders["order_id"]))
# Column to Table
print("Column to Table:", type(orders["fk_store_id"] == stores.select("store_id").limit(1)))
# Column to Scalar
print("Column to Scalar:", type(orders["fk_store_id"] == stores.select("store_id").limit(1).as_scalar()))
This is the output:
Case 1: Direct comparison
SQL generated:
SELECT
"t0"."fk_store_id",
FALSE AS "comparison_result"
FROM "memory"."main"."orders" AS "t0"
Result:
fk_store_id comparison_result
0 1 False
1 2 False
2 1 False
3 3 False
Case 2: Correct usage with as_scalar()
SQL generated:
SELECT
"t0"."fk_store_id",
"t0"."fk_store_id" = (
SELECT
"t1"."store_id"
FROM "memory"."main"."stores" AS "t1"
LIMIT 1
) AS "comparison_result"
FROM "memory"."main"."orders" AS "t0"
Result:
fk_store_id comparison_result
0 1 True
1 2 False
2 1 True
3 3 False
Case 3: Different comparison types
Column to literal: <class 'ibis.expr.types.logical.BooleanColumn'>
Column to Column: <class 'ibis.expr.types.logical.BooleanColumn'>
Column to Table: <class 'bool'>
Column to Scalar: <class 'ibis.expr.types.logical.BooleanColumn'>
It seems only Columns compared with Tables is returning the boolean.
It seems that in _binop
, op_class
is running into a SignatureValidationError
which results to falling back to the default Python ==
.
-> def _binop(op_class: type[ops.Binary], left: ir.Value, right: ir.Value) -> ir.Value:
> /home/anja/git/ibis/ibis/expr/types/core.py(803)_binop()
-> try:
> /home/anja/git/ibis/ibis/expr/types/core.py(804)_binop()
-> node = op_class(left, right)
ibis.common.annotations.SignatureValidationError: Equals(r0 := DatabaseTable: memory.main.orders
order_id int64
fk_store_id int64
amount int64
fk_store_id: r0.fk_store_id, r0 := DatabaseTable: memory.main.stores
store_id int64
name string
r1 := Project[r0]
store_id: r0.store_id
Limit[r1, n=1]) has failed due to the following errors:
`right`: r0 := DatabaseTable: memory.main.stores
store_id int64
name string
r1 := Project[r0]
store_id: r0.store_id
Limit[r1, n=1] is not coercible to a Value
Expected signature: Equals(left: Value, right: Value)
My interpreation is that Ibis tried to create an Equals operation. The Equals operation required both operands to be Value types. It can't automatically coerce a Table to a Value. So Python's ==
is short-circuiting the comparison, and we lose the informative error message. I'll try to understand where the short circuiting is happening.
Edit Probably because of:
> /home/anja/git/ibis/ibis/expr/types/core.py(805)_binop()
-> except (ValidationError, NotImplementedError):
> /home/anja/git/ibis/ibis/expr/types/core.py(806)_binop()
-> return NotImplemented
And by default when there's a NotImplemented
, it defaults to Python's ==
.
@cpcloud Should we be raising ValidationError
instead of converting it into a NotImplemented
?
I think for some cases NotImplemented
might be necessary, but I would try raising the error in the __eq__
and __ne__
cases only and see if anything breaks.
What happened?
Comparing Table and Column objects escapes ibis operator overrides and instead yields a boolean output. This is not necessarily wrong, but can lead to confusing errors when user is trying to use another table in a scalar subquery, but forgets to call
.as_scalar
method on the table. Instead of throwing an error, such a comparison will always be interpreted as a boolean literalfalse
.What version of ibis are you using?
9.5.0
What backend(s) are you using, if any?
No response
Relevant log output
No response
Code of Conduct