tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.7k stars 701 forks source link

Overlapping disjunctions are not optimized #3909

Closed rcurtin closed 2 months ago

rcurtin commented 2 months ago

Thanks for writing sqlglot---it is extremely useful! I am not sure whether to call this a bug report because I found DuckDB does not do the optimization either. But I am curious of your thoughts of what this should do:

>>> from sqlglot.optimizer import optimize
>>> optimize("select a from x where a > 5 or a <= 5", { 'x': { 'a': 'int' }}).sql()
'SELECT "x"."a" AS "a" FROM "x" AS "x" WHERE "x"."a" <= 5 OR "x"."a" > 5'

Here the disjunctions fully overlap and the condition could be removed entirely, producing

'SELECT "x"."a" AS "a" FROM "x" AS "x"'

Is this something that should be supported by the optimizer? Or is there a reason that it is not supported? (Perhaps it is just not a common situation? In my case I got here by combining filters off different expressions; I would hope no reasonable SQL user is manually writing queries like what's above!)

tobymao commented 2 months ago

it's not entirely equivalent because of when a is null

rcurtin commented 2 months ago

Thanks for the quick response! I overlooked that case. But even if I specify that as a condition, I still don't get a simplification:

>>> optimize("select a from x where (a > 5 or a <= 5) and a is not null", { 'x': { 'a': 'int' }}).sql()
'SELECT "x"."a" AS "a" FROM "x" AS "x" WHERE ("x"."a" <= 5 OR "x"."a" > 5) AND NOT "x"."a" IS NULL'

Here we should be able to drop the ("x"."a" <= 5 OR "x"."a" > 5). Let me know if I overlooked something there too... it is possible I am insufficiently caffeinated today...

tobymao commented 2 months ago

nope, it's not implemented, and we don't have any plans for it right now

rcurtin commented 2 months ago

If I did implement it, would you accept the contribution? I might need it for my own usage, and happy to send it upstream if you have the time and bandwidth to review it.

tobymao commented 2 months ago

i'm not sure, because it's a tough problem, another issue with this optimization is if x is not an integer.

in general, we try to have optimizations that are guaranteed to be correct