lpsmith / postgresql-simple

Mid-level client library for accessing PostgreSQL from Haskell
Other
206 stars 71 forks source link

`In []` is turned into "IN (null)" which does not have the expected behavior #182

Closed ghost closed 8 years ago

ghost commented 8 years ago

in SQL, "IN (null)" will always evaluate to null. This is different from being a member of an empty list. Instead, In [] should be translated into False.

lpsmith commented 8 years ago

Actually, somebody was complaining about a similar issue a month or so ago via email. It works mostly as expected in the most simple use case against most sane schemas, but yeah. Their complaint was that SELECT * FROM foo WHERE id NOT IN (null) doesn't work as expected.

Unfortunately the culture surrounding SQL and to a lesser extent PostgreSQL seems to not care and occasionally be outright hostile to handling base cases in a sane, consistent way, which tends to cause trouble up the stack.

I'm not aware of any decent way to fix the IN newtype, which was inherited from MySQL, not without getting much deeper into dynamic sql and breaking backward compatibility. The problem is that the query above would be written as ... id NOT IN ? in postgresql-simple, and the question mark expands to (null). However, you have to replace the whole id NOT IN (...) syntax fragment with false, not just the (...), which means no matter how it's achieved, your program will have to have a slightly deeper understanding of what it's generating if you insist on your solution.

I do know of a solution for this issue though: use the Values type, which will help generate syntax like this: SELECT * FROM foo WHERE id NOT IN (VALUES (null::"int4") LIMIT 0). The one caveat though is that you have to specify a type in order for the Values type to work in the empty case, and I've not found any way to avoid that. For example:

$ psql

lpsmith=> select * from fruits where id not in (values (null) limit 0);
ERROR:  operator does not exist: integer = text

lpsmith=> select * from fruits where id not in (select unnest('{}'));
ERROR:  could not determine polymorphic type because input has type "unknown"

Both of these work if you specify the type. So no matter what I've tried, postgresql seems to be highly insistent that you know and state the concrete, ground type of an empty table.

ghost commented 8 years ago

Ah yes. I can see how this is a rat's nest to deal with. Is there documentation of this corner case in the haddocks? It was quite surprising but understandable after the fact. My query that failed was "delete from ... where foo NOT IN (null)". Nothing gets deleted!

On Wed, Apr 27, 2016 at 8:26 AM, Leon P Smith notifications@github.com wrote:

Actually, somebody was complaining about a similar issue a month or so ago via email. It works mostly as expected in the most simple use case against most sane schemas, but yeah. Their complaint was that SELECT * FROM foo WHERE id NOT IN (null) doesn't work as expected.

Unfortunately the culture surrounding SQL and to a lesser extent PostgreSQL seems to not care and occasionally be outright hostile to handling base cases in a sane, consistent way, which tends to cause trouble up the stack.

I'm not aware of to be any decent way to fix the IN newtype, not without getting much deeper into dynamic sql and breaking backward compatibility in a simple way. The problem is that the query above would be written as ... id NOT IN ? in postgresql-simple, and the question mark expands to (null). However, you have to replace the whole id NOT IN (...) syntax fragment with false, not just the (...), which means no matter how it's achieved, your program will have to have a slightly deeper understanding of what it's generating if you insist on your solution.

I do know of a solution for this issue though: use the Values type, which will help generate syntax like this: SELECT * FROM foo WHERE id NOT IN (VALUES (null::"_int8") LIMIT 0). The one main caveat though is that you have to specify a type in order for the Values type to work, and I've not found any way to avoid that. For example:

$ psql

lpsmith=> select * from fruits where id not in (values (null) limit 0); ERROR: operator does not exist: integer = text

lpsmith=> select * from fruits where id not in (select unnest('{}')); ERROR: could not determine polymorphic type because input has type "unknown"

So no matter what I've tried, postgresql seems to be highly insistent on knowing the type of the empty table.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/lpsmith/postgresql-simple/issues/182#issuecomment-215066444

lpsmith commented 8 years ago

Aha, I just came up with a way, but it's rediculously verbose and kind of sillly...

lpsmith=> select * from fruits where id not in (select id from fruits where id in (null));
 id |   name    
----+-----------
  0 | Apple
  2 | Orange
  3 | Banana
  4 | Pineapple
  1 | Pear
(5 rows)

So here you don't have to specify the type, but you do have to repeat the table and column name. Similarly, the values construct would generate this, which also works as expected:

lpsmith=> select * from fruits where id not in (values (null ::"int4") limit 0);
[...]
(5 rows)

Though, hilariously, you don't have to specify the type if you do this on the name column:

lpsmith=> select * from fruits where name not in (values (null) limit 0);
[..]
(5 rows)

So, pick your poison I guess. Would you rather repeat part of your query, or would you prefer to explicitly specify some of the types involved?

lpsmith commented 8 years ago

Ok, one more solution, probably the best yet (at least for the NOT IN issue specifically):

lpsmith=> select * from fruits where COALESCE (id not in (null), TRUE);
 id |   name    
----+-----------
  0 | Apple
  2 | Orange
  3 | Banana
  4 | Pineapple
  1 | Pear
(5 rows)

In any case, this is definitely worth a mention in the haddocks.

simonmichael commented 8 years ago

+1 for a mention in the docs.

lpsmith commented 8 years ago

Ok, anybody want to look over my patch quick before I release?

(and it won't be 0.5.1.2 anymore, sniff sniff. ;-) )

ghost commented 8 years ago

Looks really thorough and informative!

simonmichael commented 8 years ago

+1, nice docs.

lpsmith commented 8 years ago

Ugh, maybe the coalesce idea isn't such a good idea, at least in the positive case:

$ psql
psql (9.5.2, server 9.4.7)

lpsmith=> explain analyze select * from bigger_table where id in (155345, 32645);
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using bigger_table_idx on bigger_table  (cost=0.42..14496.48 rows=4 width=368) (actual time=0.390..35.627 rows=4 loops=1)
   Index Cond: (id = ANY ('{155345,32645}'::bigint[]))
 Planning time: 0.101 ms
 Execution time: 35.667 ms
(4 rows)

lpsmith=> explain analyze select * from bigger_table where coalesce(id in (155345, 32645), true);
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on bigger_table  (cost=0.00..24368.21 rows=189648 width=368) (actual time=2.868..186.720 rows=4 loops=1)
   Filter: COALESCE((id = ANY ('{155345,32645}'::bigint[])), true)
   Rows Removed by Filter: 380901
 Planning time: 0.149 ms
 Execution time: 186.762 ms
(5 rows)

I'm not too surprised that the postgresql planner isn't intelligent enough to see inside a coalesce operator, but it seems likely that many (most?) use cases of NOT IN are going to end up as a check condition at the end anyway, as I'm not aware of any indexes that really support the NOT IN operator.