tizoc / ppx_pgsql

Syntax extension for embedded SQL queries using PG'OCaml.
BSD 3-Clause "New" or "Revised" License
52 stars 9 forks source link

Empty list as param generates incorrect sql query at runtime. #5

Open NightBlues opened 5 years ago

NightBlues commented 5 years ago

If we pass empty list, this code https://github.com/tizoc/ppx_pgsql/blob/master/lib/runtime/ppx_pgsql_runtime.ml#L49 generates () that postgres can't understand. For example:

SELECT COUNT(*) FROM users
WHERE id IN (1,2,3)

is correct, but

SELECT COUNT(*) FROM users
WHERE id IN ()

gives syntax error.

NightBlues commented 5 years ago

I can make a patch, but I dont see easy way to fix it - if we substitute an empty array here we have to specify types explicitly because we generate prepared statement instead of real query and Postgres can't infer it, but we don't know params types at this point.

tizoc commented 5 years ago

I can't think of any good solution. Other than what you mention, the alternative is to replace the whole expression <VALUE-EXPRESSION> IN () for false and <VALUE-EXPRESSION> NOT IN () for true, but doing that requires parsing SQL, which makes everything quite a bit more complicated than it is already. And thats not even a good translation because it doesn't behave the same for NULL values as the original expression.

tizoc commented 5 years ago

Thinking a bit more about it, the IN () case can be replaced by IN (NULL) while retaining the expected behaviour, right? It is NOT IN () that gets weird.

@NightBlues thoughts?

NightBlues commented 5 years ago

According to postgres documentation NULL is like a blackhole - everything touched by it becomes NULL too:) 1 IN (NULL) gives NULL, while 1 IN (2,3) gives false, NULL IN (NULL) also gives NULL because NULL is not equal to itself. Seems to be appropriate solution, despite looking like a hack.

tizoc commented 5 years ago

@NightBlues I have an untested alternative that may solve every case in a clean way, while at the same time avoiding the need of having to create a separate prepared statement for each list length (something that is needed with the current implementation). I'm not sure how well it will work in practice because PGOCaml doesn't define arrays for every type IIRC.

The basic idea is:

SELECT COUNT(*) FROM users
WHERE id IN $@ids::int[]

gets translated into

SELECT COUNT(*) FROM users
WHERE id IN (SELECT unnest($1::int[]))

This requires the user to be explicit and cast the value to the required type. Anyway, I have to think about this more, just wanted to share the idea I had.

tizoc commented 4 years ago

@NightBlues by now you probably have this solved already, but I found another solution that seems to work fine:

-- Equivalent to IN (...)
SELECT COUNT(*) FROM users
WHERE id = ANY($user_ids_array::int[])

-- Equivalent to NOT IN (...)
SELECT COUNT(*) FROM users
WHERE id <> ALL($user_ids_array::int[])

This will work with empty arrays too, and has the advantage of requiring a single prepared statement for every list length, while the IN/NOT IN version will create a new prepared statement for each list length.

I will update the documentation with a comment mentioning this (along with other gotchas like nullable columns in views, and you COALESCE trick for outer joins)

NightBlues commented 4 years ago

Modern postgres has very powerfull features, for example sometimes I use conversions to json and back with json operators. :)