piccolo-orm / piccolo

A fast, user friendly ORM and query builder which supports asyncio.
https://piccolo-orm.com/
MIT License
1.45k stars 91 forks

Where JSONB column is_null produces incorrect query statement #410

Open theelderbeever opened 2 years ago

theelderbeever commented 2 years ago

python 3.10.1 piccolo: 0.66.0 db: Postgres

When doing a select (or similar) and comparing a JSONB column to None, or calling .is_null() on it, the query string includes the repr of a Python object, causing an error in the Postgres prepared statement.

Table.select(Table.id).where(Table.jsonb_column == None).run_sync()
Table.select(Table.id).where(Table.jsonb_column.is_null()).run_sync()

Emitted SQL:

SELECT "table"."id" FROM table WHERE "table"."jsonb_column" IS NULL'"<piccolo.columns.combination.Undefined object at 0x103eded10>"'

Error

asyncpg.exceptions.PostgresSyntaxError: syntax error at or near "NULL$1"
dantownsend commented 2 years ago

You're right - I've been able to replicate this.

It's pretty strange. I'll look into it.

theelderbeever commented 2 years ago

Not sure if relevant or already addressed by this, but when doing a Table.insert(Table(jsonb_column=None)), Piccolo doesn't appear to encode the value as a SQL NULL, which makes NOT NULL constraints and IS NULL comparisons not behave as expected. The value will show up as null in the database, but it will be a JSON-serialised null instead of a SQL NULL.
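To illustrate the distinction (a pure-Python sketch, independent of Piccolo's internals):

```python
import json

# A Python None serialised to JSON becomes the 4-character string 'null'.
# Stored in a jsonb column, that is a JSON null *value*, which is distinct
# from a SQL NULL: `jsonb_column IS NULL` is false for it.
serialised = json.dumps(None)
print(repr(serialised))    # 'null'
print(serialised is None)  # False: a real string, not a missing value
```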

theelderbeever commented 2 years ago

Maybe a SQL_NULL delegate so the ORM knows to add it to the query as NULL without quoting it?
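A hypothetical sketch of such a sentinel (the names SQL_NULL and render_value, and the render step itself, are assumptions for illustration, not Piccolo's actual API):

```python
class _SQLNull:
    """Marker object: render as the unquoted SQL keyword NULL."""

SQL_NULL = _SQLNull()

def render_value(value):
    # The query builder could special-case the sentinel while building the
    # querystring, emitting the bare keyword instead of a quoted value.
    if value is SQL_NULL:
        return "NULL"
    return "{}"  # everything else stays a normal placeholder
```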

theelderbeever commented 2 years ago

Tested with setting the column default to None: that value is appropriately set to a SQL NULL, so this only occurs when doing an INSERT with a None.

dantownsend commented 2 years ago

@theelderbeever Yes, you're right - I only discovered that yesterday. I added a fix for it as part of the same PR:

https://github.com/piccolo-orm/piccolo/pull/413

What I was thinking was, if the column is nullable:

If the column isn't nullable then both None and 'null' give JSON null.

What do you think?

theelderbeever commented 2 years ago

At the risk of having too many configurable arguments... I think None always being SQL NULL is a bit more consistent. That way devs can rely on the NOT NULL constraint to reject values.

With that said:

  - nullable:
  - not-nullable:

Not entirely sure where the coerce_json_null arg lives, though... on the INSERT/UPDATE call, probably?

Thoughts?

dantownsend commented 2 years ago

@theelderbeever I think you're right - making None always mean SQL NULL is more straightforward and easier to understand.

theelderbeever commented 2 years ago

Another option for the coercion could be...

class MyTable(Table):
    jsonb_col = JSONB(none_as_null=True)

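A sketch of how a none_as_null flag might coerce values at serialisation time (the function name and behaviour are assumptions, not Piccolo's actual implementation):

```python
import json

def serialise_json(value, none_as_null=True):
    # Hypothetical coercion step. With none_as_null=True a Python None is
    # returned as None, so the driver binds it as a SQL NULL; otherwise it
    # is serialised to the JSON literal 'null'.
    if value is None:
        return None if none_as_null else "null"
    return json.dumps(value)

print(serialise_json(None))                      # None -> bound as SQL NULL
print(serialise_json(None, none_as_null=False))  # 'null' -> JSON null
```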
theelderbeever commented 2 years ago

Also I promise to stop hammering on the json stuff soon but noted this as well...

This works...

await Table.raw(
    """
    INSERT INTO table (jsonb_column)
    VALUES 
        ({})
    """,
    '{"hello": "world"}'
)

This doesn't...

await Table.raw(
    """
    INSERT INTO table (jsonb_column)
    VALUES 
        ( '{"hello": "world"}' )
    """
)

The emitted SQL turns the json into an empty string as shown

 INSERT INTO table (jsonb_column)
    VALUES 
        ('')

which raises an error when run.

InvalidTextRepresentationError: invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.

There's a simple workaround, obviously, but it's just something to take note of.

dantownsend commented 2 years ago

@theelderbeever Ah, ok. Piccolo QueryString uses curly braces as placeholders, so it's seeing {"hello": "world"} as a placeholder instead of an actual value.

Internally, QueryString uses Formatter.parse (docs) to deconstruct the string:

Formatter().parse('select * from foo where a = {}')

I can't figure out a way of making it ignore some curly braces. It seems like neither \{ nor {{ works, so I'm not sure what the solution is, besides replacing Formatter().parse with a different parser.
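This can be demonstrated directly with the stdlib parser (a standalone illustration, independent of Piccolo):

```python
from string import Formatter

# An empty {} pair is parsed as an anonymous placeholder: field_name is ''.
literal, field, spec, conversion = next(Formatter().parse("a = {}"))
print((literal, field))  # ('a = ', '')

# A JSON object embedded in the string is *also* parsed as a placeholder,
# not literal text: the brace contents become a field name / format spec.
literal, field, spec, conversion = next(Formatter().parse('{"hello": "world"}'))
print(field is None)  # False: it is treated as a placeholder
```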

theelderbeever commented 2 years ago

Yeah, I see what you mean... not a lot of options there. Shame there isn't some combination of Template and Formatter.

Expounding on the above example: when providing multiple values, where an int (or similar) follows a jsonb value, the prepared statement fails in a strange way.

/asyncpg/protocol/prepared_stmt.pyx:197, in asyncpg.protocol.protocol.PreparedStatementState._encode_bind_msg()

DataError: invalid input for query argument $3: 10 (expected str, got int)
await Employees.raw(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES 
                ({}, {}, {})
        ) t (name, jsonb_data, salary)
    )

    UPDATE employees e
    SET
        jsonb_data = t.jsonb_data,
        salary = t.salary
    WHERE
        e.name = t.name
    """,
    'Bob', '{"hello": "world"}', 100_000
)

Strangely, the emitted SQL can be pasted into another client and works fine.

This specifically occurs when the values get passed through the QueryString.

dantownsend commented 2 years ago

@theelderbeever I'm trying to recreate that DataError issue. I made this simple script, but am getting a syntax error for the SQL - can you spot the issue?

from piccolo.table import Table
from piccolo.columns.column_types import Varchar, JSONB, Integer
from piccolo.engine.postgres import PostgresEngine

DB = PostgresEngine({"database": "jsonb_debug"})

class Employees(Table, db=DB):
    name = Varchar()
    jsonb_data = JSONB()
    salary = Integer()

def main():
    Employees.create_table(if_not_exists=True).run_sync()

    Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data, salary
            FROM (
                VALUES
                    ({}, {}, {})
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data,
            salary = t.salary
        WHERE
            e.name = t.name
        """,
        "Bob",
        '{"hello": "world"}',
        100_000,
    ).run_sync()

if __name__ == '__main__':
    main()

I'll experiment a bit more.

theelderbeever commented 2 years ago

Ah, whoops: a copy/paste/edit error when creating a non-project-specific example...

This should do the trick. Two edits:

  1. Added FROM temp t after the SET clause.
  2. Also realized that the jsonb column has to be type cast coming from the temp table, so added t.jsonb_data::jsonb.

      WITH temp as (
          SELECT name, jsonb_data, salary
          FROM (
              VALUES
                  ({}, {}, {})
          ) t (name, jsonb_data, salary)
      )

      UPDATE employees e
      SET
          jsonb_data = t.jsonb_data::jsonb,
          salary = t.salary
      FROM temp t
      WHERE
          e.name = t.name

dantownsend commented 2 years ago

@theelderbeever I had a look into it a bit more, and I could replicate the error:

# Fails
await Employees.raw(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES
                ({}, {}, {})
        ) t (name, jsonb_data, salary)
    )

    UPDATE employees e
    SET
        jsonb_data = t.jsonb_data::jsonb,
        salary = t.salary
    FROM temp t
    WHERE
        e.name = t.name
    """,
    "Bob",
    '{"hello": "world"}',
    100_000,
)

The error is:

asyncpg.exceptions.DatatypeMismatchError: column "salary" is of type integer but expression is of type text
HINT:  You will need to rewrite or cast the expression.

It seems like the issue is with asyncpg. When I directly passed the query to asyncpg, bypassing Piccolo entirely, I got the same error:

# Fails
await DB._run_in_new_connection(
    """
    WITH temp as (
        SELECT name, jsonb_data, salary
        FROM (
            VALUES
                ($1, $2, $3)
        ) t (name, jsonb_data, salary)
    )

    UPDATE employees e
    SET
        jsonb_data = t.jsonb_data::jsonb,
        salary = t.salary
    FROM temp t
    WHERE
        e.name = t.name
    """,
    [
        "Bob",
        '{"hello": "world"}',
        100_000
    ]
)

How about directly doing an update, rather than using WITH?

await DB._run_in_new_connection(
    """
    UPDATE employees e
    SET
        jsonb_data = $2,
        salary = $3
    WHERE
        e.name = $1
    """,
    [
        "Bob",
        '{"hello": "world"}',
        100_000
    ]
)

I'll create an issue in asyncpg.

dantownsend commented 2 years ago

@theelderbeever I think I know the issue now.

If you do this, then it works OK:

    await Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data, salary
            FROM (
                VALUES
                    ({}, {}, {}::integer)    --         <------- here
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data::jsonb,
            salary = t.salary
        FROM temp t
        WHERE
            e.name = t.name
        """,
        "Bob",
        '{"hello": "world"}',
        100_000,
    )

Postgres must assume that each value coming back from the WITH statement will be a string, which is why it complains when we're passing an integer.

theelderbeever commented 2 years ago

How about directly doing an update, rather than using WITH?

Part of the reason I haven't been doing a direct approach is that the real application is a bulk update, so the WHERE e.name = t.name is needed along with the CTE (or subquery).

Just discovered, though, that the VALUES list causes the bad typing, not the CTE. And applying the casting in the SELECT seems to be the smoothest.

    await Employees.raw(
        """
        WITH temp as (
            SELECT name, jsonb_data::jsonb, salary::integer    --         <------- here
            FROM (
                VALUES
                    ({}, {}, {}),
                    ({}, {}, {})
            ) t (name, jsonb_data, salary)
        )

        UPDATE employees e
        SET
            jsonb_data = t.jsonb_data,
            salary = t.salary
        FROM temp t
        WHERE
            e.name = t.name
        """,
        "Bob", '{"hello": "world"}', 100_000,
        "Mary", '{"hello": "mars"}', 1_000_000
    )

Thanks for all the work on this btw! Been a little deep in the weeds on this one.

dantownsend commented 2 years ago

@theelderbeever No worries - it was a good learning experience.

theelderbeever commented 2 years ago

FWIW, would it be valuable to have the QueryString automatically attempt to json.dumps lists and dictionaries when doing something like the above? Right now it directly inserts the value, which tends to produce invalid SQL, so the values have to be dumped manually.

Edit: it appears asyncpg wants absolutely everything to be a string... UUIDs, lists/dicts, and even numerics.
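A hedged sketch of what that automatic dumping could look like before arguments reach the driver (the function name and scope are assumptions, not Piccolo's actual behaviour):

```python
import json
import uuid

def adapt_raw_args(args):
    # Hypothetical pre-processing step: serialise containers to JSON text
    # and stringify UUIDs, since in this situation asyncpg expects text
    # for jsonb (and uuid) parameters.
    adapted = []
    for arg in args:
        if isinstance(arg, (dict, list)):
            adapted.append(json.dumps(arg))
        elif isinstance(arg, uuid.UUID):
            adapted.append(str(arg))
        else:
            adapted.append(arg)
    return adapted

print(adapt_raw_args([{"hello": "world"}, 100_000]))
# ['{"hello": "world"}', 100000]
```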