piccolo-orm / piccolo

A fast, user friendly ORM and query builder which supports asyncio.
https://piccolo-orm.com/
MIT License
1.32k stars 86 forks source link

cockroachDB extremely long query due to Array(UUID()) #1005

Open Jacky56 opened 3 weeks ago

Jacky56 commented 3 weeks ago

hello all,

Assume we have a simple Post table:

from piccolo.columns import *

class Post(Table):
    uuid = UUID(primary_key=True)
    user = ForeignKey(User, null=False)
    description = Text()
    users_mentioned = Array(UUID())

When I perform a Post.objects().run_sync() or Post.select(Post.all_columns()).run_sync() the query is extremely slow.

But when I leave out querying users_mentioned: Array(UUID()) such as Post.select(Post.uuid).run_sync() the query speeds are as expected (empty table).

From looking at my cluster, it performs a very complex query (10s+) each time I request for the Array(UUID()) datatype:

WITH RECURSIVE typeinfo_tree (oid, ns, name, kind, basetype, elemtype, elemdelim, range_subtype, attrtypoids,
attrnames, depth) AS (SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype, ti.elemdelim,
ti.range_subtype, ti.attrtypoids, ti.attrnames, _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti
   WHERE ti.oid = ANY ($1::OID[]) UNION ALL SELECT ti.oid, ti.ns, ti.name, ti.kind, ti.basetype, ti.elemtype,
ti.elemdelim, ti.range_subtype, ti.attrtypoids, ti.attrnames, tt.depth + _
 FROM (SELECT t.oid AS oid, ns.nspname AS ns, t.typname AS name, t.typtype AS kind, (CASE WHEN t.typtype = '_' THEN (WITH RECURSIVE typebases (oid,
depth) AS (SELECT t2.typbasetype AS oid, _ AS depth
 FROM pg_type AS t2
   WHERE t2.oid = t.oid UNION ALL SELECT t2.typbasetype AS oid, tb.depth + _ AS depth
 FROM pg_type AS t2, typebases AS tb
   WHERE (tb.oid = t2.oid)
    AND (t2.typbasetype != _)) SELECT oid
 FROM typebases
 ORDER BY depth DESC
 LIMIT _) ELSE _ END) AS basetype, t.typelem AS elemtype, elem_t.typdelim AS elemdelim, range_t.rngsubtype AS range_subtype,
(CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.atttypid
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrtypoids, (CASE WHEN t.typtype = '_' THEN (SELECT array_agg(ia.attname::STRING
 ORDER BY ia.attnum)
 FROM pg_attribute AS ia INNER
   JOIN pg_class AS c
    ON (ia.attrelid = c.oid)
   WHERE ((ia.attnum > _)
    AND (NOT ia.attisdropped))
    AND (c.reltype = t.oid)) ELSE _ END) AS attrnames
 FROM pg_catalog.pg_type AS t INNER
   JOIN pg_catalog.pg_namespace AS ns
    ON (ns.oid = t.typnamespace) LEFT
   JOIN pg_type AS elem_t
    ON (((t.typlen = _)
    AND (t.typelem != _))
    AND (t.typelem = elem_t.oid)) LEFT
   JOIN pg_range AS range_t
    ON (t.oid = range_t.rngtypid)) AS ti, typeinfo_tree AS tt
   WHERE ((((tt.elemtype IS NOT NULL)
    AND (ti.oid = tt.elemtype)) OR ((tt.attrtypoids IS NOT NULL)
    AND (ti.oid = ANY (tt.attrtypoids)))) OR ((tt.range_subtype IS NOT NULL)
    AND (ti.oid = tt.range_subtype))) OR ((tt.basetype IS NOT NULL)
    AND (ti.oid = tt.basetype))) SELECT DISTINCT *, basetype::REGTYPE::STRING AS basetype_name, elemtype::REGTYPE::STRING AS elemtype_name,
range_subtype::REGTYPE::STRING AS range_subtype_name
 FROM typeinfo_tree
 ORDER BY depth DESC

Performing a query without piccolo such as:

import psycopg2

connection = psycopg2.connect(**con)
cursor = connection.cursor()
cursor.execute("select * from post")

works fine/speed as intended.

please send help if I am doing anything wrong

Jacky56 commented 3 weeks ago

Another interesting note:

sinisaos commented 3 weeks ago

@Jacky56 You are doing everything right on the table definition and querying Piccolo. I think the problem is how asynpg works with CockroachDB. There have been several issues online based on this. I have Cockroach v22 on my local machine and I can't even insert data with arrays because the error is unable to decorate subquery (which was resolved in v23). Then I upgrade to the latest version available (v24.1.0) and I get the same as you, a very slow query with a complex query in the cluster log. I don't think we can do anything in Piccolo, Based on this discussion the type introspection is done once and I can confirm that if we use a connection pool only the first query is slow and every other query is fine. psycopg doesn't have those problems but Piccolo is based on asyncpg

dantownsend commented 3 weeks ago

@sinisaos Thanks for looking into this. Hopefully asyncpg will solve this problem in a future version. It's a good reason for us to consider supporting multiple Postgres client libraries. I'm going to try upgrading our CI to use Cockroach v24.