product-os / jellyfish

The Jellyfish Project
https://jel.ly.fish/
GNU Affero General Public License v3.0
15 stars 3 forks source link

Slow link contract queries #6503

Closed joshbwlng closed 3 years ago

joshbwlng commented 3 years ago

Original conversation: https://jel.ly.fish/7950669f-fbb7-46f3-b95b-e2be9a393520

Useful commands

> ANALYZE cards;
> EXPLAIN <query>;

SDK Query

await window.sdk.query({
  type: 'object',
  required: [ 'type', 'data' ],
  properties: {
    type: {
      type: 'string',
      const: 'link@1.0.0'
    },
    data: {
      type: 'object',
      required: [ 'from' ],
      properties: {
        from: {
          type: 'object',
          required: [ 'type' ],
          properties: {
            type: {
              type: 'string',
              const: 'product-improvement@1.0.0'
            }
          }
        }
      }
    }
  }
})

Generated SQL

SELECT
  to_jsonb(cards) || jsonb_build_object(
    'version',
    CONCAT_WS(
      '+',
      CONCAT_WS(
        '-',
        CONCAT_WS(
          '.',
          cards.version_major,
          cards.version_minor,
          cards.version_patch
        ),
        NULLIF(cards.version_prerelease, '')
      ),
      NULLIF(cards.version_build, '')
    )
  ) AS payload
FROM
  cards
WHERE
  (
    NOT (
      EXISTS (
        SELECT
          1
        FROM
          unnest(cards.markers) AS contents
        WHERE
          NOT (
            (
              contents IN ('user-jellyfish', 'org-balena')
              OR (contents)::text ~ E'(^|\\+)(user-jellyfish|org-balena)($|\\+)'
            )
          )
      )
    )
    AND cards.type = 'link@1.0.0'
    AND jsonb_typeof(cards.data) = 'object'
    AND jsonb_typeof(
      cards.data #>'{"from"}') = 'object' AND jsonb_typeof(cards.data#>'{"from", "type"}') = 'string' AND (cards.data#>>'{"from", "type"}')::text = 'product-improvement@1.0.0')
      LIMIT
        1000

EXPLAIN results (production)

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..6588489.61 rows=1 width=32)
   ->  Index Scan using link__name__idx on cards  (cost=0.56..6588489.61 rows=1 width=32)
         Filter: ((jsonb_typeof(data) = 'object'::text) AND ((data #>> '{from,type}'::text[]) = 'product-improvement@1.0.0'::text) AND (jsonb_typeof((data #> '{from}'::text[])) = 'object'::text) AND (jsonb_typeof((data #> '{from,type}'::text[])) = 'string'::text) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Function Scan on unnest contents  (cost=0.00..0.15 rows=8 width=0)
                 Filter: ((contents <> ALL ('{user-jellyfish,org-balena}'::text[])) AND (contents !~ '(^|\+)(user-jellyfish|org-balena)($|\+)'::text))
(6 rows)

Production index checks

Confirmed that the indexes are not invalid and are populated with data:

jellyfish=> SELECT relname FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;
 relname 
---------
(0 rows)

jellyfish=> select pg_size_pretty(pg_table_size('link__data_from_id__idx'));
 pg_size_pretty 
----------------
 1245 MB
(1 row)

jellyfish=> select pg_size_pretty(pg_table_size('link__data_from_type__idx'));
 pg_size_pretty 
----------------
 673 MB
(1 row)

jellyfish=> select pg_size_pretty(pg_table_size('link__data_to_type__idx'));
 pg_size_pretty 
----------------
 832 MB
(1 row)

jellyfish=> select pg_size_pretty(pg_table_size('link__data_to_id__idx'));
 pg_size_pretty 
----------------
 1247 MB
(1 row)

Index definitions (production)

jellyfish=> select indexname,indexdef from pg_indexes where indexname like '%link%' and tablename='cards';
                 indexname                 |                                                                                                    indexdef
-------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 link__data_from_id__name__data_to_id__idx | CREATE INDEX link__data_from_id__name__data_to_id__idx ON public.cards USING btree ((((data -> 'from'::text) ->> 'id'::text)), name, (((data -> 'to'::text) ->> 'id'::text))) WHERE (type = 'link@1.0.0'::text)
 link__data_from_type__idx                 | CREATE INDEX link__data_from_type__idx ON public.cards USING btree ((((data -> 'from'::text) ->> 'type'::text))) WHERE (type = 'link@1.0.0'::text)
 link__data_from_id__idx                   | CREATE INDEX link__data_from_id__idx ON public.cards USING btree ((((data -> 'from'::text) ->> 'id'::text))) WHERE (type = 'link@1.0.0'::text)
 link__data_to_id__idx                     | CREATE INDEX link__data_to_id__idx ON public.cards USING btree ((((data -> 'to'::text) ->> 'id'::text))) WHERE (type = 'link@1.0.0'::text)
 link__data_to_type__idx                   | CREATE INDEX link__data_to_type__idx ON public.cards USING btree ((((data -> 'to'::text) ->> 'type'::text))) WHERE (type = 'link@1.0.0'::text)
 link__name__idx                           | CREATE INDEX link__name__idx ON public.cards USING btree (name) WHERE (type = 'link@1.0.0'::text)
(6 rows)
joshbwlng commented 3 years ago

More EXPLAIN result for a simpler query on production:

jellyfish=> explain (costs, verbose) select data->'from'->'type' from cards where type='link@1.0.0' limit 1000;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..260.63 rows=1000 width=32)
   Output: (((data -> 'from'::text) -> 'type'::text))
   ->  Index Scan using link__name__idx on public.cards  (cost=0.56..5699435.70 rows=21915317 width=32)
         Output: ((data -> 'from'::text) -> 'type'::text)
(4 rows)
jellyfish=> explain (analyze, costs, buffers, verbose) select data->'from'->'type' from cards where type='link@1.0.0' limit 1000;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..260.63 rows=1000 width=32) (actual time=0.048..1.640 rows=1000 loops=1)
   Output: (((data -> 'from'::text) -> 'type'::text))
   Buffers: shared hit=357 read=39
   I/O Timings: read=0.559
   ->  Index Scan using link__name__idx on public.cards  (cost=0.56..5699449.62 rows=21915373 width=32) (actual time=0.048..1.543 rows=1000 loops=1)
         Output: ((data -> 'from'::text) -> 'type'::text)
         Buffers: shared hit=357 read=39
         I/O Timings: read=0.559
 Planning Time: 0.791 ms
 Execution Time: 1.736 ms
(10 rows)
jellyfish=> explain (analyze, costs, buffers, verbose) select data from cards where type='link@1.0.0' and (cards.data#>>'{"from", "type"}')::text='product-improvement@1.0.0' limit 5;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..260.63 rows=5 width=374) (actual time=40027.808..59088.741 rows=5 loops=1)
   Output: data
   Buffers: shared hit=2323222 read=3629084 dirtied=1132
   I/O Timings: read=44288.287
   ->  Index Scan using link__name__idx on public.cards  (cost=0.56..5701498.14 rows=109616 width=374) (actual time=40027.807..59088.738 rows=5 loops=1)
         Output: data
         Filter: ((cards.data #>> '{from,type}'::text[]) = 'product-improvement@1.0.0'::text)
         Rows Removed by Filter: 14278524
         Buffers: shared hit=2323222 read=3629084 dirtied=1132
         I/O Timings: read=44288.287
 Planning Time: 0.871 ms
 Execution Time: 59088.757 ms
(12 rows)
jellyfish=> explain (analyze, costs, buffers, verbose) select data from cards where type='link@1.0.0' and (cards.data#>>'{"from", "type"}')::text='product-improvement@1.0.0' limit 1000;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000.56..51363.67 rows=1000 width=374) (actual time=11556.092..45877.047 rows=26 loops=1)
   Output: data
   Buffers: shared hit=4096523 read=5334016 dirtied=20262
   I/O Timings: read=111642.289
   ->  Gather  (cost=1000.56..5521653.18 rows=109617 width=374) (actual time=11556.091..45877.041 rows=26 loops=1)
         Output: data
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=4096523 read=5334016 dirtied=20262
         I/O Timings: read=111642.289
         ->  Parallel Index Scan using link__name__idx on public.cards  (cost=0.56..5509691.48 rows=45674 width=374) (actual time=11303.556..45873.382 rows=9 loops=3)
               Output: data
               Filter: ((cards.data #>> '{from,type}'::text[]) = 'product-improvement@1.0.0'::text)
               Rows Removed by Filter: 7402458
               Buffers: shared hit=4096523 read=5334016 dirtied=20262
               I/O Timings: read=111642.289
               Worker 0: actual time=10828.870..45872.607 rows=8 loops=1
                 Buffers: shared hit=1369245 read=1780992 dirtied=6690
                 I/O Timings: read=37194.538
               Worker 1: actual time=11525.900..45872.884 rows=12 loops=1
                 Buffers: shared hit=1363763 read=1775441 dirtied=6750
                 I/O Timings: read=37200.183
 Planning Time: 0.839 ms
 Execution Time: 45877.069 ms
(24 rows)
joshbwlng commented 3 years ago

SOLVED

The index definitions didn't match the queries. Redefining the indexes to the following solved the problem:

CREATE INDEX link__data_from_type__idx ON public.cards USING btree ((data#>>'{"from", "type"}')) WHERE (type = 'link@1.0.0'::text);
CREATE INDEX link__data_from_id__idx ON public.cards USING btree ((data#>>'{"from", "id"}')) WHERE (type = 'link@1.0.0'::text);

CREATE INDEX link__data_to_type__idx ON public.cards USING btree ((data#>>'{"to", "type"}')) WHERE (type = 'link@1.0.0'::text);
CREATE INDEX link__data_to_id__idx ON public.cards USING btree ((data#>>'{"to", "id"}')) WHERE (type = 'link@1.0.0'::text);
joshbwlng commented 3 years ago

Patterns

1. Multiple fields

link name,data.from.id,data.to.id ```js await window.sdk.query({ type: 'object', required: [ 'type', 'name', 'data' ], properties: { type: { type: 'string', const: 'link@1.0.0' }, name: { type: 'string', const: 'test' }, data: { type: 'object', required: [ 'from', 'to' ], properties: { from: { type: 'object', required: [ 'id' ], properties: { id: { type: 'string', const: '1234' } } }, to: { type: 'object', required: [ 'id' ], properties: { id: { type: 'string', const: '5678' } } } } } } }) ``` ```sql SELECT to_jsonb(cards) || jsonb_build_object( 'version', CONCAT_WS( '+', CONCAT_WS( '-', CONCAT_WS( '.', cards.version_major, cards.version_minor, cards.version_patch ), NULLIF(cards.version_prerelease, '') ), NULLIF(cards.version_build, '') ) ) AS payload FROM cards WHERE ( NOT ( EXISTS ( SELECT 1 FROM unnest(cards.markers) AS contents WHERE NOT ( ( contents IN ('user-jellyfish', 'org-balena') OR (contents) :: text ~ E '(^|\\+)(user-jellyfish|org-balena)($|\\+)' ) ) ) ) AND cards.type = 'link@1.0.0' AND cards.name = 'test' AND jsonb_typeof(cards.data) = 'object' AND jsonb_typeof(cards.data #>'{"from"}') = 'object' AND jsonb_typeof(cards.data#>'{"from", "id"}') = 'string' AND (cards.data#>>'{"from", "id"}')::text = '1234' AND jsonb_typeof(cards.data#>'{"to"}') = 'object' AND jsonb_typeof(cards.data#>'{"to", "id"}') = 'string' AND (cards.data#>>'{"to", "id"}')::text = '5678' ) LIMIT 1000 ```

2. Single JSONB child string field

support-thread data.status ```js await window.sdk.query({ type: 'object', required: [ 'type', 'data' ], properties: { type: { type: 'string', const: 'support-thread@1.0.0' }, data: { type: 'object', required: [ 'status' ], properties: { status: { type: 'string', const: 'open' } } } } }) ``` ```sql SELECT to_jsonb(cards) || jsonb_build_object( 'version', CONCAT_WS( '+', CONCAT_WS( '-', CONCAT_WS( '.', cards.version_major, cards.version_minor, cards.version_patch ), NULLIF(cards.version_prerelease, '') ), NULLIF(cards.version_build, '') ) ) AS payload FROM cards WHERE ( NOT ( EXISTS ( SELECT 1 FROM unnest(cards.markers) AS contents WHERE NOT ( ( contents IN ('user-jellyfish', 'org-balena') OR (contents) :: text ~ E '(^|\\+)(user-jellyfish|org-balena)($|\\+)' ) ) ) ) AND cards.type = 'support-thread@1.0.0' AND jsonb_typeof(cards.data) = 'object' AND jsonb_typeof(cards.data #>'{"status"}') = 'string' AND (cards.data#>>'{"status"}')::text = 'open' ) LIMIT 1000; ```

3. Single JSONB child string array field

summary data.readBy ```js await window.sdk.query({ type: 'object', required: [ 'type', 'data' ], properties: { type: { type: 'string', const: 'summary@1.0.0' }, data: { type: 'object', required: [ 'readBy' ], properties: { readBy: { type: 'array', contains: { const: 'test' } } } } } }) ``` ```sql SELECT to_jsonb(cards) || jsonb_build_object( 'version', CONCAT_WS( '+', CONCAT_WS( '-', CONCAT_WS( '.', cards.version_major, cards.version_minor, cards.version_patch ), NULLIF(cards.version_prerelease, '') ), NULLIF(cards.version_build, '') ) ) AS payload FROM cards WHERE ( NOT ( EXISTS ( SELECT 1 FROM unnest(cards.markers) AS contents WHERE NOT ( ( contents IN ('user-jellyfish', 'org-balena') OR (contents) :: text ~ E '(^|\\+)(user-jellyfish|org-balena)($|\\+)' ) ) ) ) AND cards.type = 'summary@1.0.0' AND jsonb_typeof(cards.data) = 'object' AND jsonb_typeof(cards.data #>'{"readBy"}') = 'array' AND cards.data#>'{"readBy"}' @> '"test"' ) LIMIT 1000 ```

4. Single non-JSONB string field

group name ```js await window.sdk.query({ type: 'object', required: [ 'type', 'name' ], properties: { type: { type: 'string', const: 'group@1.0.0' }, name: { type: 'string', const: 'test' } } }) ``` ```sql SELECT to_jsonb(cards) || jsonb_build_object( 'version', CONCAT_WS( '+', CONCAT_WS( '-', CONCAT_WS( '.', cards.version_major, cards.version_minor, cards.version_patch ), NULLIF(cards.version_prerelease, '') ), NULLIF(cards.version_build, '') ) ) AS payload FROM cards WHERE ( NOT ( EXISTS ( SELECT 1 FROM unnest(cards.markers) AS contents WHERE NOT ( ( contents IN ('user-jellyfish', 'org-balena') OR (contents) :: text ~ E '(^|\\+)(user-jellyfish|org-balena)($|\\+)' ) ) ) ) AND cards.type = 'group@1.0.0' AND cards.name = 'test' ) LIMIT 1000 ```

5. Single non-JSONB string array field

support-thread tags ```js await window.sdk.query({ type: 'object', required: [ 'type', 'tags' ], properties: { type: { type: 'string', const: 'support-thread@1.0.0' }, tags: { type: 'array', contains: { const: 'test' } } } }) ``` ```sql SELECT to_jsonb(cards) || jsonb_build_object( 'version', CONCAT_WS( '+', CONCAT_WS( '-', CONCAT_WS( '.', cards.version_major, cards.version_minor, cards.version_patch ), NULLIF(cards.version_prerelease, '') ), NULLIF(cards.version_build, '') ) ) AS payload FROM cards WHERE ( NOT ( EXISTS ( SELECT 1 FROM unnest(cards.markers) AS contents WHERE NOT ( ( contents IN ('user-jellyfish', 'org-balena') OR (contents) :: text ~ E '(^|\\+)(user-jellyfish|org-balena)($|\\+)' ) ) ) ) AND cards.type = 'support-thread@1.0.0' AND EXISTS ( SELECT 1 FROM unnest(cards.tags) AS contents WHERE contents = 'test' ) ) LIMIT 1000 ```
joshbwlng commented 3 years ago

Tried the following indexes on the production database to see if the Postgres planner would use them over the current indexes. The results are promising as they seem to be a better match:

1. Multiple JSONB child fields

CREATE INDEX CONCURRENTLY link_test_idx ON public.cards USING btree (((data#>>'{"from","id"}')), name, ((data#>>'{"to","id"}'))) WHERE (type = 'link@1.0.0'::text);

 Limit  (cost=0.56..2.86 rows=1 width=32)
   ->  Index Scan using link_test_idx on cards  (cost=0.56..2.86 rows=1 width=32)
         Index Cond: (((data #>> '{from,id}'::text[]) = '1234'::text) AND (name = 'test'::text) AND ((data #>> '{to,id}'::text[]) = '5678'::text))
         Filter: ((jsonb_typeof(data) = 'object'::text) AND (jsonb_typeof((data #> '{from}'::text[])) = 'object'::text) AND (jsonb_typeof((data #> '{to}'::text[])) = 'object'::text) AND (jsonb_typeof((data #> '{from,id}'::text[])) = 'string'::text) AND (jsonb_typeof((data #> '{to,id}'::text[])) = 'string'::text) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Function Scan on unnest contents  (cost=0.00..0.15 rows=8 width=0)
                 Filter: ((contents <> ALL ('{user-jellyfish,org-balena}'::text[])) AND (contents !~ '(^|\+)(user-jellyfish|org-balena)($|\+)'::text))

2. Single JSONB child string field

CREATE INDEX CONCURRENTLY "support-thread_test_idx" ON public.cards USING btree ((data#>>'{"status"}')) WHERE (type = 'support-thread@1.0.0'::text);

 Limit  (cost=0.29..26.73 rows=1 width=32)
   ->  Index Scan using "support-thread_test_idx" on cards  (cost=0.29..26.73 rows=1 width=32)
         Index Cond: ((data #>> '{status}'::text[]) = 'open'::text)
         Filter: ((jsonb_typeof(data) = 'object'::text) AND (jsonb_typeof((data #> '{status}'::text[])) = 'string'::text) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Function Scan on unnest contents  (cost=0.00..0.15 rows=8 width=0)
                 Filter: ((contents <> ALL ('{user-jellyfish,org-balena}'::text[])) AND (contents !~ '(^|\+)(user-jellyfish|org-balena)($|\+)'::text))

3. Single JSONB child string array field

CREATE INDEX CONCURRENTLY "summarytestidx" ON public.cards USING gin ((data#>'{"readBy"}')) WHERE (type = 'summary@1.0.0'::text);

 Limit  (cost=2.20..6.82 rows=1 width=32)
   ->  Bitmap Heap Scan on cards  (cost=2.20..6.82 rows=1 width=32)
         Recheck Cond: (((data #> '{readBy}'::text[]) @> '"test"'::jsonb) AND (type = 'summary@1.0.0'::text))
         Filter: ((jsonb_typeof(data) = 'object'::text) AND (jsonb_typeof((data #> '{readBy}'::text[])) = 'array'::text) AND (NOT (SubPlan 1)))
         ->  Bitmap Index Scan on summary__test__idx  (cost=0.00..2.20 rows=4 width=0)
               Index Cond: ((data #> '{readBy}'::text[]) @> '"test"'::jsonb)
         SubPlan 1
           ->  Function Scan on unnest contents  (cost=0.00..0.15 rows=8 width=0)
                 Filter: ((contents <> ALL ('{user-jellyfish,org-balena}'::text[])) AND (contents !~ '(^|\+)(user-jellyfish|org-balena)($|\+)'::text))
joshbwlng commented 3 years ago

Moving to https://github.com/product-os/jellyfish-core/issues/867