mlvzk opened this issue 3 years ago
I have also tried a GraphQL query more similar to the handwritten SQL query. The execution time doesn't differ much, but planning time is shorter. It still doesn't do a hash join:
query MyQuery {
random_media(args: {n: 10}) {
filename
tags {
tag
}
}
}
SELECT
coalesce(json_agg("root"), '[]') AS "root"
FROM
(
SELECT
row_to_json(
(
SELECT
"_6_e"
FROM
(
SELECT
"_1_root.base"."filename" AS "filename",
"_5_root.ar.root.tags"."tags" AS "tags"
) AS "_6_e"
)
) AS "root"
FROM
(
SELECT
*
FROM
"public"."random_media"(('10') :: integer) AS "_0_random_media"
WHERE
('true')
) AS "_1_root.base"
LEFT OUTER JOIN LATERAL (
SELECT
coalesce(json_agg("tags"), '[]') AS "tags"
FROM
(
SELECT
row_to_json(
(
SELECT
"_3_e"
FROM
(
SELECT
"_2_root.ar.root.tags.base"."tag" AS "tag"
) AS "_3_e"
)
) AS "tags"
FROM
(
SELECT
*
FROM
"public"."media_tags"
WHERE
(("_1_root.base"."id") = ("media"))
) AS "_2_root.ar.root.tags.base"
) AS "_4_root.ar.root.tags"
) AS "_5_root.ar.root.tags" ON ('true')
) AS "_7_root"
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.98..32.99 rows=1 width=32) (actual time=0.140..0.141 rows=1 loops=1)
-> Nested Loop Left Join (cost=2.84..32.83 rows=10 width=62) (actual time=0.026..0.127 rows=10 loops=1)
-> Sample Scan on media (cost=0.00..4.10 rows=10 width=46) (actual time=0.006..0.008 rows=10 loops=1)
Sampling: system_rows ('10'::bigint)
-> Aggregate (cost=2.84..2.85 rows=1 width=32) (actual time=0.011..0.011 rows=1 loops=10)
-> Seq Scan on media_tags (cost=0.00..2.83 rows=1 width=5) (actual time=0.006..0.010 rows=1 loops=10)
Filter: (media.id = media)
Rows Removed by Filter: 145
SubPlan 2
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=10)
SubPlan 1
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=10)
Planning Time: 0.161 ms
Execution Time: 0.162 ms
(14 rows)
Hasura's SQL was scaling badly on larger datasets (20ms vs. 4ms for the hand-written query). PostgreSQL seems to get confused and does a seq scan on media_tags instead of using an index. With SET enable_seqscan = false it only took 5ms.
However, after adding a hash index on media_tags.media, PostgreSQL started using it instead of the seq scan. The difference is only 2x now (0.2ms vs. 0.1ms hand-written).
CREATE INDEX ON media_tags USING HASH (media);
Are there any plans on simplifying the queries that are sent to postgres? Maybe doing the serialization to JSON on graphql-engine's side?
In my experience, rewriting Hasura's SQL by hand almost always gave some improvement, but rarely a significant one.
Changing index options (especially INCLUDE-ing fields) gave much more significant results, which is similar to what you achieved with the HASH index.
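For illustration (a sketch using the media_tags columns from the example above), an INCLUDE index can let Postgres answer the lateral subquery with an index-only scan:

```sql
-- Btree on the join column, INCLUDE-ing the selected column so the
-- tag can be read from the index without heap fetches
-- (the INCLUDE clause requires PostgreSQL 11+).
CREATE INDEX media_tags_media_incl_tag
  ON media_tags (media) INCLUDE (tag);
```

Whether the planner actually chooses an index-only scan also depends on the table's visibility map, so results vary by workload.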
Another point of optimization is rewriting object/array relationships by hand using knowledge about pagination. E.g., if we have a dataset of 1000 rows and only 25 are displayed at a time, then instead of computing relationships for all 1000 rows, compute them only for those 25.
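A sketch of that idea, assuming a media/media_tags schema like the one above: paginate the base table first, then compute the relationship only for the rows on the current page.

```sql
-- Paginate first, then aggregate tags only for the 25 visible rows,
-- instead of computing the relationship for the whole table.
SELECT m.filename, s.tags
FROM (
  SELECT * FROM media ORDER BY id LIMIT 25 OFFSET 0
) m
LEFT OUTER JOIN LATERAL (
  SELECT coalesce(json_agg(mt.tag), '[]') AS tags
  FROM media_tags mt
  WHERE mt.media = m.id
) s ON true;
```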
So in general I gave up on rewriting the LEFT OUTER JOIN LATERALs by hand and am trying to find other ways to optimize.
But it would be very interesting to see your experiments on bigger samples of data.
@mlvzk For this query, doing an INNER JOIN would be incorrect, as it would exclude media that do not have any tags (even though inner joins typically execute faster than outer joins):
query MyQuery {
random_media(args: {n: 10}) {
filename
tags {
name
}
}
}
Unless there is sample data for these tables, it is hard to see what plans postgres generates and what indexes should help. However, to answer some of the questions:
Are there any plans on simplifying the queries that are sent to postgres? Maybe doing the serialization to JSON on graphql-engine's side?
JSON aggregation on Postgres is much faster than anything we can accomplish on the Haskell side (we have done a fair bit of testing on this front a while ago), so it is not fair to compare the following query against the Hasura-generated query:
SELECT filename, tag FROM random_media(10) AS m INNER JOIN media_tags ON media = m.id;
At the very least, you should be comparing it with this query:
explain analyze SELECT filename, tags FROM random_media(10) AS m left outer JOIN (select media, json_agg(mt.*) as tags from media_tags mt group by media) mt ON mt.media = m.id;
But graphql-engine generates something along these lines:
SELECT filename, tags FROM random_media(10) AS m left outer JOIN lateral (select json_agg(mt.*) as tags from media_tags mt where mt.media = m.id) s ON true;
to support queries such as this:
query MyQuery {
random_media(args: {n: 10}) {
filename
tags(limit: 10, order_by: {name: desc}) {
name
}
}
}
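The per-row limit/order is why the lateral form is needed: the limit must apply per parent row, which a grouped non-lateral join cannot express. A sketch of the shape of such a query (column names taken from the earlier media_tags example):

```sql
-- The LIMIT applies per media row, so the subquery must be LATERAL;
-- a plain GROUP BY join could only limit the overall result.
SELECT filename, tags
FROM random_media(10) AS m
LEFT OUTER JOIN LATERAL (
  SELECT json_agg(t.*) AS tags
  FROM (
    SELECT mt.tag
    FROM media_tags mt
    WHERE mt.media = m.id
    ORDER BY mt.tag DESC
    LIMIT 10   -- per-row limit from the GraphQL arguments
  ) t
) s ON true;
```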
Typically graphql-engine emits a fairly well-constructed query that Postgres can optimize once the right indexes are added. There is always scope for more subtle optimizations, and we hope to attend to some of these in the near future.
I too am having trouble with Postgres choosing a nested loop join instead of a hash join with Hasura's generated SQL. A foreign table is involved, and the loops mean that we're incurring network calls out to the foreign server for every row: 7 seconds total to make 164 calls of ~43ms each. (Removing the foreign table from the query does not avoid the outermost nested loop join)
query MyQuery {
job_groups {
disposition_set_id
combined_companies {
name
}
}
}
Table structure:
- companies has a one-to-many relationship to job_groups on companies.id = job_groups.company_id. There is a foreign key defined on job_groups for this relationship.
- companies has a one-to-one relationship to legacy_companies (the foreign table) on companies.crm_id = legacy_companies.hubspot_id. There is an index on companies.crm_id.
- combined_companies is a view that joins companies to legacy_companies.
A hand-written query without the lateral join uses hash joins and only makes a single call to the foreign server, with good performance:
SELECT disposition_set_id, c.json
FROM job_groups AS j
left outer JOIN (select id, json_agg(c.*) as json from combined_companies c group by id) c ON c.id = j.company_id;
Hash Left Join (cost=1044.21..1049.29 rows=164 width=36) (actual time=46.738..46.806 rows=164 loops=1)
Hash Cond: (j.company_id = c.id)
-> Seq Scan on job_groups j (cost=0.00..4.64 rows=164 width=8) (actual time=0.012..0.043 rows=164 loops=1)
-> Hash (cost=1042.16..1042.16 rows=164 width=36) (actual time=46.720..46.723 rows=164 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 85kB
-> Subquery Scan on c (cost=1038.47..1042.16 rows=164 width=36) (actual time=46.622..46.679 rows=164 loops=1)
-> HashAggregate (cost=1038.47..1040.52 rows=164 width=36) (actual time=46.622..46.661 rows=164 loops=1)
Group Key: c_1.id
Batches: 1 Memory Usage: 288kB
-> Hash Right Join (cost=31.69..1034.37 rows=820 width=190) (actual time=45.110..45.620 rows=164 loops=1)
Hash Cond: (lc.hubspot_id = c_1.crm_id)
-> Foreign Scan on legacy_companies lc (cost=25.00..1025.00 rows=1000 width=168) (actual time=45.021..45.435 rows=143 loops=1)
Remote server startup cost: 25
-> Hash (cost=4.64..4.64 rows=164 width=54) (actual time=0.077..0.077 rows=164 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 24kB
-> Seq Scan on companies c_1 (cost=0.00..4.64 rows=164 width=54) (actual time=0.004..0.038 rows=164 loops=1)
Planning Time: 0.559 ms
Execution Time: 48.602 ms
Adding the lateral join, as the hasura-generated query does, switches to the nested loop join and makes the repeated calls out to the foreign server:
SELECT disposition_set_id, json
FROM job_groups AS j
left outer JOIN lateral (select json_agg(c.*) as json from combined_companies c where c.id = j.company_id) s ON true;
Nested Loop Left Join (cost=1042.57..170990.22 rows=164 width=36) (actual time=45.119..6909.235 rows=164 loops=1)
-> Seq Scan on job_groups m (cost=0.00..4.64 rows=164 width=8) (actual time=0.012..0.155 rows=164 loops=1)
-> Aggregate (cost=1042.57..1042.58 rows=1 width=32) (actual time=42.125..42.125 rows=1 loops=164)
-> Nested Loop Left Join (cost=25.00..1042.55 rows=5 width=190) (actual time=41.762..42.098 rows=1 loops=164)
Join Filter: (c.crm_id = lc.hubspot_id)
Rows Removed by Join Filter: 142
-> Seq Scan on companies c (cost=0.00..5.05 rows=1 width=54) (actual time=0.012..0.028 rows=1 loops=164)
Filter: (id = m.company_id)
Rows Removed by Filter: 163
-> Foreign Scan on legacy_companies lc (cost=25.00..1025.00 rows=1000 width=168) (actual time=41.504..42.041 rows=143 loops=164)
Remote server startup cost: 25
Planning Time: 0.723 ms
Execution Time: 6911.116 ms
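One possible workaround to sketch (not verified against this schema; it bypasses the combined_companies view and inlines its presumed join, and the legacy_name column is hypothetical) is to materialize the foreign table once so the per-row lateral aggregate joins against a local copy:

```sql
-- Fetch the foreign table once into a materialized CTE (the
-- MATERIALIZED keyword requires PostgreSQL 12+), so each loop
-- iteration probes local data instead of re-scanning the foreign
-- server. "name"/"legacy_name" are hypothetical column names.
WITH lc AS MATERIALIZED (
  SELECT hubspot_id, name FROM legacy_companies
)
SELECT j.disposition_set_id, s.json
FROM job_groups AS j
LEFT OUTER JOIN LATERAL (
  SELECT json_agg(cc.*) AS json
  FROM (
    SELECT c.*, lc.name AS legacy_name
    FROM companies c
    LEFT JOIN lc ON lc.hubspot_id = c.crm_id
    WHERE c.id = j.company_id
  ) cc
) s ON true;
```

This still nested-loops over job_groups, but the expensive foreign scan happens only once.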
Hello. I've noticed that the SQL Hasura generates is a lot slower than what it could be. The main difference is that it's using Nested Loop Left Join instead of Hash Join (see EXPLAIN ANALYZE below). This is on a very small dataset (<200 rows), I assume the performance difference would be even bigger on a larger dataset.
Here's some data:
(Collapsed attachments: function and table definitions; the Hasura GraphQL query, its generated SQL, and EXPLAIN ANALYZE output; the hand-written SQL and its EXPLAIN ANALYZE output.)