Konard opened 4 years ago
Ok, I found the place where `row_to_json` is used in the code; now I have to learn Haskell...
@Konard We picked `row_to_json` over `json_build_object` for 2 reasons:

1. `row_to_json` performed better than `json_build_object` in our benchmarks on Postgres 10/11; things could have changed since then.
2. `json_build_object`, like all Postgres functions, has a restriction on the number of arguments that can be passed (by default it is 100; ~~though configurable at runtime~~ it is not configurable at runtime). `row_to_json`, however, truncates field names to 63 characters when they exceed that limit (this is also not configurable at runtime), so we work around it by switching to `json_build_object` when any of the column names are longer than 63 characters.

We are more than happy to make `json_build_object` the default if it performs better than `row_to_json`. To measure the performance of these functions, `pgbench` has to be used rather than `EXPLAIN ANALYZE`: every time you run `EXPLAIN ANALYZE` you'll see a different execution time.
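To make the 63-character point concrete, here is a minimal sketch (the overlong alias is invented for illustration and is not from this thread):

```sql
-- Postgres truncates identifiers (including column aliases) to 63 characters
-- (NAMEDATALEN - 1), so the key emitted by row_to_json is cut short:
SELECT row_to_json(
  (SELECT r FROM (SELECT 1 AS a_very_long_column_alias_that_goes_well_beyond_the_sixty_three_character_limit) AS r)
);

-- json_build_object receives the key as an ordinary text argument,
-- so the full name is preserved in the output:
SELECT json_build_object(
  'a_very_long_column_alias_that_goes_well_beyond_the_sixty_three_character_limit', 1
);
```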
@0x777 I have 178000 rows in the table; is that ok for a benchmark? I ran `EXPLAIN ANALYZE` multiple times and selected the minimum values for both (pictured in screenshots). Did you mean I should try to select more than just one column?
Probably 1000 rows and 5 columns should be good enough.
Ok, first of all, let's confirm that the previous observation is correct:
```
konard@DESKTOP-S3U77KT:~$ cat row_to_json.sql
EXPLAIN ANALYZE SELECT row_to_json((SELECT row FROM (SELECT nodes.id) AS row)) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ cat json_build_object.sql
EXPLAIN ANALYZE SELECT json_build_object('id', nodes.id) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f row_to_json.sql
starting vacuum...end.
transaction type: row_to_json.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 601.170 ms
tps = 1.663423 (including connections establishing)
tps = 1.670137 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f json_build_object.sql
starting vacuum...end.
transaction type: json_build_object.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 542.580 ms
tps = 1.843045 (including connections establishing)
tps = 1.849689 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$
```
Well, you are right: when 5 columns are used, the `row_to_json` version is better:
```
konard@DESKTOP-S3U77KT:~$ cat row_to_json.sql
EXPLAIN ANALYZE SELECT row_to_json((SELECT row FROM (SELECT nodes.id, nodes.source_id, nodes.target_id, nodes.x, nodes.y) AS row)) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ cat json_build_object.sql
EXPLAIN ANALYZE SELECT json_build_object('id', nodes.id, 'source_id', nodes.source_id, 'target_id', nodes.target_id, 'x', nodes.x, 'y', nodes.y) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f row_to_json.sql
starting vacuum...end.
transaction type: row_to_json.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 1155.100 ms
tps = 0.865726 (including connections establishing)
tps = 0.867199 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f json_build_object.sql
starting vacuum...end.
transaction type: json_build_object.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 1597.326 ms
tps = 0.626046 (including connections establishing)
tps = 0.626771 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$
```
Even with two columns `row_to_json` is better...
That means `json_build_object` gives better performance only when building an object with a single field...
All tests ran on the same number of rows: 178000.
```
konard@DESKTOP-S3U77KT:~$ cat row_to_json.sql
EXPLAIN ANALYZE SELECT row_to_json((SELECT row FROM (SELECT nodes.id, nodes.source_id) AS row)) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ cat json_build_object.sql
EXPLAIN ANALYZE SELECT json_build_object('id', nodes.id, 'source_id', nodes.source_id) AS root FROM nodes;
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f row_to_json.sql
starting vacuum...end.
transaction type: row_to_json.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 685.450 ms
tps = 1.458896 (including connections establishing)
tps = 1.462897 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$ pgbench postgres://postgres:password@localhost:5432/postgres -f json_build_object.sql
starting vacuum...end.
transaction type: json_build_object.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 759.474 ms
tps = 1.316700 (including connections establishing)
tps = 1.319767 (excluding connections establishing)
konard@DESKTOP-S3U77KT:~$
```
I'm sorry, what is the reasoning behind deciding to convert to JSON in Postgres? Doesn't this mean my database server is doing conversions to JSON that my application servers could, and should, be doing before sending the data to the client? Why not return the data from Postgres in the usual way?
The problem
In almost any query generated by Hasura you can find a confusing (at first sight) and not-so-efficient SQL select expression pattern, sketched below:
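The code block that originally illustrated this did not survive here; the following sketch reconstructs the pattern from the benchmark queries earlier in this thread, using two columns of the `nodes` table as an example:

```sql
-- The row is first packed into a composite via a scalar subquery,
-- then converted to JSON:
SELECT
  row_to_json(
    (SELECT "row" FROM (SELECT nodes.id, nodes.source_id) AS "row")
  ) AS root
FROM nodes;
```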
A solution
It turns out it is possible to get the same result in a much shorter way, with about a 12% performance boost as well as better readability (which improves maintainability) for developers; see the sketch below:
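The solution block is likewise missing from this copy; assuming the same two example columns, the equivalent `json_build_object` form would be:

```sql
-- Keys are passed as ordinary text arguments, so no intermediate
-- composite row (and no Sub Plan) is needed:
SELECT
  json_build_object('id', nodes.id, 'source_id', nodes.source_id) AS root
FROM nodes;
```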
The source example (generated by Hasura's analyze)
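The generated SQL itself was not preserved in this copy. As a rough approximation, assuming a single `id` field, queries emitted by Hasura 1.x have roughly this shape (the `"_root"` aliases are my reconstruction, not verbatim output):

```sql
-- Outer query aggregates the per-row JSON objects into one JSON array;
-- the inner query is the row_to_json pattern discussed above:
SELECT coalesce(json_agg("root"), '[]') AS "root"
FROM (
  SELECT
    row_to_json(
      (SELECT "e" FROM (SELECT "_root.base"."id" AS "id") AS "e")
    ) AS "root"
  FROM (SELECT * FROM "public"."nodes") AS "_root.base"
) AS "_root";
```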
Generated from GraphQL query:
Comparison
Tested using Hasura 1.3.2 and PostgreSQL 12.3.
`row_to_json` version
`json_build_object` version
Conclusion
It would be great if we could switch all queries to the `json_build_object` version. This will improve performance (no additional Sub Plan required), bandwidth (fewer bytes to send from Hasura to PostgreSQL) and maintainability/readability (it will reduce the time developers need to read, understand and optimize queries). In our search queries this pattern occurs more than 70 times per query, so it will definitely improve such queries.
What do you think? Should such an optimization be implemented in Hasura?
Inspired by hours spent on understanding the queries, and by an answer from StackExchange.
P.S.
The entire request can be simplified to just one line, sketched below:
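The one-liner itself is missing from this copy; assuming the five columns used in the benchmarks above, it would plausibly be something along these lines:

```sql
-- Hypothetical one-line equivalent: aggregate the whole result set
-- directly into a JSON array of objects, with no nested subqueries:
SELECT json_agg(json_build_object('id', id, 'source_id', source_id, 'target_id', target_id, 'x', x, 'y', y)) AS root FROM nodes;
```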