Closed wgwz closed 1 year ago
While I was testing locally, it seemed like no matter how I wrote a test query, the index was not being used. I learned that postgres does not always use indexes, in cases where it does not need to (postgres smart, me dumb): https://www.pgmustard.com/blog/why-isnt-postgres-using-my-index And also, there are some rules about the types of queries you can write involving GIN indexes on JSONB columns: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
So after writing the query properly and disabling seqscan (using set enable_seqscan=false
), the indexes are working locally.
here's a query filtering on the 'code' in the json:
select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'
And here's a screenshot showing the explain/analyze results for the query (which shows the index being used):
TL;DR due to a limitation in the connection filter plugin, i think we need to write a custom function to expose as a graphql query.
While setting this up and testing the integration with regen-server, I think I've found that using the connection filter plugin will not work for us in this case. The problem is that the connection filter does not generate a SQL query that properly makes use of the GIN indexes. Here's how I have come to that conclusion.
Run a graphql query utilizing the connection filter plugin
{
allTxes(filter: {data: {contains: {tx_response: {code: 0}}}}) {
nodes {
hash
}
totalCount
}
}
Observe the raw SQL query as postgresql shows in it's log files
2023-08-15 19:33:02.128 EDT [5606] STATEMENT: with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*
from "public"."tx" as __local_1__
where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE) order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC
) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[
]'::json) as "data", (
select json_build_object('totalCount'::text, count(1)) from "public"."tx" as __local_1__
where (((__local_1__."data" @> $1))) ) as "aggregates"
2023-08-15 19:33:02.128 EDT [5606] LOG: execute DDsfnUKoGgwtKn4U3KkDyZ9+Aas=: with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*
from "public"."tx" as __local_1__
where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE)
order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC
) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[]'::json) as "data", (
select json_build_object('totalCount'::text, count(1)) from "public"."tx" as __local_1__
where (((__local_1__."data" @> $1)))
) as "aggregates"
2023-08-15 19:33:02.128 EDT [5606] DETAIL: parameters: $1 = '{"tx_response": {"code": 0}}'
Copy and paste the generated SQL and substitute the parametrized value into a database IDE, and run the query with explain analyze and observe that the tx_data_tx_response_code_idx
is not used.
This can be reproduced with this minimal:
select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data @> '{"tx_response": {"code": 0}}';
select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'
I think the best path forward is to just a custom query, and since there's probably enough details here, open an issue in the connection filter plugin and see if it's a problem they would like to solve.
With the latest code and new all_ecocredit_txes
in this branch, we can see that the added indexes are being utilized as expected in my local development environment.
Next step will be to deploy these changes to the staging indexer, and test it as a graphql query on the regen-server staging environment.
If there are performance issues with this approach, we might be able to get better performance by switching the newly added msg_event_type_idx
to be a GIN index rather than a btree index (ref1, ref2 ).
Just noting that I successfully tested this in the regen-server staging environment. I'm going to go ahead, merge and deploy this to production now to start evaluating the performance there.
Closes: https://github.com/regen-network/indexer/issues/32