feat: add all_ecocredit_txes function

wgwz commented 1 year ago

Closes: https://github.com/regen-network/indexer/issues/32

wgwz commented 1 year ago

While I was testing locally, it seemed like no matter how I wrote a test query, the index was not being used. I learned that postgres does not always use indexes, in cases where it does not need to (postgres smart, me dumb): https://www.pgmustard.com/blog/why-isnt-postgres-using-my-index And also, there are some rules about the types of queries you can write involving GIN indexes on JSONB columns: https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING

So after writing the query properly and disabling seqscan (using set enable_seqscan=false), the indexes are working locally. here's a query filtering on the 'code' in the json:

select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'

And here's a screenshot showing the explain/analyze results for the query (which shows the index being used): Screen Shot 2023-08-15 at 3 33 01 PM

wgwz commented 1 year ago

TL;DR due to a limitation in the connection filter plugin, i think we need to write a custom function to expose as a graphql query.

While setting this up and testing the integration with regen-server, I think I've found that using the connection filter plugin will not work for us in this case. The problem is that the connection filter does not generate a SQL query that properly makes use of the GIN indexes. Here's how I have come to that conclusion.

Run a graphql query utilizing the connection filter plugin

{
allTxes(filter: {data: {contains: {tx_response: {code: 0}}}}) {
nodes {
  hash
}
totalCount
}
}

Observe the raw SQL query as postgresql shows in it's log files

2023-08-15 19:33:02.128 EDT [5606] STATEMENT:  with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*                                                                                                                                                          
    from "public"."tx" as __local_1__                                                                                                                                                                                                                                                                                                   
    where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE)                                                                                                            order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC                                                                  

    ) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[
]'::json) as "data", (                                                                                                                                                
      select json_build_object('totalCount'::text, count(1))                                                                                                                from "public"."tx" as __local_1__                                                                                                                           
      where (((__local_1__."data" @> $1)))                                                                                                                                ) as "aggregates"                                                                                                                                             
2023-08-15 19:33:02.128 EDT [5606] LOG:  execute DDsfnUKoGgwtKn4U3KkDyZ9+Aas=: with __local_0__ as (select to_json((json_build_object('hash'::text, (__local_1__."hash")))) as "@nodes" from (select __local_1__.*                                                                                                                          
    from "public"."tx" as __local_1__                                                                                                                             

    where (((__local_1__."data" @> $1))) and (TRUE) and (TRUE)                                                                                                    
    order by __local_1__."chain_num" ASC,__local_1__."block_height" ASC,__local_1__."tx_idx" ASC                                                                                                                                                                                                                                        
    ) __local_1__), __local_2__ as (select json_agg(to_json(__local_0__)) as data from __local_0__) select coalesce((select __local_2__.data from __local_2__), '[]'::json) as "data", (       
      select json_build_object('totalCount'::text, count(1))                                                                                                                from "public"."tx" as __local_1__                                        
      where (((__local_1__."data" @> $1)))                                                                                                                        
    ) as "aggregates"                                                                                                                                             
2023-08-15 19:33:02.128 EDT [5606] DETAIL:  parameters: $1 = '{"tx_response": {"code": 0}}'

Copy and paste the generated SQL and substitute the parametrized value into a database IDE, and run the query with explain analyze and observe that the tx_data_tx_response_code_idx is not used.

Screen Shot 2023-08-15 at 7 39 28 PM

This can be reproduced with this minimal:

Query 1: returns correct result but does not use the index (same as generated query)

select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data @> '{"tx_response": {"code": 0}}';

Query 2: returns correct result and uses the GIN index

select data -> 'tx_response' -> 'txhash' as txhash, data -> 'tx_response' -> 'code' as code, data -> 'tx_response' -> 'tx' -> '@type' as type from tx where data ->'tx_response'->'code' @> '0'

I think the best path forward is to just a custom query, and since there's probably enough details here, open an issue in the connection filter plugin and see if it's a problem they would like to solve.

wgwz commented 1 year ago

Screen Shot 2023-08-16 at 4 26 13 PM

With the latest code and new all_ecocredit_txes in this branch, we can see that the added indexes are being utilized as expected in my local development environment.

Next step will be to deploy these changes to the staging indexer, and test it as a graphql query on the regen-server staging environment.

If there are performance issues with this approach, we might be able to get better performance by switching the newly added msg_event_type_idx to be a GIN index rather than a btree index (ref1, ref2 ).

wgwz commented 1 year ago

Just noting that I successfully tested this in the regen-server staging environment. I'm going to go ahead, merge and deploy this to production now to start evaluating the performance there.

regen-network / indexer

feat: add all_ecocredit_txes function #34