near / near-indexer-for-explorer

Watch NEAR network and store all the data from NEAR blockchain to PostgreSQL database
https://near-indexers.io/docs/projects/near-indexer-for-explorer
GNU General Public License v3.0

Optimize transactions list query performance #135

Closed: frol closed this issue 3 years ago

frol commented 3 years ago

Here is the query from Explorer:

SELECT
            transactions.transaction_hash as hash,
            transactions.signer_account_id as signer_id,
            transactions.receiver_account_id as receiver_id,
            transactions.included_in_block_hash as block_hash,
            DIV(transactions.block_timestamp, 1000*1000) as block_timestamp,
            transactions.index_in_chunk as transaction_index
          FROM transactions
          ORDER BY transactions.block_timestamp DESC, transactions.index_in_chunk DESC
          LIMIT 10

It is slow as hell: the transactions list page on Explorer already takes a while to display, and soon it will be completely unusable.

Currently, because we order by a combination of columns, the execution plan falls back to a very slow sequential scan:

 Limit  (cost=1025821.13..1025821.24 rows=1 width=173)
   ->  Gather Merge  (cost=1025821.13..2581531.11 rows=13333726 width=173)
         Workers Planned: 2
         ->  Sort  (cost=1024821.10..1041488.26 rows=6666863 width=173)
               Sort Key: block_timestamp DESC, index_in_chunk DESC
               ->  Parallel Seq Scan on transactions  (cost=0.00..991486.79 rows=6666863 width=173)

If I remove the index_in_chunk, the query is super fast:

 Limit  (cost=0.56..1.78 rows=1 width=173)
   ->  Index Scan Backward using transactions_included_in_block_timestamp_idx on transactions  (cost=0.56..19500761.56 rows=16000435 width=173)
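For reference, a minimal sketch of the variant that produces this plan (SELECT list trimmed here for brevity): the same query with index_in_chunk dropped from the ORDER BY.

SELECT
  transactions.transaction_hash as hash
FROM transactions
ORDER BY transactions.block_timestamp DESC
LIMIT 10;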

I can see two options:

  1. add another index on the combination of block_timestamp and index_in_chunk (see the sketch after this list)
  2. tweak the query (I think it would get convoluted, and it may not play well with the other extension to the query: https://github.com/near/near-explorer/blob/13ca8c0487043c69b1f770908005506d0c9375eb/frontend/src/libraries/explorer-wamp/transactions.ts#L342)
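A minimal sketch of option 1, assuming the column names from the query above (the index name is illustrative; the column order and direction match the ORDER BY clause):

-- Hypothetical index name; covers the exact sort used by the Explorer query.
CREATE INDEX transactions_sorting_idx
  ON transactions (block_timestamp DESC, index_in_chunk DESC);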
telezhnaya commented 3 years ago

@frol Do I need to think only about this query, or should I check the speed of all the queries and optimize them all? The title sounds scarier than the problem inside 🙂

telezhnaya commented 3 years ago

Apart from the additional index, I can suggest adding a WHERE clause:

SELECT
            transactions.transaction_hash as hash,
            transactions.signer_account_id as signer_id,
            transactions.receiver_account_id as receiver_id,
            transactions.included_in_block_hash as block_hash,
            DIV(transactions.block_timestamp, 1000*1000) as block_timestamp,
            transactions.index_in_chunk as transaction_index
          FROM transactions
          where transactions.block_timestamp > 1627561319124288426 
          -- (select transactions.block_timestamp - 600000000000 as aa from transactions order by transactions.block_timestamp desc limit 1)
          -- (select (extract(epoch from now()) - 600) * 1000000000)
          ORDER BY transactions.block_timestamp DESC, transactions.index_in_chunk DESC
          LIMIT 10;

We can use real-world knowledge: we know for sure that there will be 10 transactions within the last 10 minutes, which allows us to cut the amount of scanned data dramatically. The query above takes ~50 ms. The only detail is that we have to pass the timestamp parameter in: I tried subselects, a WITH clause, anything, and they always degrade performance dramatically. But it should not be a problem to pass it from the JS code.

@frol what do you think?

telezhnaya commented 3 years ago

UPD: we can pass it this way, right from SQL:

SELECT
            transactions.transaction_hash as hash,
            transactions.signer_account_id as signer_id,
            transactions.receiver_account_id as receiver_id,
            transactions.included_in_block_hash as block_hash,
            DIV(transactions.block_timestamp, 1000*1000) as block_timestamp,
            transactions.index_in_chunk as transaction_index
          FROM transactions
          where transactions.block_timestamp > ((round(extract(epoch from now())) - 600) * 1000000000)::numeric
          ORDER BY transactions.block_timestamp DESC, transactions.index_in_chunk DESC
          LIMIT 10;
frol commented 3 years ago

@telezhnaya Good point, but we use this query for all sorts of tasks (e.g. extending with WHERE signer_account_id = '...'), where this trick won't work.
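For illustration, a sketch of such an extended query (the account id is hypothetical): with the 10-minute cutoff added, an account whose latest transaction is older than the window gets an empty result even though it has plenty of older transactions.

SELECT
  transactions.transaction_hash as hash
FROM transactions
WHERE transactions.signer_account_id = 'alice.near' -- hypothetical account id
  AND transactions.block_timestamp > ((round(extract(epoch from now())) - 600) * 1000000000)::numeric
ORDER BY transactions.block_timestamp DESC, transactions.index_in_chunk DESC
LIMIT 10;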

Let's try adding the index (use CREATE INDEX CONCURRENTLY to avoid blocking the DB), see how long it takes, how big the impact is, and whether it helps with the query optimization.
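A sketch of that step, reusing the composite index from option 1 above (the index name is illustrative). Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

-- Build the index without blocking concurrent reads and writes on the table.
CREATE INDEX CONCURRENTLY transactions_sorting_idx
  ON transactions (block_timestamp DESC, index_in_chunk DESC);

-- Re-check the plan afterwards to confirm the sequential scan is gone.
EXPLAIN
SELECT
  transactions.transaction_hash as hash
FROM transactions
ORDER BY transactions.block_timestamp DESC, transactions.index_in_chunk DESC
LIMIT 10;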