near / near-indexer-for-explorer

Watch NEAR network and store all the data from NEAR blockchain to PostgreSQL database
https://near-indexers.io/docs/projects/near-indexer-for-explorer
GNU General Public License v3.0
123 stars 56 forks source link

Consider adding `code_hash` column to `accounts` #218

Closed encody closed 2 years ago

encody commented 2 years ago

Would it be possible to store the code hash in the accounts table? This would be useful for determining which accounts have contracts deployed, which contracts are deployed to which accounts, whether deployed code has changed, which accounts have the same contracts deployed to them, etc.

telezhnaya commented 2 years ago

Could you please share use cases for this? Today, it's possible to get this information via RPC.

which contracts are deployed to which accounts

Just clarifying: you can't restore the contract source code by contract hash.

whether deployed code has changed

If we add the column to accounts, you won't see the changes. Only the last state.

Note for the team: code_hash can change frequently. We'll have to do UPDATE each time on the random rows. If we decide to store this information, it's better to create a separate table and put all the information about the contracts deployed with the related block hash/timestamp. It will also solve the problem with states.

encody commented 2 years ago

@telezhnaya Thanks for the response.

Although I can get the code hash from the RPC, it's not especially useful in this context, since I can't query the RPC from SQL. For instance, if I want to perform a query only on those accounts that do or do not have code deployed, I must first perform the query, read the results, and then perform an RPC view account call on every account ID to determine whether it has code deployed. Much less efficient than a simple:

WHERE code_hash = '111...'

I am aware of how hashes work, don't worry. If I were to have a contract whose hash I know, it may be useful to see what other accounts have the same contract code deployed. For example, I might want to query every account that has a specific version of Sputnik DAO deployed.

Although the last state would be visible in the accounts table, you raise a good point. If querying for contract deploy transactions isn't sufficient, a history of code hashes may be useful as well.

telezhnaya commented 2 years ago

@encody I forgot about the scenario with DB:

select * from action_receipt_actions 
    where action_receipt_actions.action_kind = 'DEPLOY_CONTRACT'
        and action_receipt_actions.args->>'code_sha256' = '010b1a0bd58a9003f724f60dea7289df267613d46f058a81ee0fbb8565878559'; 

Works pretty fast.

code_hash here is in hex format, you need to use base58-to-hex converter (I used https://appdevtools.com/base58-encoder-decoder)

encody commented 2 years ago

Does this cover the case of multiple deployments to the same account, or of deletion?

telezhnaya commented 2 years ago

It contains all the data, so, if you need the latest one, you need to filter it somehow.

Bexanderthebex commented 2 years ago

@telezhnaya the performance of this query is ok for accounts with very little transactions. e.g.

explain analyze select
          receipt_predecessor_account_id as account_id
        from
          action_receipt_actions where action_kind = 'DEPLOY_CONTRACT'
and receipt_predecessor_account_id =
  'cuban09.near'
    order by random()
        limit 1;

query plan:

Limit  (cost=600.87..600.87 rows=1 width=19) (actual time=16.382..16.385 rows=1 loops=1)
  ->  Sort  (cost=600.87..600.87 rows=1 width=19) (actual time=16.380..16.382 rows=1 loops=1)
        Sort Key: (random())
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on action_receipt_actions  (cost=596.84..600.86 rows=1 width=19) (actual time=16.369..16.372 rows=1 loops=1)
              Recheck Cond: ((receipt_predecessor_account_id = 'cuban09.near'::text) AND (action_kind = 'DEPLOY_CONTRACT'::action_kind))
              Heap Blocks: exact=1
              ->  BitmapAnd  (cost=596.84..596.84 rows=1 width=0) (actual time=16.329..16.331 rows=0 loops=1)
                    ->  Bitmap Index Scan on action_receipt_actions_receipt_predecessor_account_id_idx  (cost=0.00..260.57 rows=10134 width=0) (actual time=1.057..1.058 rows=5126 loops=1)
                          Index Cond: (receipt_predecessor_account_id = 'cuban09.near'::text)
                    ->  Bitmap Index Scan on action_receipt_actions_action_kind_idx  (cost=0.00..336.01 rows=14859 width=0) (actual time=14.835..14.835 rows=34901 loops=1)
                          Index Cond: (action_kind = 'DEPLOY_CONTRACT'::action_kind)
Planning Time: 0.171 ms
Execution Time: 16.569 ms

But for accounts like relay.aurora, it always times out. Probably because the index grows linear.

explain analyze select
          receipt_predecessor_account_id as account_id
        from
          action_receipt_actions where action_kind = 'DEPLOY_CONTRACT'
and receipt_predecessor_account_id =
  'relay.aurora'
    order by random()
        limit 1;

However, in the RPC node, it seems like it is faster though could be unrealiable. Do you know why?

Edit: I'm guessing RPC sits behind a redis cache