near / near-indexer-for-explorer

Watch NEAR network and store all the data from NEAR blockchain to PostgreSQL database
https://near-indexers.io/docs/projects/near-indexer-for-explorer
GNU General Public License v3.0

refactor: Denormalize action_receipt_actions table in order to improve common queries performance #99

Closed khorolets closed 3 years ago

khorolets commented 3 years ago

Closes #87

Starting to address the issue. Plan:

khorolets commented 3 years ago

@frol please review. I've checked the args logic on localnet and it works. I'm hesitant about the new version number: I've put 0.6.1 since I don't think we've changed enough to bump to 0.7.x. Any objections?

khorolets commented 3 years ago

For history: while this PR was in progress we released a few minor versions, so we ended up with version 0.6.3 of Indexer for Explorer in this PR.

khorolets commented 3 years ago

Our data is too big, and transaction-based migration queries lock tables for a huge amount of time. We ended up using a custom Python script that splits the big query into smaller ones. Here's the script for future use.

from sqlalchemy import create_engine

START = 1595370903490523743
STEP = 1000000000000
END = 1620717648739546794
ESTIMATED_STEPS = -(-(END - START) // STEP)  # ceiling division: number of chunks
connection_string = 'postgresql+psycopg2://user:pass@host/database'

def generate_sql(from_timestamp: int, to_timestamp: int) -> str:
    """
    Generates an SQL query string that converts args_base64 to args_json where possible,
    limited to the given block timestamp range.
    """
    return """
    UPDATE action_receipt_actions
            SET args = jsonb_set(args, '{args_json}', decode_or_null(decode(args->>'args_base64', 'base64')), true)
            WHERE action_kind = 'FUNCTION_CALL'
                AND receipt_receiver_account_id != 'client.bridge.near'
                AND receipt_included_in_block_timestamp <= %d
                AND receipt_included_in_block_timestamp >= %d;
    """ % (to_timestamp, from_timestamp)

if __name__ == '__main__':
    print("Establishing connection to %s..." % (connection_string.split('/')[-1],))
    engine = create_engine(connection_string)
    print(f"Estimated queries to execute: {ESTIMATED_STEPS}.")
    from_timestamp = START-STEP
    to_timestamp = START
    counter = 1
    with engine.connect() as con:
        while to_timestamp < END:
            from_timestamp += STEP
            # Clamp the final chunk so the tail of the range is not skipped
            to_timestamp = min(to_timestamp + STEP, END)
            print(f"{counter}/{ESTIMATED_STEPS} (from {from_timestamp} to {to_timestamp})")

            out = con.execute(generate_sql(from_timestamp, to_timestamp))
            print(out.rowcount)
            counter += 1

    print("FINISHED")
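
The UPDATE generated above calls `decode_or_null(...)`, a helper that is not shown in this thread. Presumably it attempts to parse the decoded bytes as JSON and returns NULL on failure, so a single malformed `args_base64` payload doesn't abort the whole chunk. A hypothetical PL/pgSQL sketch (the function name matches the query above, but this body is an assumption, not the project's actual definition):

```sql
-- Hypothetical sketch of the decode_or_null helper assumed by the UPDATE above.
-- Tries to interpret the raw bytes as UTF-8 JSON; returns NULL instead of
-- raising when the decoded args are not valid JSON.
CREATE OR REPLACE FUNCTION decode_or_null(input BYTEA) RETURNS JSONB AS $$
BEGIN
    RETURN convert_from(input, 'UTF8')::jsonb;
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```

With a helper like this, `jsonb_set(args, '{args_json}', ..., true)` simply stores a JSON `null` under `args_json` for non-JSON arguments instead of failing the migration.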