near / near-indexer-for-explorer

Watches the NEAR network and stores all data from the NEAR blockchain in a PostgreSQL database
https://near-indexers.io/docs/projects/near-indexer-for-explorer
GNU General Public License v3.0

Taking too long to sync mainnet in archival mode #260

Closed marcelo-gonzalez closed 2 years ago

marcelo-gonzalez commented 2 years ago

This is reposted from zendesk: https://nearhelp.zendesk.com/agent/tickets/4468

We're running Indexer for Explorer on an AWS c5.4xlarge (16 CPUs and 32 GB memory) in archive mode, and we have extracted the 4.9 TB data.tar backup from S3 into the data folder. We're using an Aurora PostgreSQL database with 4 CPUs and 16 GB memory.

We're using --concurrency 50000 sync-from-block --height 9820100. After 5 days, our Postgres DB has only grown to 50 GB. We have no idea how long it will take to sync all of the historical data (based on the documentation, the full DB will be around 1.1 TB).
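
For a rough sense of scale, the figures above can be turned into a back-of-envelope estimate. This is only a sketch: it assumes the database grows linearly over time (which may not hold, since block contents vary across the chain's history) and treats 1.1 TB as 1100 GB. The function name is hypothetical and not part of the indexer.

```python
def estimate_total_sync_days(db_size_gb: float, elapsed_days: float, target_size_gb: float) -> float:
    """Extrapolate total sync time, ASSUMING the DB grows at a constant rate."""
    growth_per_day_gb = db_size_gb / elapsed_days
    return target_size_gb / growth_per_day_gb


# Figures from the report: 50 GB after 5 days, about 1.1 TB expected in total.
total_days = estimate_total_sync_days(db_size_gb=50, elapsed_days=5, target_size_gb=1100)
remaining_days = total_days - 5
print(f"total: {total_days:.0f} days, remaining: {remaining_days:.0f} days")
```

Under that linear assumption the full sync would take on the order of 110 days at the observed rate.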

I see very low CPU load on the indexer node (below 5), and the load on the DB node is also very low.

Could you please advise on any optimizations or changes that would make the sync faster, including any changes to config.json that would improve sync performance?

We're using the original config.json file with two changes: "archive": true and "tracked_shards": [0]. The command line we're using to run the indexer node is:

./target/release/indexer-explorer --home-dir /indexer/near/mainnet run --store-genesis --stream-while-syncing --non-strict-mode --concurrency 50000 sync-from-block --height 9820100

Thanks for your help,

GB

Attachments: indexer-ec2-load, db-load, config.json.txt, LOG.txt

khorolets commented 2 years ago

This is reposted from zendesk: https://nearhelp.zendesk.com/agent/tickets/4468

I'm not sure the author will notice this answer, though.


Unfortunately, there is no way to speed it up at the moment.

That said, --concurrency 50000 is working against you. Any value other than the default is not recommended. The data Indexer for Explorer handles depends heavily on the data that came before it: sometimes it's impossible to store block N if you haven't stored block N-1. In that case, a concurrency higher than 1 causes more trouble than it solves.

My recommendation is to set the concurrency back to the default value (1) and keep it running for a while.
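
To illustrate the dependency argument, here is a toy model in Python. It is not the indexer's actual code: the function name, the retry behavior, and the worst-case reverse ordering within a batch are all assumptions made for illustration. It only counts how many store attempts are needed when block N can be stored only after block N-1.

```python
def count_store_attempts(num_blocks: int, concurrency: int) -> int:
    """Toy model: count store attempts when block N needs block N-1 stored first."""
    stored = set()
    attempts = 0
    for start in range(0, num_blocks, concurrency):
        # Blocks picked up together; ASSUME the worst case where they
        # reach the database in reverse order within the batch.
        pending = list(range(start, min(start + concurrency, num_blocks)))[::-1]
        while pending:
            retry = []
            for height in pending:
                attempts += 1
                if height == 0 or height - 1 in stored:
                    stored.add(height)
                else:
                    retry.append(height)  # parent block missing, try again later
            pending = retry
    return attempts


print(count_store_attempts(num_blocks=8, concurrency=1))
print(count_store_attempts(num_blocks=8, concurrency=4))
```

In this model, a concurrency of 1 needs exactly one attempt per block, while larger batches spend extra attempts on blocks whose predecessor is not stored yet.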

In my experience, most people who try to run the entire Indexer for Explorer have more lightweight use cases and are just trying to avoid writing their own code.

I'd like to ask why you're running the entire Indexer for Explorer, so I can figure out whether I should recommend that you pivot before it's too late :)

khorolets commented 2 years ago

Since the author has not reached out to us, I'm closing the issue.