mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 991 forks source link

Full node sync is quite slow #1098

Closed garyyu closed 6 years ago

garyyu commented 6 years ago

Clear .grin folder, and start Grin withfull archive mode, after over 24 hours, the Chain Height is 44265 and Latest Header is 64832. And at this moment, the actual tip of testnet2 is 90654, so I spent 24 hours and the sync is just on the half way (48.8%).

#run the node in "full archive" mode (default is fast-sync, pruned node)
archive_mode = true 
May 28 20:07:26.168 INFO This is Grin version 0.2.0 (git testnet2-125-gac52799), built for x86_64-apple-darwin by rustc 1.26.0
...
May 29 20:34:47.765 DEBG pipe: process_block 40b58d59 at 44392 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.774 DEBG pipe: chain head reached 44392 @ 66981 [40b58d59]
May 29 20:34:47.774 DEBG Committing txhashset extension. sizes (145184, 145184, 116995)
May 29 20:34:47.789 DEBG pipe: process_block 04e055e0 at 44393 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.797 DEBG pipe: chain head reached 44393 @ 66982 [04e055e0]
May 29 20:34:47.797 DEBG Committing txhashset extension. sizes (145185, 145185, 116997)
May 29 20:34:47.812 DEBG pipe: process_block e3a16c85 at 44394 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.820 DEBG pipe: chain head reached 44394 @ 66983 [e3a16c85]
May 29 20:34:47.820 DEBG Committing txhashset extension. sizes (145187, 145187, 116998)
May 29 20:34:47.835 DEBG pipe: process_block b1d37a9a at 44395 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.843 DEBG pipe: chain head reached 44395 @ 66984 [b1d37a9a]
May 29 20:34:47.844 DEBG Committing txhashset extension. sizes (145188, 145188, 117002)
May 29 20:34:47.858 DEBG pipe: process_block e55c94b3 at 44396 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.866 DEBG pipe: chain head reached 44396 @ 66985 [e55c94b3]
May 29 20:34:47.867 DEBG Committing txhashset extension. sizes (145192, 145192, 117003)
May 29 20:34:47.881 DEBG pipe: process_block 5dd85416 at 44397 with 0 inputs, 1 outputs, 1 kernels
May 29 20:34:47.890 DEBG pipe: chain head reached 44397 @ 66986 [5dd85416]

The chain database size:

$ ls -al .grin/chain
49,662,896 May 29 05:39 000015.sst
42,876,408 May 29 09:48 000020.sst
33,541,092 May 29 18:41 000031.sst
4,350,861 May 29 20:34 000033.sst
14,902,003 May 29 20:37 000035.log

Looks like the chain data size is just 130MB, but the nettop captured bytes_in is over 450MB.

130MB data downloading spent 24 hours! That means full node sync is too slow or maybe bug(s) exist there.

antiochp commented 6 years ago

Is this with cargo build or cargo build --release? The debug build (the default) is very slow for some of the libsecp operations (kernel signature verification, rangeproof verification etc.)

Also - how many peers are you connected to? The sync speed is heavily dependent on the number of peers you can request blocks from in parallel.

garyyu commented 6 years ago

It's a release build. The peers connection is between 8 and 12.

May 29 20:34:40.903 DEBG monitor_peers: 12 connected (7 most_work). all 306 = 13 healthy + 0 banned + 293 defunct
May 29 20:35:00.972 DEBG monitor_peers: 12 connected (9 most_work). all 306 = 13 healthy + 0 banned + 293 defunct
May 29 20:35:21.045 DEBG monitor_peers: 12 connected (9 most_work). all 306 = 13 healthy + 0 banned + 293 defunct
baryluk commented 6 years ago

I do have release binary, and even with archive_mode false (default), syncing takes forever:

$ du -hs ./.grin/
137M    ./.grin/
$ LC_ALL=C ls -al ./.grin/chain/
total 38682
drwxr-xr-x 2 baryluk baryluk        9 Jun  6 21:36 .
drwxr-xr-x 5 baryluk baryluk        5 Jun  6 21:36 ..
-rw-r--r-- 1 baryluk baryluk 62805448 Jun  7 20:15 000003.log
-rw-r--r-- 1 baryluk baryluk       16 Jun  6 21:36 CURRENT
-rw-r--r-- 1 baryluk baryluk       37 Jun  6 21:36 IDENTITY
-rw-r--r-- 1 baryluk baryluk        0 Jun  6 21:36 LOCK
-rw-r--r-- 1 baryluk baryluk    14406 Jun  6 21:36 LOG
-rw-r--r-- 1 baryluk baryluk       13 Jun  6 21:36 MANIFEST-000001
-rw-r--r-- 1 baryluk baryluk     4551 Jun  6 21:36 OPTIONS-000005
Current Status: Syncing - Latest header: 100459

This is after running for 20 hours. It was using 100% of CPU (so 1 core out of 12 in my machine), but now is just ~18-20% of CPU (of one core). Memory usage is small. 100MiB resident + 6MiB shared.

$ LC_All=C ps -C grin u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
baryluk  24580 18.6  0.3 6868288 103260 pts/43 Sl+  cze06 252:53 ./target/release/grin

I am connected to 16 peers, all over IPv4.

Total Peers: 16
Longest Chain: 137857 D @ 100461 H vs Us: 1000 D @ 0 H
This is Grin version 0.2.0 (git testnet2-153-g8f66016), built for x86_64-unknown-linux-gnu by rustc 1.26.1.
Built with profile "release", features "" on Wed, 06 Jun 2018 15:28:12 GMT.
sesam commented 6 years ago

@baryluk your issue might be about some other aspect of sync, possibly rather about fast sync not kicking in.

Here are some other known bugs that might be causing your trouble: https://github.com/mimblewimble/grin/issues/973 https://github.com/mimblewimble/grin/issues/957 https://github.com/mimblewimble/grin/issues/951 https://github.com/mimblewimble/grin/issues/895

But if its just that you're not getting the fast sync, try like this:

  1. start grin again in a new empty folder
  2. try again a couple of times at different times of day, until you get the fast sync-part of syncing to go through.

or wait for testnet3 to start :)

garyyu commented 6 years ago

Some related fixes on testnet3: #1281 and #1285. And related issue: #1271 . Now both the full-sync and fast-sync are very fast and stable :)

I close this old issue which submitted on testnet2.