stratisproject / StratisBitcoinFullNode

Bitcoin full node in C#
https://stratisplatform.com
MIT License

SBFN PH: StratisX nodes are not syncing from SBFN nodes #2842

Closed fshutdown closed 5 years ago

fshutdown commented 5 years ago

This test is based on the latest network topology: https://github.com/maciejzaleski/InternalTestnet/blob/master/Documentation/FullNode/InternalTestnet-NetworkDesign.draw.io.svg

Setup

Logs: https://stratisplatform-my.sharepoint.com/:u:/p/maciej_zaleski/EcMzTK1Y_dVCtwTpkxGH96EB0wn0YJJWj_bK-huvz8Khfg?e=de7g7L

Network View image

noescape00 commented 5 years ago

I'm using the following setup:

C# node (with staking enabled, connected to the network) -- QT node (connected only to C# node)

During IBD, the QT node stopped advancing.

So the problem is reproducible; I will look into why this is happening.

noescape00 commented 5 years ago

I was able to reproduce the bug several times, but it never led to the QT node being stuck for longer than 3-5 minutes.

When QT syncs from the C# node, it can occasionally drop the connection:

Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

After that it reconnects, and it takes some time to start syncing again.

I've also checked the logs provided by @maciejzaleski and didn't find a disconnection there, which was the precursor to the situation I found somewhat problematic.

However, there might be a problem in the setup that caused the bug: bidirectional connections.

HA:

Peer:[::ffff:192.168.98.101]:37021,      connected:inbound,         height:1241,          agent:StratisBitcoin:1.2.5
Peer:[::ffff:192.168.98.101]:16178,      connected:outbound,        height:1241,          agent:StratisBitcoin:1.2.5
Peer:[::ffff:192.168.98.171]:43076,      connected:inbound,         height:0,             agent:/Stratis:2.0.0.5/

HBX:

receive version message: version 70012, blocks=0, us=192.168.98.171:16178, them=127.0.0.1:16178, peer=192.168.98.174:40377
Added time data, samples 2, offset +0 (+0 minutes)
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
receive version message: version 70012, blocks=0, us=192.168.98.171:16178, them=127.0.0.1:16178, peer=192.168.98.174:39649
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
Adding fixed seed nodes as DNS doesn't seem to be available.
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
receive version message: version 70012, blocks=0, us=192.168.98.171:16178, them=127.0.0.1:16178, peer=192.168.98.174:38507
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
keypool reserve 1
keypool return 1
receive version message: version 70012, blocks=0, us=192.168.98.171:43076, them=127.0.0.1:16178, peer=192.168.98.170:16178

@maciejzaleski could you please run the network on the latest master, ensuring that there are no bidirectional connections, and check whether the bug still appears?
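
For anyone checking a peer list for bidirectional connections, here is a minimal sketch in Python. It assumes the peer lines have the same shape as the HA output above; the function name `find_bidirectional_peers` and the regular expression are illustrative and not part of SBFN.

```python
import re
from collections import defaultdict

# Matches peer lines shaped like the HA output in this thread, e.g.
# Peer:[::ffff:192.168.98.101]:37021,  connected:inbound,  height:1241,  agent:...
# The pattern is an assumption based on those three lines only.
PEER_LINE = re.compile(
    r"Peer:\[(?:::ffff:)?(?P<ip>[^\]]+)\]:(?P<port>\d+),\s*"
    r"connected:(?P<direction>inbound|outbound)"
)


def find_bidirectional_peers(peer_lines):
    """Return the sorted IPs that appear as both an inbound and an outbound peer."""
    directions = defaultdict(set)
    for line in peer_lines:
        match = PEER_LINE.search(line)
        if match:
            directions[match.group("ip")].add(match.group("direction"))
    return sorted(
        ip for ip, seen in directions.items() if seen == {"inbound", "outbound"}
    )


# The HA peer list quoted earlier in this thread.
ha_peers = [
    "Peer:[::ffff:192.168.98.101]:37021,      connected:inbound,         height:1241,          agent:StratisBitcoin:1.2.5",
    "Peer:[::ffff:192.168.98.101]:16178,      connected:outbound,        height:1241,          agent:StratisBitcoin:1.2.5",
    "Peer:[::ffff:192.168.98.171]:43076,      connected:inbound,         height:0,             agent:/Stratis:2.0.0.5/",
]

print(find_bidirectional_peers(ha_peers))  # ['192.168.98.101']
```

Grouping by IP rather than by IP and port is deliberate: the two directions of the same peer pair use different ports, as the HA list shows.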

fshutdown commented 5 years ago

Logs https://stratisplatform-my.sharepoint.com/:u:/p/maciej_zaleski/EbR7WtCPYppKnk-PS4QjxBIBuohZFrXR5fzHwlb2qHsbRA?e=tpoTca

Network State image

SBFN version

58d3d1dd (01 Dec @ 13:54); Patches: Kevin 

*** Build date: 01/12/2018 19:56:38.64 

*** Recent commits 
*    58d3d1dd - (HEAD -> master, origin/master, origin/HEAD) Fix consecutive header bug (#2865) (Sat, 1 Dec 2018 13:54:20 +0000) <Francois de la Rouviere>
*    700c871a - Correctly store and load rewind data index,  (#2875) (Fri, 30 Nov 2018 23:46:54 +0000) <Dan Gershony>
*    c63e0c36 - Fix for "PHBS can serve reorged headers" (#2876) (Fri, 30 Nov 2018 18:13:46 +0300) <noescape0>
*    4d05de89 - Update SeedData (#2872) (Fri, 30 Nov 2018 10:04:03 +0000) <StratisIain>
*    08dc1140 - (origin/sc/v0.12.0-beta) nameof variables fix (#2868) (Thu, 29 Nov 2018 16:48:10 +0000) <Fazz>
*    0403c3cf - Inbound/outbound peer connection count (#2866) (Thu, 29 Nov 2018 15:49:19 +0000) <Fazz>
*    a981244c - (R/C) tip height, refactoring, IBD status (#2856) (Thu, 29 Nov 2018 13:45:13 +0000) <Fazz>
*    a1f70708 - Fix two tests (#2859) (Thu, 29 Nov 2018 10:32:25 +0000) <Francois de la Rouviere>
*    4c308677 - Add default connection params to SC networks (#2858) (Thu, 29 Nov 2018 17:45:09 +1100) <Rowan de Haas>
*    914e2549 - [ProvenHeaders] Set PHBS tip to ChainTip if its ahead (#2853) (Wed, 28 Nov 2018 18:56:18 +0000) <Francois de la Rouviere>  
******************

noescape00 commented 5 years ago

Based on info received from @maciejzaleski, I'm now trying another setup to reproduce the bug:

NETWORK --> C# A (fully synced) --> C# B --> QT

C# B and the QT node are syncing from scratch. C# B has IBD always disabled.

Expected outcome: QT node gets stuck and can't sync fully.

noescape00 commented 5 years ago

This one is fixed by #2893

@fassadlr why reopened?

fassadlr commented 5 years ago

@noescape00 we can't close it until the testers have retested it... It is currently in the Re-Test column :)

noescape00 commented 5 years ago

Fair point 👍

fshutdown commented 5 years ago

Retest failed, logs: https://stratisplatform-my.sharepoint.com/:u:/p/maciej_zaleski/Ee6crzATL1BLm0ZMeZV6IywBVsRW-qLpnmh9FIRlnMiO2w?e=OUXp7V

Code version

9535e61e (06 Dec @ 15:32); Patches: Kevin 

*** Build date: 06/12/2018 12:34:06.68 

*** Recent commits 
*    9535e61e - (HEAD -> master, origin/master, origin/HEAD) syncing speedup (#2901) (Thu, 6 Dec 2018 15:32:25 +0300) <noescape0>
*    63b750a3 - Fix DoS vector (#2904) (Thu, 6 Dec 2018 12:53:36 +0300) <noescape0>
*    3719dab9 - Set the transaction fee to the network defined minimum fee (#2907) (Thu, 6 Dec 2018 08:05:27 +0000) <Jeremy Bokobza>
*    e2af0c02 - (origin/sc/v0.13.0-beta) Add version to network folder name (#2915) (Thu, 6 Dec 2018 15:11:18 +1100) <Rowan de Haas>
*    f4a8ec92 - Increment magic and nonce (#2914) (Thu, 6 Dec 2018 15:05:31 +1100) <Rowan de Haas>
*    0c313825 - Increment SC version (#2911) (Thu, 6 Dec 2018 12:20:01 +1100) <Rowan de Haas>
*    0f93ca37 - RuntimeObserver -> Stratis.SmartContracts.RuntimeObserver (#2912) (Thu, 6 Dec 2018 12:13:24 +1100) <Jordan Andrews>
*    5c8c08f9 - Powershell NuGet fix + version updates (#2900) (Thu, 6 Dec 2018 10:41:10 +1100) <Jordan Andrews>
*    a6a8d2b7 - Added custom path for KeyTool (#2902) (Wed, 5 Dec 2018 17:05:43 +0000) <Jeremy Bokobza>
*    528337cc - Missing inputs error was not reporting correctly the error (#2906) (Wed, 5 Dec 2018 16:53:23 +0000) <Dan Gershony>  
******************

fshutdown commented 5 years ago

You could probably reproduce that by syncing StratisX, shutting it down for a time that is shorter than the time we have in the IBD check method (around 100 blocks), and then starting StratisX again so that it can resync.
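
The timing condition in this reproduction can be sketched as simple arithmetic. This is only an illustration in Python: the IBD check method itself is not shown in this thread, the function name is hypothetical, and the 64-second block spacing used in the example is an assumption, not a value taken from this issue.

```python
def downtime_within_ibd_window(downtime_seconds, block_spacing_seconds,
                               ibd_window_blocks=100):
    """Return True if the node was offline for less than the approximate IBD window.

    ibd_window_blocks defaults to the "around 100 blocks" mentioned above;
    block_spacing_seconds must be supplied by the caller because the thread
    does not state it.
    """
    ibd_window_seconds = ibd_window_blocks * block_spacing_seconds
    return 0 < downtime_seconds < ibd_window_seconds


# Assumed 64-second block spacing gives a window of roughly 6400 seconds.
print(downtime_within_ibd_window(3600, block_spacing_seconds=64))  # True
print(downtime_within_ibd_window(7200, block_spacing_seconds=64))  # False
```

Under these assumptions, a one-hour shutdown falls inside the window and should exercise the scenario described above, while a two-hour shutdown would not.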

fshutdown commented 5 years ago

The issue has been resolved