libbitcoin / libbitcoin-server

Bitcoin Full Node and Query Server

OSX testnet: merkle root mismatch - height 508082 #134

Closed: skaht closed this issue 8 years ago

skaht commented 8 years ago

For commit d0eff8a6454e270dee97928764d772295b789e71, rebuilding the blockchain from scratch.

Stopping and restarting bitcoin-server doesn't get past a merkle root issue. Is there any way to roll back a local blockchain to an earlier height without having to untar an older image of the blockchain directory?

01:17:17.397347 ERROR [protocol] Failure receiving verack from [[::]] channel is stopped
01:17:53.746677 WARNING [validate] Invalid block [000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c] merkle root mismatch
01:17:53.748159 ERROR [poller] Error storing block [000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c] merkle root mismatch

https://sandbox.coinbase.com/network/blocks/000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c

The problem recurred after rolling my blockchain directory back about 12K blocks from an older tar image.

thecodefactory commented 8 years ago

Confirmed on GNU/Linux GCC.

evoskuil commented 8 years ago

The merkle root is generated in libbitcoin and tested in libbitcoin-blockchain. Which branch/commit are these built from?

thecodefactory commented 8 years ago

I used and tested master:

libbitcoin: 4d588d4874e14cdd03e5595ba28cd139f25a2fc1
libbitcoin-blockchain: 0dc11e205c7eb7b9ffd24fd78af00ed41dff08e0

In short, my testnet server completed a sync starting from block 446953 until it reached block 508081 (the last good block height). Now on startup it repeatedly shows:

ERROR [poller] Error storing block [000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c] merkle root mismatch
evoskuil commented 8 years ago

Okay, I don't see any obvious cause or explanation for why there would be an issue with that block after over 500k good checks. I'm in the process of working through the network stack, so it may take me some time to get a repro on this.

thecodefactory commented 8 years ago

Agreed. This particular block has 73 txs in it and, on inspection, they do appear to differ from what a testnet explorer shows as accepted. It's almost as if some bad blocks are being propagated and we can't move past them to get the proper/good ones. Just a theory; I'm looking into it off and on as I can.

evoskuil commented 8 years ago

In this case we should eventually see the valid block and easily move past it. The only way this might not happen, assuming we are getting the valid data, is that the orphan pool size is too small to reorg onto a stronger fork containing the valid blocks. But this implies you have built up a fairly long weaker fork above the fork point. Seems unlikely.

thecodefactory commented 8 years ago

Also agreed. Not sure why else this is the case, though. We appear to be receiving multiple copies of this block, all with the same header merkle root and the same tx list, which doesn't match the block explorer's list; hence the mismatch on the computed merkle root. I'm at a bit of a loss at the moment. I was hoping it was a bug in the computation, but it's straightforward and looks correct on review, which is why I checked each tx hash to verify the inputs were the same and found the mismatch. Any other ideas you can think of?
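
For reference while checking the computation: a minimal Python sketch of the standard Bitcoin merkle-root construction, i.e. double SHA-256 over adjacent pairs of txids, with the last entry paired with itself on any odd-sized level. This illustrates the consensus rule only; it is not libbitcoin's C++ code, and `double_sha256`/`merkle_root` are hypothetical names. It also makes clear why a single differing transaction in the list is enough to produce a "merkle root mismatch".

```python
from __future__ import annotations

import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin uses for txids and merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list[bytes]) -> bytes:
    """Compute a Bitcoin-style merkle root from txids in internal byte order.

    Each level hashes adjacent pairs; when a level has an odd number of
    entries, the last entry is duplicated before pairing.
    """
    if not txids:
        raise ValueError("a block always contains at least one transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

The result is in internal byte order; to compare it with the merkle root an explorer displays, use `merkle_root(txids)[::-1].hex()`.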

evoskuil commented 8 years ago

A regression in our tx hash generation seems like the next place to look.
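
As background for that check: a transaction hash is the double SHA-256 of the transaction's serialized bytes, so any change in serialization changes the hash. Separately, explorers print the hash byte-reversed relative to the order used inside blocks and merkle trees. A minimal Python sketch of both points follows; `raw_tx` stands for a hypothetical serialized transaction, and this is not libbitcoin's implementation.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def txid_internal(raw_tx: bytes) -> bytes:
    """Transaction hash in the byte order used inside blocks and merkle trees."""
    return double_sha256(raw_tx)


def txid_display(raw_tx: bytes) -> str:
    """Transaction hash as block explorers print it: the same bytes, reversed."""
    return txid_internal(raw_tx)[::-1].hex()
```

When comparing against an explorer, compare `txid_display` output; merkle computations take `txid_internal` values.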

skaht commented 8 years ago

Can recreate the issue very quickly for block 508082 (I have an uncorrupted testnet blockchain starting at block 507972). Upping the following in the config file makes no difference for the latest master build from yesterday:

network.host_pool_capacity = 5000
node.transaction_pool_capacity = 8000
evoskuil commented 8 years ago

If there was a pool size issue it would show in the logs, but if you want to expand the block orphan pool you need this:

[blockchain]
block_pool_capacity = 100
skaht commented 8 years ago

For:

blockchain.block_pool_capacity = 500

Still run into this:

14:39:29.315334 INFO [poller] Block #508078 0000000002c0a03bb4d0e86fde3a970278457f5e1947ed32f72d94b28d3c14e2
14:39:29.321844 INFO [poller] Block #508079 0000000000000b7cd4cbeb97b3d4c8684f4a2f2e969dc8fa69746ac33ce0e72c
14:39:29.334016 ERROR [poller] Error storing block [000000000195af8dfc54dfd24e5370e030c6c51bc8837d69cc50bd7c47abcb1e] previous block failed to validate
14:39:29.340032 ERROR [poller] Error storing block [000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c] merkle root mismatch
14:39:30.617506 ERROR [poller] Error storing block [000000000007cb8945a2320c9d4c799eb5ac6fad78c3bd7f318a81580372c35c] merkle root mismatch
evoskuil commented 8 years ago

mainnet reproduces at block 322652 (master):

00:16:38.747398 INFO [poller] Block #322541 000000000000000004f16f65eb97f6d1a5269be2b06a53a8cf2945af6baddbad
00:16:40.530504 INFO [poller] Block #322554 00000000000000001d9b4a7ec4e0ee92843614f6f79ce9a5bd168c17f0cd74a3
00:16:41.084537 INFO [poller] Block #322560 000000000000000002df2dd9d4fe0578392e519610e341dd09025469f101cfa1
00:16:50.827114 INFO [poller] Block #322651 000000000000000003254d09cce46cb28442f36e331b43b6008208686930ec13
00:16:54.025303 ERROR [poller] Error storing block [00000000000000000c601eb4dcbe3870de246ba8f352e044b690cb53d9479067] from [[2001:4800:7819:104:be76:4eff:fe05:c9a0]:8333] previous block failed to validate
00:16:59.979656 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [[2400:8900::f03c:91ff:fe6e:823e]:8333] merkle root mismatch
00:17:04.299912 INFO [network] Connected to outbound channel [108.61.190.77:8333]
00:17:05.100959 INFO [network] Connected to outbound channel [65.26.30.171:8333]
00:17:06.483041 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [108.61.190.77:8333] merkle root mismatch
00:17:07.171082 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [65.26.30.171:8333] merkle root mismatch
00:17:12.356389 INFO [network] Connected to outbound channel [51.254.71.147:8333]
00:17:16.267621 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [51.254.71.147:8333] merkle root mismatch
00:17:46.370439 INFO [network] Connected to outbound channel [24.125.54.27:8333]
00:17:46.702464 INFO [network] Connected to outbound channel [83.233.54.154:8221]
00:17:49.743639 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [24.125.54.27:8333] merkle root mismatch
00:17:51.115726 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [83.233.54.154:8221] merkle root mismatch
00:18:08.038723 INFO [network] Connected to outbound channel [94.112.102.36:8333]
00:18:11.313922 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [94.112.102.36:8333] merkle root mismatch
00:18:24.398691 ERROR [poller] Error storing block [00000000000000001db1c8ff51a040d517d88468148f1d07d84e4ec2fbeb1e26] from [[2001:470:1f0b:ad6::2]:8333] merkle root mismatch
evoskuil commented 8 years ago

https://github.com/libbitcoin/libbitcoin/pull/335 doesn't resolve the above issue on mainnet.

skaht commented 8 years ago

Issue not resolved at this end with a preliminary, non-production build that includes the responder.cpp fix from libbitcoin-node commit cfa23214906172df9bb5f569b0352f1582ea1005, documented at https://github.com/libbitcoin/libbitcoin-server/issues/136.

thecodefactory commented 8 years ago

I can test on mainnet -- but just to clarify, you're both testing @thecodefactory/libbitcoin, correct? The PR was not merged before the large merge to master from @evoskuil

If so, it might also be a good idea to test master from all of the trees as they were before that large merge, in case another regression was introduced despite that fix.

skaht commented 8 years ago

That thought went through my mind after my last post. Looked at the postings at https://github.com/thecodefactory and noticed they pointed me back to Eric's distribution. Point me to your distributions and I'll give them a test drive.

thecodefactory commented 8 years ago

The code is here: https://github.com/thecodefactory/libbitcoin/tree/bugfix1

But it's not that simple since I rebased all dependent trees against master after the merge from @evoskuil

So in other words, it can't work as intended any longer. I can try to track down what the state of the trees looked like before the merge because I can confirm that worked here. I'd say to hang tight for a minute while I track that down.

skaht commented 8 years ago

Still a GitHub newbie. Was using bb9c7aeacba81e3ed869d89d1a2352b6de57b786 for libbitcoin. Noticed that src/chain/operation.cpp did not match the changes at https://github.com/thecodefactory/libbitcoin/commit/a214f9549b3445d11baa92a951f7d4c1a8f6bafb.

skaht commented 8 years ago

Will apply https://github.com/thecodefactory/libbitcoin/blob/a214f9549b3445d11baa92a951f7d4c1a8f6bafb/src/chain/operation.cpp that bb9c7aeacba81e3ed869d89d1a2352b6de57b786 appears to be missing.

thecodefactory commented 8 years ago

I wouldn't recommend that -- let me find the correct version.

skaht commented 8 years ago

You are correct... The testsuite failed after I updated the operation.cpp file. Greatly appreciate your assistance.

thecodefactory commented 8 years ago

Better yet, I think I see the issue and want to test it against the latest master. There is a bug in the PR, so best it wasn't merged. Will try to rebase and test against current master.

thecodefactory commented 8 years ago

New PR issued: https://github.com/libbitcoin/libbitcoin/pull/337

If you build cleanly using the install.sh from libbitcoin-server/master, just change the line:

build_from_github libbitcoin libbitcoin master $PARALLEL "$@" $BITCOIN_OPTIONS

to:

build_from_github thecodefactory libbitcoin master $PARALLEL "$@" $BITCOIN_OPTIONS
skaht commented 8 years ago

I have an OS X change to libbitcoin-node that must be supported at this end, so I can't use install.sh. Ran git clone --branch master --single-branch https://github.com/thecodefactory/libbitcoin and am currently building with custom scripts that I have a reasonable amount of faith in.

thecodefactory commented 8 years ago

Ah, ok. I think that should work then as long as the other trees are mostly resembling master across the board.

skaht commented 8 years ago

The other trees match, except libbitcoin-node, which has a tweak. However, the test suite just failed for libbitcoin. Just built again without make check.

thecodefactory commented 8 years ago

Haven't tried the tests since the merge (didn't build with tests enabled).

skaht commented 8 years ago

Dude... It seems to be functioning:-)

22:55:53.743701 INFO [poller] Block #508502 0000000002372bd3e6abb5ceaddf6f9dee7a34ccf6c50a25c3c9b4f740002fa6

thecodefactory commented 8 years ago

Glad to hear!

skaht commented 8 years ago

You're da man!

thecodefactory commented 8 years ago

It was actually a regression that was added (i.e. it appears it used to work). Of course, it took me a long time to find out what it was exactly ... but hey, at least it's isolated now ;-)

skaht commented 8 years ago

Ctrl-C also seems to work properly:-) Need to see if the chain reaches around block 604K overnight.

thecodefactory commented 8 years ago

I recommend adding a checkpoint to your etc/bs.cfg:

checkpoint = 000000000000624f06c69d3a9fe8d25e0a9030569128d63ad1b704bbb3059a16:600000
skaht commented 8 years ago

Will add that checkpoint to my bs-testnet.cfg file after the block height reaches 600K. Any config file tweaks that can accelerate the chain building process that you recommend?

thecodefactory commented 8 years ago

If you add that checkpoint now, it will skip the long validation until it reaches 600K (i.e. speed up the process).

skaht commented 8 years ago

I thought checkpoints only sped up the server boot process, short-circuiting integrity validation of the blockchain by establishing trusted jump points.

Already have the following checkpoints added:

checkpoint = 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f:0
checkpoint = 00000000009e2958c15ff9290d571bf9459e93b19765c6801ddeccadbb160a1e:100000
checkpoint = 0000000000287bffd321963ef05feab753ebe274e1d78b2fd4e2bfe9ad3aa6f2:200000
checkpoint = 000000000000226f7618566e70a2b5e020e29579b46743f05348427239bf41a1:300000
checkpoint = 000000000598cbbb1e79057b79eef828c495d4fc31050e6b179c57d07d00367c:400000
checkpoint = 000000000001a7c0aaa2630fbb2c0e476aafffc60f82177375b2aaa22209f606:500000
checkpoint = 0000000000000269b72c62fb7517dd489a3069e63d3d154ed453a9f3664214e4:531960
checkpoint = 000000000000242db10b3171738a11367742bb894559dc420e35921b29cdafa9:579304
thecodefactory commented 8 years ago

Yep, that's correct. If you leave it overnight, yours will do longer validation checks after 579304 (instead of 600k as provided). Either way. It was just a recommendation to speed things up.
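
The behavior described in this exchange (validation is shortened up to the highest configured checkpoint, and the longer checks apply above it) can be modelled roughly as below. This is a simplified Python sketch under that assumption, not libbitcoin's validation logic; `parse_checkpoint` and `skips_full_validation` are hypothetical names. It parses the `hash:height` form used in the checkpoint lines above.

```python
from __future__ import annotations


def parse_checkpoint(value: str) -> tuple[str, int]:
    """Split a 'hash:height' checkpoint setting into (hash, height)."""
    block_hash, _, height = value.rpartition(":")
    if len(block_hash) != 64:
        raise ValueError(f"expected a 64-hex-digit block hash, got {block_hash!r}")
    int(block_hash, 16)  # raises ValueError if the hash is not hexadecimal
    return block_hash, int(height)


def skips_full_validation(height: int, checkpoints: list[tuple[str, int]]) -> bool:
    """Hypothetical model: blocks at or below the top checkpoint skip the long checks."""
    if not checkpoints:
        return False
    top = max(h for _, h in checkpoints)
    return height <= top
```

Under this model, with 579304 as the highest checkpoint, every block above 579304 gets the longer checks; adding the 600000 checkpoint moves that boundary up to 600000, which is the speed-up being suggested.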

skaht commented 8 years ago

Started developing a list for network magic numbers in hex to eventually convert to decimal. Here is what I have so far:

# Magic value, Magic Number, Network ID, P2P_PREFIX
# unsigned char pchMessageStart[4] = { 0xf9, 0xbe, 0xb4, 0xd9 }; // d9b4bef9
# main             0xD9B4BEF9  https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L87
# testnet          0xDAB5BFFA  https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L221
# testnet3         0x0709110B  https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L158
# LTC  mainnet     0xDBB6C0FB  https://github.com/litecoin-project/litecoin/blob/master-0.10/src/chainparams.cpp#L117
# LTC  testnet     0xDCB7C1FC  https://github.com/litecoin-project/litecoin/blob/master-0.10/src/chainparams.cpp#L117
# RDD  mainnet     0xDBB6C0FB  https://github.com/reddcoin-project/reddcoin-seeder/blob/master/protocol.cpp#L25 unsigned char pchMessageStart[4] = { 0xfb, 0xc0, 0xb6, 0xdb };
# DOGE mainnet     0xC0C0C0C0  https://github.com/dogecoin/dogecoin/blob/master/src/chainparams.cpp#L92
# Dash mainnet     0xBD6B0CBF  https://github.com/dashpay/dash/blob/master/src/chainparams.cpp#L122
# PPC  mainnet     
# NMC  mainnet     0xFEB4BEF9  https://github.com/domob1812/namecore/blob/master/src/chainparams.cpp#L117
# FTC  mainnet       fcd9b7dd  http://forum.feathercoin.com/topic/7084/ufo-coin-relaunched-with-some-help-from-bushstar/13
# BLK  mainnet     0x05223570  https://github.com/rat4/blackcoin/blob/master/src/chainparams.cpp#L54
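
The list mixes two notations: the `pchMessageStart` byte array (the order the bytes appear on the wire) and a 32-bit value. The value is that array read as a little-endian integer, which is why `{ 0xf9, 0xbe, 0xb4, 0xd9 }` becomes `0xD9B4BEF9`. A small Python sketch of the hex-to-decimal conversion mentioned above, assuming the other `0x` entries follow the same convention as the mainnet comment line; the function names are hypothetical.

```python
def magic_from_message_start(message_start: bytes) -> int:
    """Interpret the 4 pchMessageStart bytes as a little-endian uint32."""
    if len(message_start) != 4:
        raise ValueError("pchMessageStart is exactly 4 bytes")
    return int.from_bytes(message_start, "little")


def message_start_from_magic(magic: int) -> bytes:
    """Inverse: the bytes that appear first on the wire for a given magic value."""
    return magic.to_bytes(4, "little")


if __name__ == "__main__":
    mainnet = magic_from_message_start(bytes([0xF9, 0xBE, 0xB4, 0xD9]))
    print(hex(mainnet), mainnet)  # 0xd9b4bef9 3652501241
    testnet3 = magic_from_message_start(bytes([0x0B, 0x11, 0x09, 0x07]))
    print(hex(testnet3), testnet3)  # 0x709110b 118034699
```

Entries written without a `0x` prefix or a source byte array (such as the FTC line) need their byte order confirmed before converting.
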
evoskuil commented 8 years ago

Cool. It will be interesting to see how much the protocols differ at the network level (i.e., excepting block and tx messages).

libbitcoin::network::p2p seeds, connects, accepts and maintains connections using the version, address and ping protocols alone. Changing the magic number via config should allow this to work with any Bitcoin-based network.
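
To show where the magic number enters the protocol: in the standard Bitcoin wire format, every P2P message (version, verack, addr, ping and the rest) begins with a 24-byte header whose first four bytes are the network magic, so messages framed with a different network's magic are not recognized. The Python sketch below builds such a header; it is illustrative only, not libbitcoin's API, and `checksum`/`message_header` are hypothetical names.

```python
import hashlib
import struct


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of the double SHA-256 of the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def message_header(magic: int, command: str, payload: bytes) -> bytes:
    """Build the 24-byte Bitcoin P2P message header.

    Layout: magic (uint32 LE), command (12 bytes, NUL-padded ASCII),
    payload length (uint32 LE), checksum (4 bytes).
    """
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command names are at most 12 bytes")
    # "12s" pads shorter command names with NUL bytes.
    return struct.pack("<I12sI4s", magic, name, len(payload), checksum(payload))
```

For example, `message_header(0xD9B4BEF9, "verack", b"").hex()` gives `f9beb4d9` + `76657261636b000000000000` + `00000000` + `5df6e0e2`; changing only `magic` is what retargets the same framing to another network.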

I'm presently working on libbitcoin-node, injecting block and tx protocols, and implementing an additional session for initial block sync - headers first, of course.

skaht commented 8 years ago

The testnet blockchain finished building at this end. Running the server behind a NAPT firewall without port forwarding enabled.

18:17:01.019045 INFO [poller] Block #604601 0000000000e4a878ebe0e2735c869d2c67f4e03005dfd57bf572ce7d527f40c2

Server seems much more stable. No hanging or crashes to report. Will start examining local bx & bs interactions.

thecodefactory commented 8 years ago

Resolved by PR libbitcoin/libbitcoin#337