mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 989 forks source link

miner loop panic - panicked at Error setting sum tree roots: DuplicateCommitment #259

Closed antiochp closed 6 years ago

antiochp commented 6 years ago

I had a miner fall over with the following error. Not sure what actually caused this - I think I had recently sent a txn, but also possibly restarted this mining node.

Nov 12 18:14:25.168 DEBG in miner loop...
Nov 12 18:14:25.204 DEBG block_fees here: BlockFees { fees: 0, height: 2019, key_id: Some(Identifier(64e18e80b902e1ab0772)) }
Nov 12 18:14:25.204 DEBG (Server ID: Port 13414) Built new block with 0 inputs and 1 outputs, difficulty: 17
Nov 12 18:14:25.258 DEBG Starting new sumtree extension.
Nov 12 18:14:25.259 DEBG Error returned, discarding sumtree extension.
thread '<unnamed>' panicked at 'Error setting sum tree roots: DuplicateCommitment(Commitment(093404f806a567805295040ff9f2de28db54405b1ecbd89449b17654c4f6ddfb87))', /checkout/src/libcore/result.rs:906:4
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:381
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:397
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:611
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:572
   6: std::panicking::begin_panic_fmt
             at /checkout/src/libstd/panicking.rs:522
   7: rust_begin_unwind
             at /checkout/src/libstd/panicking.rs:498
   8: core::panicking::panic_fmt
             at /checkout/src/libcore/panicking.rs:71
   9: core::result::unwrap_failed
  10: grin_grin::miner::Miner::run_loop
antiochp commented 6 years ago

I think this is explained by - 1) the wallet failing somehow (I think a large tx with many inputs was involved) 2) the wallet failed to store the coinbase output locally 3) subsequent block mined attempted to reuse this coinbase output from the wallet

yeastplume commented 6 years ago

Figured out how to reproduce this very easily.

-Start new blockchain -Start new wallet.dat -Start waller receiver -Note 'unconfirmed' potential coinbase in wallet -Run miner -Delete wallet.dat -Let miner find block -Duplicate commitment error, as described -Note output with same key appears in wallet, except showing up as 'unspent', even though it should be 'unconfirmed'.

Stopping and restarting grin sometimes gives the same error again, but works after a few attempts

yeastplume commented 6 years ago

Okay, 'twould appear that the root of this is within in the next_available_key method in the wallet receiver.rs. If you delete the wallet contents.. (or presumably switch to a new wallet.dat file with the same seed after successfully mining a block), the coinbase output gets added as n_child=1.. since the child count is reset, you end up using the same derivation and hence end up with a duplicate key and duplicate coinbase commitment (same amount, same blinding factor), which is rejected by the sumtree...

In other words, if for some reason the next child in your wallet corresponds to an output that's already been accepted by the sumtree (because you've deleted the original potential coinbase or for some reason it was never written to wallet.dat), you won't be able to append the next coinbase transaction to the chain (if you win it), because the same commitment already exists.

thinking about how this should be fixed now.

ignopeverell commented 6 years ago

Right, deleting wallet.dat is not really fair game ;-) I think this should require a restore step, where the wallet scans the chain for existing outputs, presumably using the hashed switch commitments. I think @antiochp had something like that on his list.

That being said, maybe step 1 would be a nicer error message in this case. The miner could maybe check that the coinbase returned by the wallet doesn't exist yet?

antiochp commented 6 years ago

That would definitely do it - the wallet has no way of knowing (today) if any keys have previously been used outside of this specific wallet instance. I think I have seen this without deleting wallet.dat though (which is more subtle).

yeastplume commented 6 years ago

All's fair in love and QA :P But I think the easiest way to fix this for now is as you say, instead of vomiting just reject the coinbase in the miner and move on to the next one... and hopefully switch commitment reconstruction will fix the underlying issue.

yeastplume commented 6 years ago

https://github.com/mimblewimble/grin/pull/266 attempts to move on to the next ID if you delete your wallet.dat file, until it finds a key id that creates a commit that's not already in the sum tree.

Actually just realised it's trivial to restore your coinbase transactions for a given wallet seed from the chain because of this.

ignopeverell commented 6 years ago

Closing this as the wallet restore is tracked separately.