mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 989 forks source link

panic - index out of range for slice #444

Closed antiochp closed 6 years ago

antiochp commented 6 years ago

2 nodes both failed overnight with almost identical backtraces. Both had the same index and length numbers - index 2529085 out of range for slice of length 2524600.

Backtrace doesn't have line numbers... will see if I can reproduce with better backtrace.

thread 'main' panicked at 'index 2529085 out of range for slice of length 2524600', /checkout/src/libcore/slice/mod.rs:748:4
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:381
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:397
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:611
   5: std::panicking::begin_panic
             at /checkout/src/libstd/panicking.rs:572
6: std::panicking::begin_panic_fmt                                                                                     [11/1871]
             at /checkout/src/libstd/panicking.rs:522
   7: rust_begin_unwind
             at /checkout/src/libstd/panicking.rs:498
   8: core::panicking::panic_fmt
             at /checkout/src/libcore/panicking.rs:71
   9: core::slice::slice_index_len_fail
             at /checkout/src/libcore/slice/mod.rs:748
  10: grin_store::sumtree::AppendOnlyFile::read
  11: <grin_store::sumtree::PMMRBackend<T> as grin_core::core::pmmr::Backend<T>>::get
  12: <grin_core::core::pmmr::PMMR<'a, T, B>>::root
  13: grin_chain::pipe::process_block
  14: grin_chain::chain::Chain::process_block
  15: <grin_grin::adapters::NetToChainAdapter as grin_p2p::types::NetAdapter>::block_received
  16: <grin_p2p::peer::TrackingAdapter as grin_p2p::types::NetAdapter>::block_received
  17: <futures::future::chain::Chain<A, B, C>>::poll
  18: <futures::stream::fold::Fold<S, F, Fut, T> as futures::future::Future>::poll
  19: <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll
  20: <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll
  21: <futures::future::then::Then<A, B, F> as futures::future::Future>::poll
  22: <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll
  23: <futures::future::then::Then<A, B, F> as futures::future::Future>::poll
  24: tokio_core::reactor::Core::poll
  25: grin_grin::server::Server::start
  26: grin::main
  27: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:99
  28: std::rt::lang_start
             at /checkout/src/libstd/panicking.rs:459
             at /checkout/src/libstd/panic.rs:361
             at /checkout/src/libstd/rt.rs:61
  29: __libc_start_main
  30: _start
ignopeverell commented 6 years ago

Here it is, will look at it.

thread 'main' panicked at 'index 2529085 out of range for slice of length 2524600', /checkout/src/libcore/slice/mod.rs:748:4
stack backtrace:                                                             
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace                                            
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49                  
   1: std::sys_common::backtrace::_print                                                                  
             at /checkout/src/libstd/sys_common/backtrace.rs:71                       
   2: std::panicking::default_hook::{{closure}}                                                            
             at /checkout/src/libstd/sys_common/backtrace.rs:60                     
             at /checkout/src/libstd/panicking.rs:381                                                         
   3: std::panicking::default_hook                                                      
             at /checkout/src/libstd/panicking.rs:397                                                      
   4: std::panicking::rust_panic_with_hook                                          
             at /checkout/src/libstd/panicking.rs:611                                                         
   5: std::panicking::begin_panic                           
             at /checkout/src/libstd/panicking.rs:572                                                      
   6: std::panicking::begin_panic_fmt                                        
             at /checkout/src/libstd/panicking.rs:522                                                     
   7: rust_begin_unwind                                             
             at /checkout/src/libstd/panicking.rs:498                                                     
   8: core::panicking::panic_fmt                                                                              
             at /checkout/src/libcore/panicking.rs:71                                                         
   9: core::slice::slice_index_len_fail                 
             at /checkout/src/libcore/slice/mod.rs:748                                                       
  10: <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index                    
             at /checkout/src/libcore/slice/mod.rs:879                                                          
  11: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at /checkout/src/libcore/slice/mod.rs:730                                                       
  12: grin_store::sumtree::AppendOnlyFile::read                                     
             at store/src/sumtree.rs:96                                                                       
  13: <grin_store::sumtree::PMMRBackend<T> as grin_core::core::pmmr::Backend<T>>::get
             at /root/grin/store/src/sumtree.rs:338                                                           
  14: <grin_core::core::pmmr::PMMR<'a, T, B>>::root::{{closure}}
             at /root/grin/core/src/core/pmmr.rs:247                                                      
  15: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &'a mut F>::call_once
             at /checkout/src/libcore/ops/function.rs:271                                                     
  16: <core::option::Option<T>>::map                                          
             at /checkout/src/libcore/option.rs:398                                                           
  17: <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next   
             at /checkout/src/libcore/iter/mod.rs:1073                                                        
  18: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::spec_extend        
             at /checkout/src/liballoc/vec.rs:1825                                                            
  19: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::from_iter         
             at /checkout/src/liballoc/vec.rs:1808                                                          
  20: <alloc::vec::Vec<T> as core::iter::traits::FromIterator<T>>::from_iter
             at /checkout/src/liballoc/vec.rs:1695                                                       
  21: core::iter::iterator::Iterator::collect                                  
             at /checkout/src/libcore/iter/iterator.rs:1298                                                  
  22: <grin_core::core::pmmr::PMMR<'a, T, B>>::root          
             at /root/grin/core/src/macros.rs:23                                                          
  23: grin_chain::sumtree::Extension::roots                                   
             at chain/src/sumtree.rs:388                                                                    
  24: grin_chain::pipe::validate_block                                  
             at chain/src/pipe.rs:288                                                                    
  25: grin_chain::pipe::process_block::{{closure}}                             
             at chain/src/pipe.rs:93                                                                         
  26: grin_chain::sumtree::extending                         
             at chain/src/sumtree.rs:160                                                                  
  27: grin_chain::pipe::process_block               
             at chain/src/pipe.rs:92                                                                                        
  28: grin_chain::chain::Chain::process_block                                
             at chain/src/chain.rs:150                                                                    
  29: <grin_grin::adapters::NetToChainAdapter as grin_p2p::types::NetAdapter>::block_received
             at grin/src/adapters.rs:78                                                                   
  30: <grin_p2p::peer::TrackingAdapter as grin_p2p::types::NetAdapter>::block_received
             at p2p/src/peer.rs:240                                                                        
  31: grin_p2p::protocol::handle_payload                                            
             at p2p/src/protocol.rs:192                                                                       
  32: <grin_p2p::protocol::ProtocolV1 as grin_p2p::types::Protocol>::handle::{{closure}}
             at p2p/src/protocol.rs:55                                                                     
  33: <F as grin_p2p::conn::Handler>::handle                                        
             at p2p/src/conn.rs:66                                                                            
  34: grin_p2p::conn::TimeoutConnection::listen::{{closure}}
             at p2p/src/conn.rs:275                                                                        
  35: <F as grin_p2p::conn::Handler>::handle                                 
             at p2p/src/conn.rs:66                                                                        
  36: grin_p2p::conn::Connection::read_msg::{{closure}}::{{closure}}
             at p2p/src/conn.rs:215                                                                       
ignopeverell commented 6 years ago

Digging deeper on this, @antiochp had the following logs before the crash:

Dec 07 20:16:02.899 DEBG Received block a01fb4e3 at 17048 from network, going to process.
Dec 07 20:16:02.899 DEBG pipe: process_block a01fb4e3 at 17048 with 0 inputs and 1 outputs.
Dec 07 20:16:02.900 DEBG pipe: validate_header cuckoo size 16
Dec 07 20:16:02.901 DEBG Starting new sumtree extension.
Dec 07 20:16:02.947 DEBG validate_block: forked_block: 13a32669 at 17047
Dec 07 20:16:02.947 DEBG Rewind sumtrees to header 13a32669 at 17047
Dec 07 20:16:02.948 DEBG Rewind sumtrees to output pos: 38914, kernel pos: 36529

Which is odd because a rewind at 38914 means offset 2,529,410 in the UTXO MMR data file, therefore index 2,529,085 should have been within bounds.

Mine were the following:

Dec 07 16:28:41.449 DEBG Received block 1fa28cd3 at 17048 from network, going to process.
Dec 07 16:28:41.451 DEBG pipe: process_block 1fa28cd3 at 17048 with 0 inputs and 1 outputs.
Dec 07 16:28:41.451 DEBG pipe: validate_header cuckoo size 16
Dec 07 16:28:41.461 DEBG Starting new sumtree extension.

Also in my case the blocks prior to that were very large (lots of outputs) and were extending a fork. It looks like 17047 set a new head and 17048 came right after with that crash.

ignopeverell commented 6 years ago

Also before that fork that was going from 16172 to 17047, my MMR size was 37012, and as mentioned some of those block were fairly large. Seems that should have been enough to bring the MMR size at the same ballpark as Antioch, meaning that the underlying storage should have had enough data.

All of this to say that I'm now thinking it's an issue with the size the mmap is set at in AppendOnlyFile and not the size of the MMR or an issue with the root logic.

(mostly talking to myself to document this and not forget in case I have to stop for today)

ignopeverell commented 6 years ago

This has been bothering me for a while but I think I finally found the issue here:

https://github.com/mimblewimble/grin/blob/0a87c0ced1837051f49b7646c0d635bc3cbe836b/store/src/sumtree.rs#L402

If the buffer has a None (which would happen if an element has been removed within it, which would happen for a block that has an input spending its own output, or similarly in a group for forked blocks), a PMMR position will be completely skipped. And so the file becomes slightly smaller than it should.

I have to be careful with that fix and try it first on testnet1, as it could invalidate a huge chunk of the chain. But I don't think it will be the case. We make sure all cut-through have been applied when building a block, so there shouldn't be an input directly spending an output.

antiochp commented 6 years ago

From messing around with zero confirmation spends a few weeks ago - I'm 99% certain we will never persist a block where an input spends an output in the same block (and that cut-through in block.compact() eliminates this completely).