Closed by whoabuddy 4 years ago.
Thanks for all the information @whoabuddy! It does look like the same underlying issue. Do you happen to know what the last git commit on master was when you built your node? #1774 should have fixed this issue, so I'd be very curious to know whether this problem occurred after #1774 was merged.
Happy to help! I can check when I'm at the office again, but it may not be until Monday. I haven't done anything else with the VM since uploading the info here in case we needed more information.
Given that PR was merged on 7/28, and I deleted the stacks-blockchain directory then cloned the repo fresh on 8/6, my guess is it should have included that commit from master.
@jcnelson I got the same crash twice right after startup today (after the burnchain download completes). I definitely have d8d4043eb046b0e9f57afd337044646fa25dc231. The node can't load the snapshot for either the last known block or the one before it (the two crashes look slightly different in what happened beforehand).
I'll see if I can reproduce it with more detailed logs.
Thanks for getting back to me on this @psq! Were these nodes booted up from an empty chainstate, or an existing chainstate? The node currently doesn't make any guarantees about the consistency of the data of an existing chainstate directory -- there isn't yet a graceful shutdown feature, so it's possible to see a corruption error on restarting with an existing chainstate directory (especially one that could arise from the MARF recording that a snapshot exists, but the snapshot DB not returning it).
Empty chainstate. I think it is consistent with some crashes I saw last week. It's as if there is a point during boot-up where this is quite likely to happen, but if you get past that point you are usually fine, even though it can still happen later, as it did to @whoabuddy (the log looks very similar, so I'm assuming it's the same issue). I just got a successful startup, so I'll try to compare the two runs. I was running a few changes on top of master, but mostly extra logs and extra configuration code (with the same default values), so that should not matter here.
@kantai, when you wrote the comment on this line (https://github.com/blockstack/stacks-blockchain/blob/afc748aa1615ff966675a828f02efd78e092c2d8/src/chainstate/burn/db/sortdb.rs#L1241), why were you saying this shouldn't be possible?
It seems to be happening more often than you'd wish, and then panicking at https://github.com/blockstack/stacks-blockchain/blob/master/testnet/stacks-node/src/neon_node.rs#L289 because of the error returned.
Any possible cause for the snapshot not being there when it should be?
Technically, I wrote that line, and @kantai later modified it.
That branch shouldn't be reachable if everything is working right (and clearly, it is not), because the stable block height is always less than or equal to the chain height, meaning there must always be a corresponding snapshot for it. @kantai, might the fact that this code is being reached be due to the off-by-one error you were describing earlier, in PR #1787?
I don't think that off-by-one would cause this -- I'm pretty sure that bug isn't present in master in any case.
My initial suspicion here is that get_block_snapshot_by_height is expecting an "effective" block height (i.e., block-height - first-block-height), or a similar such error.
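In other words, a mismatch of roughly this shape (a minimal sketch with made-up numbers and names, not the actual sortdb API):

// Illustrative sketch only. It shows how passing an absolute burn
// block height where an "effective" height is expected (or vice
// versa) makes a snapshot that does exist appear to be missing.
fn effective_height(block_height: u64, first_block_height: u64) -> u64 {
    // effective height = absolute height - height of the first burn block
    block_height - first_block_height
}

fn main() {
    let first_block_height = 22_000;
    let absolute_height = 22_244;
    // A table keyed by effective height holds index 244 for this
    // block; querying it with the absolute 22_244 finds nothing.
    assert_eq!(effective_height(absolute_height, first_block_height), 244);
}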
Technically, I wrote that line, and @kantai later modified it.
Ah, sorry! I went with what git blame was giving me...
I can't speak for what is happening with @whoabuddy, although without the timing changes the node can get behind in processing snapshots. In my 2 cases, my node had only processed about 1/3 of the snapshots (at least to a consistent height where it has all the prior snapshots), so retrieving the snapshot for the most recent burn block is going to fail. (I'm still a bit unclear on how the different storage areas relate to each other, and in what order things are supposed to happen when everything goes fine.) Maybe it is related to starting to mine too early (https://github.com/blockstack/stacks-blockchain/issues/1745), before the state has reached a fully consistent shape?
I'll look into how the stable block height is calculated; maybe it is too eager when the node is starting up? Or it needs to catch up later on (although if that were really the reason, I would expect the panic to happen a lot more often).
Also, maybe using panic here is too drastic: the database is not complete yet, so it may appear to be corrupted, but I suspect this is only temporary.
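For illustration, a gentler failure mode might look something like this (a sketch with stand-in types and functions, not the actual neon_node code):

// Sketch only: DbError and get_burnchain_view are stand-ins for the
// real stacks-node types. The idea is to treat DBError(Corruption)
// as possibly transient while the sortition DB is still catching up,
// instead of aborting the whole process.
#[derive(Debug)]
enum DbError {
    Corruption,
    Other(String),
}

fn get_burnchain_view() -> Result<u64, DbError> {
    // Stand-in for the chain-view query that returns
    // DBError(Corruption) in this issue.
    Err(DbError::Corruption)
}

fn main() {
    match get_burnchain_view() {
        Ok(tip) => println!("chain view at burn block {}", tip),
        Err(DbError::Corruption) => {
            // The snapshot may simply not be committed yet; log and
            // retry on the next poll instead of panicking.
            eprintln!("chain view unavailable; will retry next pass");
        }
        Err(e) => panic!("unrecoverable DB error: {:?}", e),
    }
}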
I have been following everything here but have not gone into the other office yet. I've been in sort of a "transition phase" between locations but definitely plan to get physical access to that computer and its virtual miners today. From there I can check the git log and do some further testing, as well as restore my remote access.
In the meantime, I modified my shell script with a "debug mode" that sets BLOCKSTACK_DEBUG=1 and RUST_BACKTRACE=full and records the session output with script.
I used that version of the shell script yesterday with a virtual miner on this laptop with the same configuration (VirtualBox, Ubuntu Server, 1 GB RAM), and it's been running for ~14 hours with no issues. :man_shrugging: I will leave it running and report back if it panics the same way, but so far, so good.
I wish I could read/interpret the logs or understand the operations here as well as everyone else in the thread, but I'm doing my best to keep up. If there is any additional information I can grab that may be helpful, let me know!
Update: @jcnelson I finally got in front of that miner today, the git log shows this at the head:
commit afc748aa1615ff966675a828f02efd78e092c2d8
Merge: 2f7611c8e e86cc9571
Author: Aaron Blankstein <aaron@blockstack.com>
Date:   Sun Aug 2 11:10:55 2020 -0500

    Merge pull request #1782 from blockstack/bugtemplate-patch-1

    Update testnet-bug.md
and about 8 commits back:
commit d8d4043eb046b0e9f57afd337044646fa25dc231
Merge: e147ec3ed 72deb87f1
Author: Jude Nelson <judecn@gmail.com>
Date:   Mon Jul 27 20:28:40 2020 +0000

    Merge pull request #1774 from blockstack/fix/marf-1557

    Fix/marf 1557
I can upload the log if needed. I also want to note that the node running on my laptop is still running without issue, so I was not able to replicate the issue on a different VM.
I am going to start the other 3 miner VMs on this machine with the debug settings enabled, so if they do panic I'll have a more detailed log to share (and it will definitely be the most updated code/a fresh directory).
Is there any additional information I can pull from this VM that panicked that would be helpful? For now I will leave it be just in case, otherwise I will start it back up fresh with debug enabled as well.
@whoabuddy the log and the content of the data folder would be helpful. I have 2 cases I have been looking at locally, but these happen during the initial sync, not later, so yours might have something different.
@psq I'm back in front of this machine today, and excuse my ignorance, but I am not sure which files/paths you need in particular.
Back in my first post I included a copy of the terminal output (without debug), and I did my best to find the folders on the VM but still wasn't sure. I do see a number of files in stacks-blockchain/target/debug -- maybe that's the one you are looking for?
I am happy to preserve this machine as-is and provide the information once I better understand.
On a side note - happy to report my laptop and the other 3 nodes on this machine are running normally so far, so not sure what was going on with this one, but always happy to dig in and learn more!
The data folder is in /tmp by default, so look there. If you were not running anything else, the folder with the most recent timestamp is most likely the one we need.
If you look in your log, you should see the path it is using repeated quite often. For example, mine shows:
./.stack/burnchain/db/bitcoin/regtest/sortition.db/data.db
but that's because I've set working_dir = "./.stack" in my configuration file.
Yours should say something like
/tmp/?????/burnchain/db/bitcoin/regtest/sortition.db/data.db
The top level of that data directory should have
burnchain
chainstate
peer_db.sqlite
The other thing we need is the full log (either from the file, if you redirected the output, or by copying the whole content of the console).
@psq Thank you! Using the node on my laptop I can see the files in /tmp, so that's what I needed! I am not in front of that desktop to grab the data directory, but this is a copy of the log (linked in the first post here as well). I will work on getting the data directory from that VM next time I'm in front of it.
The node running on my laptop panicked this morning; based on the script recording, it ran from 2020-08-11 17:35:32+00:00 through 2020-08-17 13:00:49+00:00.
curl -sS -O https://raw.githubusercontent.com/AbsorbingChaos/bks-setup-miner/master/config-miner-argon.sh
bash config-miner-argon.sh debug
Debug mode is described in the readme if more info is needed on the script.
OS: Ubuntu Server 20.04
Rust version: rustc 1.44.1
Testnet started with BLOCKSTACK_DEBUG=1 RUST_BACKTRACE=full cargo testnet start:
https://github.com/AbsorbingChaos/bks-setup-miner/blob/master/config-miner-argon.sh#L172
DEBUG [1597669248.582] [src/burnchains/bitcoin/indexer.rs:458] [ThreadId(1)] Sync all headers starting at block 22244
DEBUG [1597669248.593] [src/burnchains/bitcoin/spv.rs:394] [ThreadId(1)] Initialized block headers at /tmp/stacks-testnet-c916a61c99f04694/burnchain/spv-headers.dat
DEBUG [1597669248.593] [src/burnchains/bitcoin/indexer.rs:373] [ThreadId(1)] (Re)establish peer connection
DEBUG [1597669248.669] [src/burnchains/bitcoin/network.rs:164] [ThreadId(1)] Begin peer handshake to 35.236.218.197:18444
DEBUG [1597669248.672] [src/burnchains/bitcoin/network.rs:251] [ThreadId(1)] Send version (nonce=12309017456111387569) to 35.236.218.197:18444
DEBUG [1597669248.834] [src/burnchains/bitcoin/network.rs:259] [ThreadId(1)] Handle version -- remote peer blockchain height is 22244
DEBUG [1597669248.836] [src/burnchains/bitcoin/network.rs:274] [ThreadId(1)] Send verack
DEBUG [1597669248.838] [src/burnchains/bitcoin/network.rs:283] [ThreadId(1)] Handle verack
DEBUG [1597669248.839] [src/burnchains/bitcoin/network.rs:172] [ThreadId(1)] Established connection to 35.236.218.197:18444, who has 22244 blocks
DEBUG [1597669248.839] [src/burnchains/bitcoin/spv.rs:656] [ThreadId(1)] Have all headers up to 22244
DEBUG [1597669248.839] [src/burnchains/bitcoin/indexer.rs:394] [ThreadId(1)] begin_session() terminates conversation
DEBUG [1597669248.839] [src/burnchains/burnchain.rs:698] [ThreadId(1)] Sync'ed headers from 22244 to 22244. DB at 22244
DEBUG [1597669248.841] [testnet/stacks-node/src/burnchains/bitcoin_regtest_controller.rs:202] [ThreadId(1)] Done receiving blocks
DEBUG [1597669249.382] [testnet/stacks-node/src/burnchains/bitcoin_regtest_controller.rs:792] [ThreadId(6)] BitcoinRPC builder: None:None@http://35.236.218.197:18443/
0: stacks_node::main::{{closure}}
at testnet/stacks-node/src/main.rs:44
1: std::panicking::rust_panic_with_hook
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panicking.rs:515
2: std::panicking::begin_panic
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panicking.rs:438
3: stacks_node::neon_node::spawn_peer::{{closure}}
at testnet/stacks-node/src/neon_node.rs:289
4: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/sys_common/backtrace.rs:130
5: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/thread/mod.rs:475
6: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panic.rs:318
7: std::panicking::try::do_call
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panicking.rs:331
8: __rust_try
9: std::panicking::try
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panicking.rs:274
10: std::panic::catch_unwind
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/panic.rs:394
11: std::thread::Builder::spawn_unchecked::{{closure}}
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/thread/mod.rs:474
12: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libcore/ops/function.rs:232
13: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/liballoc/boxed.rs:1008
<alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/liballoc/boxed.rs:1008
std::sys::unix::thread::Thread::new::thread_start
at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/sys/unix/thread.rs:87
14: start_thread
15: clone
git log
afc748aa1 Merge pull request #1782 from blockstack/bugtemplate-patch-1
e86cc9571 Update testnet-bug.md
2f7611c8e Update testnet-bug.md
68a5396e3 Merge pull request #1775 from blockstack/fix/net-1750
2e38bade1 remove obsolete debug output
411c7af98 add test for get_parent_block_header
ee799727b remove unnecessary argument
0bab320f0 fix #1750 -- the problem was that we weren't using the right anchor block hash to load a block's header
d8d4043eb Merge pull request #1774 from blockstack/fix/marf-1557
e147ec3ed Merge pull request #1769 from blockstack/feat/bitcoin-gh-tests
Since this node ran for almost 6 days, the log file is 5 GB uncompressed, 514 MB compressed. :eyes:
https://envl.app/absorbingchaos/SsH3pE!g6TTPpMOgrI7JBEF
This one was quite large as well: 557 MB compressed.
Thank you for the submission @whoabuddy 👍
@jcnelson @psq @whoabuddy
Quick question just to clarify: this bug looks very similar to #1781. Can I label this bug as a duplicate?
@timstackblock the crashes happen in different threads, so they are probably not directly related. The root cause may be the same, but it's not possible to tell yet.
Thanks for the info @psq, I will watch the issue and follow up once there has been more traction.
@timstackblock Actually, I ran into this issue as well. It happens after my node syncs and has been working for a while.
I've been launching a node with the sidecar API attached.
In order to run that, I've been using the following Docker images:
1) For the Stacks node - blockstack/stacks-blockchain:latest
2) For the API - blockstack/stacks-blockchain-api:latest
3) For the DB - postgres:12.2
Below is my list of environment variables.
Postgres environment variables:
POSTGRES_DB: stacks_blockchain_api
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_LOGGING: true
POSTGRES_PORT: 5490
Stacks Node environment variables :
STACKS_EVENT_OBSERVER: 127.0.0.1:3700
XBLOCKSTACK_DEBUG: 1
RUST_BACKTRACE: 1
API environment variables:
POSTGRES_HOST: 127.0.0.1
POSTGRES_PORT: 5490
PG_DATABASE: stacks_blockchain_api
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
STACKS_CORE_EVENT_PORT: 3700
STACKS_CORE_EVENT_HOST: http://0.0.0.0
STACKS_BLOCKCHAIN_API_PORT: 3999
STACKS_BLOCKCHAIN_API_HOST: 0.0.0.0
STACKS_BLOCKCHAIN_API_DB: pg
STACKS_CORE_RPC_HOST: 127.0.0.1
STACKS_CORE_RPC_PORT: 20443
NODE_ENV: development
Here are the last 50 lines of the log, without DEBUG enabled:
INFO [1597906202.216] [src/burnchains/burnchain.rs:704] [ThreadId(1)] Node will fetch burnchain blocks 784-785...
INFO [1597906202.855] [testnet/stacks-node/src/neon_node.rs:882] [ThreadId(1)] Received burnchain block #785 including block_commit_op (winning) - mx2uds6sgnn9znABQ6iDSSmXY9K5D4SHF9
INFO [1597906204.193] [src/net/relay.rs:800] [ThreadId(5)] Processing newly received blocks: 1
INFO [1597906234.238] [src/net/p2p.rs:2098] [ThreadId(6)] Failed to query public IP (RecvTimeout; skipping
INFO [1597906323.790] [src/burnchains/burnchain.rs:704] [ThreadId(1)] Node will fetch burnchain blocks 785-786...
INFO [1597906324.430] [testnet/stacks-node/src/neon_node.rs:882] [ThreadId(1)] Received burnchain block #786 including block_commit_op (winning) - mx2uds6sgnn9znABQ6iDSSmXY9K5D4SHF9
INFO [1597906325.434] [src/net/relay.rs:800] [ThreadId(5)] Processing newly received blocks: 1
WARN [1597906364.306] [src/net/p2p.rs:943] [ThreadId(6)] Failed to get peer address of TcpStream { addr: V4(192.168.186.228:45644), fd: 37 }: Os { code: 107, kind: NotConnected, message: "Socket not connected" }
INFO [1597906364.306] [src/net/p2p.rs:1753] [ThreadId(6)] local.80000000://(bind=V4(0.0.0.0:20444))(pub=None): Failed to connect to myself for IP confirmation
INFO [1597906446.394] [src/burnchains/burnchain.rs:704] [ThreadId(1)] Node will fetch burnchain blocks 786-787...
INFO [1597906447.013] [testnet/stacks-node/src/neon_node.rs:882] [ThreadId(1)] Received burnchain block #787 including block_commit_op (winning) - mo8CosqCssC3jgdQZEiHqbE5m1kHTGoGCX
WARN [1597906495.378] [src/net/p2p.rs:943] [ThreadId(6)] Failed to get peer address of TcpStream { addr: V4(192.168.186.228:46844), fd: 37 }: Os { code: 107, kind: NotConnected, message: "Socket not connected" }
INFO [1597906495.378] [src/net/p2p.rs:1753] [ThreadId(6)] local.80000000://(bind=V4(0.0.0.0:20444))(pub=None): Failed to connect to myself for IP confirmation
INFO [1597906495.795] [src/net/relay.rs:800] [ThreadId(5)] Processing newly received blocks: 1
INFO [1597906567.208] [src/burnchains/burnchain.rs:704] [ThreadId(1)] Node will fetch burnchain blocks 787-788...
ERROR [1597906567.901] [src/chainstate/burn/db/sortdb.rs:1242] [ThreadId(6)] Failed to load snapshot for block 787 from fork 148faf4306a778442ad9e51c94e2b72dec3d05afc24b6a94bcddc311f1d2fa4b
ERROR [1597906567.901] [testnet/stacks-node/src/neon_node.rs:288] [ThreadId(6)] P2P: Failed to process network dispatch: DBError(Corruption)
Process abort due to thread panic
INFO [1597906567.919] [testnet/stacks-node/src/neon_node.rs:882] [ThreadId(1)] Received burnchain block #788 including block_commit_op (winning) - mx2uds6sgnn9znABQ6iDSSmXY9K5D4SHF9
0: stacks_node::main::{{closure}}
at src/testnet/stacks-node/src/main.rs:44
1: std::panicking::rust_panic_with_hook
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:490
2: std::panicking::begin_panic
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:410
3: stacks_node::neon_node::spawn_peer::{{closure}}
at src/testnet/stacks-node/src/neon_node.rs:289
std::sys_common::backtrace::__rust_begin_short_backtrace
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/sys_common/backtrace.rs:130
4: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/thread/mod.rs:475
<std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panic.rs:318
std::panicking::try::do_call
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:297
std::panicking::try
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panicking.rs:274
std::panic::catch_unwind
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/panic.rs:394
std::thread::Builder::spawn_unchecked::{{closure}}
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/thread/mod.rs:474
core::ops::function::FnOnce::call_once{{vtable.shim}}
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libcore/ops/function.rs:232
5: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
<alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
std::sys::unix::thread::Thread::new::thread_start
at rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/libstd/sys/unix/thread.rs:87
Full logs can be found at these links:
1) Without DEBUG: https://drive.google.com/file/d/1aZLocBxhM_6zLzHNwYM3a0j1f-PTVf7Q/view?usp=sharing
2) With DEBUG: https://drive.google.com/file/d/14QaMRJMN2x3r1ocgJTuqvtdLYQPOVCIL/view?usp=sharing
Thanks for the info @danielimkk
Thank you for the logs, @danielimkk; it's nice that this one happened very quickly. Is this something you built yourself? If so, what's the result of git log -1 --format=format:"%H"?
I think I have a good idea of what's going on here, but I want to try to recreate the issue a few more times before concluding that this is actually the cause.
Our "MARF-indexed database" transactions maintain two SQLite connections to two separate databases. This in turn means that they do not commit atomically -- that is a bug no matter how we slice it. The question is whether or not it's what is causing the issue here.
When I started argon miners locally, I would occasionally see the reported behavior, caused by:
DEBUG [1598373620.643] [src/chainstate/stacks/index/trie.rs:745] [ThreadId(28)] Next root hash is f8260157ad4125061a843f265703390ab0e7fd4907358da65aba6c9b7f06b5ac (update_skiplist=false)
DEBUG [1598373620.643] [src/chainstate/stacks/index/marf.rs:470] [ThreadId(28)] MARF Insert in 3d3ec5c29dbea25072d11b2acb28392fddb7266a24a924a82363e3352414f884: '4aa445e2891853300e4901d67c6d3b850a6d4e89f9ade93e196aa35d7e888202' = '5a1883b440688c90e719f9b34c9b39a3ed60843edc2a1a1085c5dc6e7d8daddf0000000000000000' (...[])
DEBUG [1598373620.646] [src/chainstate/stacks/index/trie.rs:745] [ThreadId(28)] Next root hash is bf10d5c54abd0b9de46cc669057669a4f316d01795431b401f440cd96fdc69bf (update_skiplist=true)
DEBUG [1598373620.647] [src/chainstate/burn/db/sortdb.rs:1714] [ThreadId(28)] ACCEPTED(4679) leader key register d3c3bed111d6bb40e686d9db2ea0162ae08c8eadb3354c422d4cf6b7fce4df0d at 4679,1
DEBUG [1598373620.647] [src/chainstate/burn/db/sortdb.rs:1714] [ThreadId(28)] ACCEPTED(4679) leader key register 32dcd0afa9a57dabb9bf9eea355106afc2fa734d5623b7fc3e40b312db12bb68 at 4679,2
DEBUG [1598373620.674] [src/chainstate/burn/db/sortdb.rs:1718] [ThreadId(28)] ACCEPTED(4679) leader block commit ab1b32f738184443f6953e3c92a4cf1fa8c5c0e2fd374e306d9f1fd36467bb41 at 4679,3
DEBUG [1598373620.674] [src/chainstate/burn/db/sortdb.rs:1714] [ThreadId(28)] ACCEPTED(4679) leader key register 4b28296a759c08418eb7a4023bfa35d833ffce4c2ad573437510dc3f7b366a96 at 4679,4
DEBUG [1598373620.674] [src/chainstate/burn/db/sortdb.rs:1718] [ThreadId(28)] ACCEPTED(4679) leader block commit d30e45e0985be30922fd92be39ff2bf11234977b4e9991dc410717fdb890079a at 4679,5
DEBUG [1598373620.674] [src/chainstate/burn/db/processing.rs:133] [ThreadId(28)] OPS-HASH(4679): fc988223993ecc56d35368c5c427925eb6388e4511dd18b94dfcf97cfbad03fa
DEBUG [1598373620.674] [src/chainstate/burn/db/processing.rs:134] [ThreadId(28)] INDEX-ROOT(4679): bf10d5c54abd0b9de46cc669057669a4f316d01795431b401f440cd96fdc69bf
DEBUG [1598373620.674] [src/chainstate/burn/db/processing.rs:135] [ThreadId(28)] SORTITION-HASH(4679): 0b55822d9a908a8619c483a0e9b77e13bd591521c8f83a7bb044db6aa13476af
DEBUG [1598373620.674] [src/chainstate/burn/db/processing.rs:136] [ThreadId(28)] CONSENSUS(4679): fef2982ea1d04e4d18e780699c321deee8e8ac62
DEBUG [1598373620.706] [src/net/p2p.rs:2581] [ThreadId(6)] local.80000000://(bind=V4(0.0.0.0:20444))(pub=None): load chain view for burn block 4679
DEBUG [1598373620.707] [src/chainstate/burn/db/sortdb.rs:1533] [ThreadId(6)] No ancestor block 4678 from 3d3ec5c29dbea25072d11b2acb28392fddb7266a24a924a82363e3352414f884 in index
ERROR [1598373620.707] [src/chainstate/burn/db/sortdb.rs:1242] [ThreadId(6)] Failed to load snapshot for block 4678 from fork 3d3ec5c29dbea25072d11b2acb28392fddb7266a24a924a82363e3352414f884
ERROR [1598373620.707] [testnet/stacks-node/src/neon_node.rs:288] [ThreadId(6)] P2P: Failed to process network dispatch: DBError(Corruption)
Process abort due to thread panic
DEBUG [1598373620.712] [src/chainstate/stacks/index/storage.rs:1287] [ThreadId(28)] Flush: 3d3ec5c29dbea25072d11b2acb28392fddb7266a24a924a82363e3352414f884 to self
DEBUG [1598373620.717] [src/chainstate/stacks/index/storage.rs:1333] [ThreadId(28)] Flush: identifier of self is 4680
DEBUG [1598373620.718] [src/burnchains/burnchain.rs:768] [ThreadId(28)] Inserted block 4679 in 6245ms
DEBUG [1598373620.727] [testnet/stacks-node/src/burnchains/bitcoin_regtest_controller.rs:202] [ThreadId(1)] Done receiving blocks
Of particular note:
DEBUG [1598373620.706] [src/net/p2p.rs:2581] [ThreadId(6)] local.80000000://(bind=V4(0.0.0.0:20444))(pub=None): load chain view for burn block 4679
DEBUG [1598373620.707] [src/chainstate/burn/db/sortdb.rs:1533] [ThreadId(6)] No ancestor block 4678 from 3d3ec5c29dbea25072d11b2acb28392fddb7266a24a924a82363e3352414f884 in index
Occurs before this line:
DEBUG [1598373620.718] [src/burnchains/burnchain.rs:768] [ThreadId(28)] Inserted block 4679 in 6245ms
Which means that thread 6 is reading block 4679 as the burn chain tip before it has finished committing in thread 28!
When I diagnosed the database after the crash, block 4679's MARF index was able to find its immediate ancestor (4678), which indicates that there isn't actual corruption happening, but rather a synchrony issue.
To test that this is indeed the cause, I added a delay between the two commits to see if the bug could be readily reproduced.
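The shape of the problem, sketched with rusqlite and made-up table names (not the real schema); the comment marks the window where the reading thread slips in between the two commits:

use rusqlite::{params, Connection, Result};

// Sketch of the non-atomic commit described above. `commit_block`
// plays the role of thread 28 (the sortition processor).
fn commit_block(marf: &mut Connection, data: &mut Connection, height: i64) -> Result<()> {
    let tx_marf = marf.transaction()?;
    tx_marf.execute("INSERT INTO marf_index (height) VALUES (?1)", params![height])?;
    tx_marf.commit()?; // MARF index now claims block `height` exists

    // Window: a reader polling here (thread 6) sees the new tip in
    // the MARF index but no matching snapshot row, and reports
    // DBError(Corruption). A std::thread::sleep inserted here makes
    // the crash easy to reproduce.

    let tx_data = data.transaction()?;
    tx_data.execute("INSERT INTO snapshots (height) VALUES (?1)", params![height])?;
    tx_data.commit()?; // only now does the snapshot become visible
    Ok(())
}

fn main() -> Result<()> {
    let mut marf = Connection::open_in_memory()?;
    let mut data = Connection::open_in_memory()?;
    marf.execute("CREATE TABLE marf_index (height INTEGER)", params![])?;
    data.execute("CREATE TABLE snapshots (height INTEGER)", params![])?;
    commit_block(&mut marf, &mut data, 4679)
}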
To deal with this, I think the MARF index and the corresponding data store must be in the same SQLite database.
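Sketched in the same terms, the fix puts both tables behind one connection, so a single transaction covers both writes:

use rusqlite::{params, Connection, Result};

// Same scenario as the sketch above, but with the MARF index and the
// snapshot table in one database: readers see both rows or neither.
fn commit_block_atomically(db: &mut Connection, height: i64) -> Result<()> {
    let tx = db.transaction()?;
    tx.execute("INSERT INTO marf_index (height) VALUES (?1)", params![height])?;
    tx.execute("INSERT INTO snapshots (height) VALUES (?1)", params![height])?;
    tx.commit() // both writes become visible atomically, or not at all
}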
Not sure if this is the same issue, but here is my log file. The miner stops after a while; the last lines look the same as @whoabuddy's.
Here are some more logs: last.log
Fixed in #1876
Wondering if this is related to #1781 at all?
Describe the bug
I ran the argon miner via this script, which I now realize did not have debug enabled. I can run the process again with BLOCKSTACK_DEBUG=1 and RUST_BACKTRACE=full and report a second log if needed, but for now, I am preserving the directory at @jcnelson's request.
Steps To Reproduce
rm -rf ./stacks-blockchain
script 20200805-stacks-blockchain <-- handy way to record terminal output!
curl -o- https://raw.githubusercontent.com/AbsorbingChaos/bks-setup-miner/master/config-miner-argon.sh | bash
Expected behavior
To still have a running testnet miner :slightly_smiling_face:
Environment (please complete the following information):
git clone: https://github.com/AbsorbingChaos/bks-setup-miner/blob/38b29b713d073073278093fb57869f6f2bfda701/config-miner-argon.sh#L81
Additional context
Testnet started with cargo testnet start: https://github.com/AbsorbingChaos/bks-setup-miner/blob/38b29b713d073073278093fb57869f6f2bfda701/config-miner-argon.sh#L136
Attached is a text file of all miner output from start of script through panic.
20200805-stacks-blockchain-panic.txt
Here are the last bunch of lines from the text file; since it includes the build and the miner running for a while, the full file is quite long!