Closed psq closed 3 years ago
a related issue, which is probably partly related, is that the node is regularly falling behind getting the stacks headers. For example, just a few minutes ago, it had burn blocks 17868-17888, but not the headers, so it would mine off the wrong tip. Which probably explains other things you can see such a miner insisting on mining its own separate chain for 20+ blocks.
The overall fix for this may be quite simple:
As I was writing this, same thing happen, where the node had burn blocks 17942-17956, but missing the stacks headers. Will check if the logs can offer some clues (either not asking, or argon node not sending)
Wow, this is a really detailed report @psq! Thank you for providing so much information!
I think you're right -- combing through testnet/src/stacks-node/src
, it doesn't appear that the miner code checks on the completeness of the Stacks chainstate at all before it tries to mine a block. This is clearly suboptimal -- as you say, the miner should try to make sure all Stacks blocks have been processed before trying to mine.
The fix is a little less obvious, though. I think three things need to happen:
The burn DB (in src/chainstate/burn/db/burndb.rs
) should be updated so that it can report the highest-potentially-known Stacks block height. It can determine this by querying the canonical burn chain fork for the highest accepted block-commit transaction.
The relayer_run_tenure
method in neon_node.rs
needs to be updated with a heuristic that will cause it to avoid mining a block if the chainstate is too far behind. To do so, it will look for a discrepancy between the Stacks chain state's block height on the canonical Stacks fork, and the highest expected block height in the burn DB. If this difference is too big, then the node can infer that it doesn't yet have all blocks.
The p2p network thread should use the same kind of heuristic to determine if there's more blocks to fetch, so the miner won't fall behind.
Thoughts, @kantai @lgalabru?
(Taking notes from the weekly blockchain meeting).
Fixing this will fundamentally depend on a heuristic -- specifically, the rate at which blocks are downloaded. If the rate slows down to the expected block rate, then it's safe to mine.
At a minimum, the node should track and report its block download rate in /v2/info
, so even if mining is outsourced to an external program in the future, the external mining program can still know when it's safe to mine.
are you referring to downloaded burn blocks or or download stacks headers? I understand you'd need some heuristic for not having burn blocks, although you could also query a bitcoin node directly perhaps, but if you don't have stacks headers for the most recent burn blocks (except the latest, which is the one being built), your miner would certainly pick the wrong parent to mine off of, no? so no heuristic needed here.
are you referring to downloaded burn blocks or or download stacks headers?
Downloaded Stacks blocks (also, to clarify, the Stacks headers and blocks are downloaded at the same time).
We'll need the heuristic for Stacks blocks even if the entire burnchain state is available. This is because a sortition may not correspond to a real Stacks block in the first place. For example, the peer that mined it may have crashed intermittently and lost the block data. As another example, a griefing attacker could have just written garbage data to the block hash field in the Bitcoin block-commit transaction. Unfortunately, there's no way to tell the difference between a block's (possibly temporary) unavailability and a block's non-existence. So, the miner will need a heuristic to decide when it's good time to start mining, and a good indication of this is when the Stacks block download rate matches the expected Stacks block mining rate (i.e. the node is no longer bootstrapping).
The heuristic is necessarily an educated guess, and this one is pretty simple. But in the future, the miner code itself will be a separate program from the Stacks node, and will be able to query chain state information from both the Stacks node, the Bitcoin node, other 3rd party services, etc. in order to give the user as much of an informed opinion as possible as to when it's safe to try mining. At that point, it'll be out of our hands; all we have to do is make it so the node provides as much useful information to miner software as possible.
yes, that make sense. Thank you for the explanation! I re-read sip-001 last night, and noticed the mention about miner crashing. Currently, though, a miner can fall behind by 10-20 stacks block compared to the burn chain, which seems to be propagating fine. In that case, the miner should really stop mining.
More related questions, concerning sequence of events:
(ignoring micro blocks, which has no material bearing on this sequence, just makes it a bit more complex)
Stacks block in #2 is supposed to be produced as quickly as possible by winning miner to avoid coming after subsequent burn block, or is this fine? but maybe no miner will have sent a commit included in next burn block? Burn blocks can come quickly, not close to the average block time you'd expect from current mining difficulty.
Essentially, #1, #2, and #3 happen close together, and then miner waits for next burn block (and streams micro blocks if any)?
Is this somewhat close?
@psq can you verify if this is fixed by #1980 and close if so? Thanks!
looks like it is, will reopen if new evidence surfaces
This happened to a miner yesterday (1/23/2021) on mainnet
667452 1 667451 1065> 2,466.40 1,871,594 br71 s:779fd90958 p:829027050e c:346020aa35 i:842fa82a86 b:00000000000000000001b104f @+ [0.6]SP105ARDW7 [32.1]SP137NZXD0 [10.7]SP1RJH1641 [11.2]SP25Y6NQXB [1.1]SP2X3CCBFR [9.1]SP334VW9T9 [1.1]SP3PW87SQ8 [32.4]SPP1DJ1JE2*[1.8]SPY1PP2KZ6
667453 ? 0 - - - -
667454 310 667144 811 4,932.80 1,851,594 br73 s:4ad86a13db p:b1e8c3304a c:92a70e4fe6 i:7c6d6595e4 b:00000000000000000001e7d33 [0.6]SP105ARDW7 [32.4]SP137NZXD0 [10.8]SP1RJH1641 [11.3]SP25Y6NQXB*[9.2]SP334VW9T9 [1.1]SP3PW87SQ8 [32.7]SPP1DJ1JE2 [1.8]SPY1PP2KZ6
667455 286 667169 829 2,466.40 1,871,594 br74 s:ccba66e4e1 p:e554c8725f c:b45d29c76d i:d06a6c21e4 b:0000000000000000000858436 [0.6]SP105ARDW7 [32.1]SP137NZXD0 [10.7]SP1RJH1641 [11.2]SP25Y6NQXB*[1.1]SP2X3CCBFR [9.1]SP334VW9T9 [1.1]SP3PW87SQ8 [32.4]SPP1DJ1JE2 [1.8]SPY1PP2KZ6
667456 4 667452 1066> 2,466.40 2,081,594 br71 s:611603ed90 p:779fd90958 c:71ce135a73 i:28dfaff611 b:00000000000000000007ae5da [0.5]SP105ARDW7 [28.8]SP137NZXD0*[9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB [1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6
667457 1 667456 1067> 2,466.40 2,081,594 br71 s:977f180ebc p:611603ed90 c:aded97047a i:6b00c5312c b:000000000000000000012fc9f @@ [0.5]SP105ARDW7 [28.8]SP137NZXD0 [9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB [1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6*
667458 1 667457 1068> 2,466.40 2,081,594 br71 s:3ce14d0bf7 p:977f180ebc c:4f20c0adcb i:0cb04389db b:00000000000000000003d5334 @@ [0.5]SP105ARDW7 [28.8]SP137NZXD0 [9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB*[1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6
You can can see miner SP25Y6NQXB
restarting with mining block 811 (tip is 1065), then block 829, and then being caught up while mining 1068 a few blocks later
Looks like #2364 is a duplicate of this issue. Either way, #2370 should address it by allowing miners to explicitly specify a mine-at-block
height as a command-line parameter.
In 2.0.3 and later, you can specify a minimum block height at which the miner will try to mine, which should prevent the node from mining too early.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Describe the bug When starting a miner for the first time, the miner will first start downloading the burn block information, and will start mining when this is done. In most cases, the stacks headers are not all downloaded, so the miner will start mining a block off of its longest fork, which is not a block that will be accepted into the longest fork by other miners.
For example, during last week, my miner
ST2Z840ZWSF54AFGB1QAEVJ8S8ME7H5BP81C6HJ19
will start mining at burn block 1642 (line 1644 of attached week27.txt file, sorry, this was too big for a gist, which could have been linked into directly). First sortition won at 1643 (the*
after the miner's address is the sortition winner for that block), and will mine a new stacks block 1 (current chain height is 899 (see previous block)There are then a few won blocks where the data is incomplete (which should be filed as a separate problem once I understand a bit better what's going on, but more rejected blocks)
Next block won with data is at burn block 1676 (line 1678), and is stacks block 226 (chain height is 914 from previous line), then 227, then 242...
My miner won't be caught up till much later (not sure how many bad blocks were mined, but 100s, overall, my miner mined 1307 blocks that were rejected)
Miner gets almost caught up around burn block 6353 (almost 5000 burn blocks later), mining 3201 a few block after one was already mined at 6350, and keeps mining its on fork (br816, while the highest is on br799, mined by
ST2TJRHDHMYBQ417HFB0BDX430TQA5PXRX6495G1V
, which is most likely the argon node), it will mine on that fork for some time, and eventually at burn block 6435 it will mine a block in br799, and first actual reward... ouchOther miners will show the same behavior, see line 11332, where 5 miners mined block 1, I just focused on mine as I had more data available.
There are a few other issues I mentioned in discord I will file separately from that same run, or this weeks run.
Expected behavior Miner should only start mining when a block has a chance to be accepted in the current best fork.
the data used for the analysis was extracted from 3 of the sqlite databases (burnchain/db/bitcoin/regtest/burn.db/data.db, chainstate/chain-00000080-testnet/vm/headers.db, chainstate/chain-00000080-testnet/blocks/staging.db)
the burn blocks list includes the following
*
$
if that's part of current longest forkweek27.txt
There is probably more that is needed to understand the output, so fire questions away, or perhaps we could discuss live if it makes sense.