psq commented 4 years ago

Describe the bug When starting a miner for the first time, the miner will first start downloading the burn block information, and will start mining when this is done. In most cases, the stacks headers are not all downloaded, so the miner will start mining a block off of its longest fork, which is not a block that will be accepted into the longest fork by other miners.

For example, during last week, my miner ST2Z840ZWSF54AFGB1QAEVJ8S8ME7H5BP81C6HJ19 will start mining at burn block 1642 (line 1644 of attached week27.txt file, sorry, this was too big for a gist, which could have been linked into directly). First sortition won at 1643 (the * after the miner's address is the sortition winner for that block), and will mine a new stacks block 1 (current chain height is 899 (see previous block)

There are then a few won blocks where the data is incomplete (which should be filed as a separate problem once I understand a bit better what's going on, but more rejected blocks)

Next block won with data is at burn block 1676 (line 1678), and is stacks block 226 (chain height is 914 from previous line), then 227, then 242...

My miner won't be caught up till much later (not sure how many bad blocks were mined, but 100s, overall, my miner mined 1307 blocks that were rejected)

Miner gets almost caught up around burn block 6353 (almost 5000 burn blocks later), mining 3201 a few block after one was already mined at 6350, and keeps mining its on fork (br816, while the highest is on br799, mined by ST2TJRHDHMYBQ417HFB0BDX430TQA5PXRX6495G1V, which is most likely the argon node), it will mine on that fork for some time, and eventually at burn block 6435 it will mine a block in br799, and first actual reward... ouch

Other miners will show the same behavior, see line 11332, where 5 miners mined block 1, I just focused on mine as I had more data available.

There are a few other issues I mentioned in discord I will file separately from that same run, or this weeks run.

Expected behavior Miner should only start mining when a block has a chance to be accepted in the current best fork.

the data used for the analysis was extracted from 3 of the sqlite databases (burnchain/db/bitcoin/regtest/burn.db/data.db, chainstate/chain-00000080-testnet/vm/headers.db, chainstate/chain-00000080-testnet/blocks/staging.db)

the burn blocks list includes the following

burn block height, list of miners, winner with *
stacks_block_height
stacks_block_height from payments, '>' represents new highest block mined
block_height from headers
branch name, with a $ if that's part of current longest fork
block hash
parent block hash

week27.txt

There is probably more that is needed to understand the output, so fire questions away, or perhaps we could discuss live if it makes sense.

psq commented 4 years ago

a related issue, which is probably partly related, is that the node is regularly falling behind getting the stacks headers. For example, just a few minutes ago, it had burn blocks 17868-17888, but not the headers, so it would mine off the wrong tip. Which probably explains other things you can see such a miner insisting on mining its own separate chain for 20+ blocks.

The overall fix for this may be quite simple:

stop mining if headers are not caught up with burn blocks
improve the speed at which headers are propagated to peers (not always the case)

As I was writing this, same thing happen, where the node had burn blocks 17942-17956, but missing the stacks headers. Will check if the logs can offer some clues (either not asking, or argon node not sending)

jcnelson commented 4 years ago

Wow, this is a really detailed report @psq! Thank you for providing so much information!

I think you're right -- combing through testnet/src/stacks-node/src, it doesn't appear that the miner code checks on the completeness of the Stacks chainstate at all before it tries to mine a block. This is clearly suboptimal -- as you say, the miner should try to make sure all Stacks blocks have been processed before trying to mine.

The fix is a little less obvious, though. I think three things need to happen:

The burn DB (in src/chainstate/burn/db/burndb.rs) should be updated so that it can report the highest-potentially-known Stacks block height. It can determine this by querying the canonical burn chain fork for the highest accepted block-commit transaction.
The relayer_run_tenure method in neon_node.rs needs to be updated with a heuristic that will cause it to avoid mining a block if the chainstate is too far behind. To do so, it will look for a discrepancy between the Stacks chain state's block height on the canonical Stacks fork, and the highest expected block height in the burn DB. If this difference is too big, then the node can infer that it doesn't yet have all blocks.
The p2p network thread should use the same kind of heuristic to determine if there's more blocks to fetch, so the miner won't fall behind.

Thoughts, @kantai @lgalabru?

jcnelson commented 4 years ago

(Taking notes from the weekly blockchain meeting).

Fixing this will fundamentally depend on a heuristic -- specifically, the rate at which blocks are downloaded. If the rate slows down to the expected block rate, then it's safe to mine.

At a minimum, the node should track and report its block download rate in /v2/info, so even if mining is outsourced to an external program in the future, the external mining program can still know when it's safe to mine.

psq commented 4 years ago

are you referring to downloaded burn blocks or or download stacks headers? I understand you'd need some heuristic for not having burn blocks, although you could also query a bitcoin node directly perhaps, but if you don't have stacks headers for the most recent burn blocks (except the latest, which is the one being built), your miner would certainly pick the wrong parent to mine off of, no? so no heuristic needed here.

jcnelson commented 4 years ago

are you referring to downloaded burn blocks or or download stacks headers?

Downloaded Stacks blocks (also, to clarify, the Stacks headers and blocks are downloaded at the same time).

We'll need the heuristic for Stacks blocks even if the entire burnchain state is available. This is because a sortition may not correspond to a real Stacks block in the first place. For example, the peer that mined it may have crashed intermittently and lost the block data. As another example, a griefing attacker could have just written garbage data to the block hash field in the Bitcoin block-commit transaction. Unfortunately, there's no way to tell the difference between a block's (possibly temporary) unavailability and a block's non-existence. So, the miner will need a heuristic to decide when it's good time to start mining, and a good indication of this is when the Stacks block download rate matches the expected Stacks block mining rate (i.e. the node is no longer bootstrapping).

The heuristic is necessarily an educated guess, and this one is pretty simple. But in the future, the miner code itself will be a separate program from the Stacks node, and will be able to query chain state information from both the Stacks node, the Bitcoin node, other 3rd party services, etc. in order to give the user as much of an informed opinion as possible as to when it's safe to try mining. At that point, it'll be out of our hands; all we have to do is make it so the node provides as much useful information to miner software as possible.

psq commented 4 years ago

yes, that make sense. Thank you for the explanation! I re-read sip-001 last night, and noticed the mention about miner crashing. Currently, though, a miner can fall behind by 10-20 stacks block compared to the burn chain, which seems to be propagating fine. In that case, the miner should really stop mining.

More related questions, concerning sequence of events:

miner receives burn block, and from commitments and key registration knows which miner is producing the corresponding stacks block
miner receives the stacks block (shortly?) and checks it, and figures out which txs can be removed from its mempool
miner picks txs to include in next block from its mempool, and sends commitment with planned block hash

(ignoring micro blocks, which has no material bearing on this sequence, just makes it a bit more complex)

Stacks block in #2 is supposed to be produced as quickly as possible by winning miner to avoid coming after subsequent burn block, or is this fine? but maybe no miner will have sent a commit included in next burn block? Burn blocks can come quickly, not close to the average block time you'd expect from current mining difficulty.

3 needs to come after #2 so you don't include a tx again in the subsequent block or that block will be rejected?

Essentially, #1, #2, and #3 happen close together, and then miner waits for next burn block (and streams micro blocks if any)?

Is this somewhat close?

diwakergupta commented 4 years ago

@psq can you verify if this is fixed by #1980 and close if so? Thanks!

psq commented 4 years ago

looks like it is, will reopen if new evidence surfaces

psq commented 3 years ago

This happened to a miner yesterday (1/23/2021) on mainnet

667452 1 667451 1065> 2,466.40 1,871,594 br71 s:779fd90958 p:829027050e c:346020aa35 i:842fa82a86 b:00000000000000000001b104f @+  [0.6]SP105ARDW7 [32.1]SP137NZXD0 [10.7]SP1RJH1641 [11.2]SP25Y6NQXB [1.1]SP2X3CCBFR [9.1]SP334VW9T9 [1.1]SP3PW87SQ8 [32.4]SPP1DJ1JE2*[1.8]SPY1PP2KZ6 
667453 ?    0   - - -  -     
667454 310 667144 811  4,932.80 1,851,594 br73 s:4ad86a13db p:b1e8c3304a c:92a70e4fe6 i:7c6d6595e4 b:00000000000000000001e7d33     [0.6]SP105ARDW7 [32.4]SP137NZXD0 [10.8]SP1RJH1641 [11.3]SP25Y6NQXB*[9.2]SP334VW9T9 [1.1]SP3PW87SQ8 [32.7]SPP1DJ1JE2 [1.8]SPY1PP2KZ6 
667455 286 667169 829  2,466.40 1,871,594 br74 s:ccba66e4e1 p:e554c8725f c:b45d29c76d i:d06a6c21e4 b:0000000000000000000858436     [0.6]SP105ARDW7 [32.1]SP137NZXD0 [10.7]SP1RJH1641 [11.2]SP25Y6NQXB*[1.1]SP2X3CCBFR [9.1]SP334VW9T9 [1.1]SP3PW87SQ8 [32.4]SPP1DJ1JE2 [1.8]SPY1PP2KZ6 
667456 4 667452 1066> 2,466.40 2,081,594 br71 s:611603ed90 p:779fd90958 c:71ce135a73 i:28dfaff611 b:00000000000000000007ae5da     [0.5]SP105ARDW7 [28.8]SP137NZXD0*[9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB [1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6 
667457 1 667456 1067> 2,466.40 2,081,594 br71 s:977f180ebc p:611603ed90 c:aded97047a i:6b00c5312c b:000000000000000000012fc9f @@  [0.5]SP105ARDW7 [28.8]SP137NZXD0 [9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB [1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6*
667458 1 667457 1068> 2,466.40 2,081,594 br71 s:3ce14d0bf7 p:977f180ebc c:4f20c0adcb i:0cb04389db b:00000000000000000003d5334 @@  [0.5]SP105ARDW7 [28.8]SP137NZXD0 [9.1]SP1D6NC90K [9.6]SP1RJH1641 [10.1]SP25Y6NQXB*[1.0]SP2X3CCBFR [8.2]SP334VW9T9 [1.0]SP3PW87SQ8 [1.0]SP4F3EX8B7 [29.1]SPP1DJ1JE2 [1.6]SPY1PP2KZ6

You can can see miner SP25Y6NQXB restarting with mining block 811 (tip is 1065), then block 829, and then being caught up while mining 1068 a few blocks later

diwakergupta commented 3 years ago

Looks like #2364 is a duplicate of this issue. Either way, #2370 should address it by allowing miners to explicitly specify a mine-at-block height as a command-line parameter.

jcnelson commented 3 years ago

In 2.0.3 and later, you can specify a minimum block height at which the miner will try to mine, which should prevent the node from mining too early.

blockstack-devops commented 2 weeks ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

stacks-network / stacks-core

[TESTNET BUG] miner starts mining too early, mining blocks lower than the current tip #1745

3 needs to come after #2 so you don't include a tx again in the subsequent block or that block will be rejected?