Closed davxy closed 1 year ago
Polkadot uses a custom finality voting rule and select_chain implementation which is quite crucial not to break: https://github.com/paritytech/polkadot/blob/master/node/service/src/relay_chain_selection.rs and https://github.com/paritytech/polkadot/blob/master/node/core/chain-selection/src/lib.rs
https://github.com/paritytech/substrate/blob/248fdf0d4b5e3758cfdadb283b5eca5f0731e466/client/finality-grandpa/src/environment.rs#L1230 I don't fully understand why this check is required. If there is a maximum allowed block number then the SelectChain trait should accept this block number as a parameter to finality_target.
Obviously we have to prevent GRANDPA from voting for a chain that is not the best according to our node.
Not exactly sure what you mean by this. The SelectChain implementation is also part of the node. Which source of information is actually correct within the node and how do you know that for certain?
The finality_target should choose directly a chain that includes our best, thus should pick the longest chain that: contains the base AND the best
I think this in particular would break Polkadot. Forcing a vote on anything >= the minimum possible block is just dead wrong and a bad API for the SelectChain. In particular, it gives us no option to actually select a chain...
In Polkadot we are happy to build on top of forks which contain un-approved parachain candidates but we definitely 100% don't want to finalize them. This is just one example. But the API needs to support such use-cases.
SelectChain is a very consensus-critical interface.
Maybe I was not super clear describing the issue.
If I have to synthesize the issue in one sentence I would say: "the current way to choose the target (SelectChain::finality_target()) doesn't take into consideration which chain is chosen by SelectChain::best_chain()"
The two functions may return blocks in two separate forks. We only compare their heights and bail out if target > best.
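To make the problem concrete, here is a minimal sketch of a height-only check of this kind. The `Header` type and the function name are hypothetical stand-ins, not the actual GRANDPA environment code; the point is only that comparing block numbers says nothing about whether the two blocks share a fork.

```rust
/// Toy block header (hypothetical): a height plus a label standing in for a hash.
#[derive(Debug, Clone, PartialEq)]
pub struct Header {
    pub number: u64,
    pub hash: &'static str,
}

/// Height-only invariant: reject a vote target that is higher than the best block.
/// Note that this does NOT verify that `target` is an ancestor of `best`.
pub fn check_target_not_above_best(target: &Header, best: &Header) -> Result<(), String> {
    if target.number > best.number {
        return Err(format!(
            "finality target #{} ({}) is higher than best block #{} ({})",
            target.number, target.hash, best.number, best.hash
        ));
    }
    Ok(())
}
```

With a best block at height 5 on fork "a", a target at height 7 on fork "b" is rejected, while a target at height 4 on fork "b" passes the check even though it is not on the best chain.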
IMO the hash returned by finality_target should be in the same chain as the one returned by best_chain (or can we vote for a chain that is not the best according to best_chain()?). As far as I understood this is not correct.
Obviously we have to prevent GRANDPA from voting for a chain that is not the best according to our node.
Not exactly sure what you mean by this. The SelectChain implementation is also part of the node. Which source of information is actually correct within the node and how do you know that for certain?
With "best according to our node" I was referring to the value returned by SelectChain::best_chain().
The reason we call SelectChain::best_chain in GRANDPA is just to pass it along to any VotingRule that might want to further restrict the votes (e.g. we have a voting rule to constrain votes by N blocks from best). I think we should change the API such that in a single call we can get both the finality target and the best block (this also helps with a possible race condition of making two asynchronous calls separately). The semantics should stay the same and the voter should follow the decision to finalize the block that was yielded by SelectChain, but I think it's fair for the API to expect that both blocks are congruent, i.e. the finality target should be an ancestor of the best block (or the same block).
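As a rough illustration of the single-call idea, the sketch below defines a hypothetical trait that returns the target and the best block together, so both come from the same snapshot. None of these names (`SelectFinality`, `FinalityChoice`, `LinearChain`) exist in Substrate; the toy implementation assumes one linear chain and clamps the target to the base when the limit is below it.

```rust
/// Toy block reference (hypothetical): height plus an opaque id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockRef {
    pub number: u64,
    pub id: u32,
}

/// Both answers from one call, so they cannot be split by a reorg in between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FinalityChoice {
    pub target: BlockRef,
    pub best: BlockRef,
}

/// Hypothetical combined API: the target must be `best` or one of its ancestors.
pub trait SelectFinality {
    fn finality_target_and_best(
        &self,
        base: BlockRef,
        max_number: Option<u64>,
    ) -> Option<FinalityChoice>;
}

/// Toy chain without forks; the block at index `i` has number `i`.
pub struct LinearChain {
    blocks: Vec<BlockRef>,
}

impl LinearChain {
    pub fn new(len: u32) -> Self {
        let blocks = (0..len)
            .map(|i| BlockRef { number: u64::from(i), id: 100 + i })
            .collect();
        LinearChain { blocks }
    }

    pub fn block(&self, number: u64) -> Option<BlockRef> {
        self.blocks.get(number as usize).copied()
    }
}

impl SelectFinality for LinearChain {
    fn finality_target_and_best(
        &self,
        base: BlockRef,
        max_number: Option<u64>,
    ) -> Option<FinalityChoice> {
        let best = *self.blocks.last()?;
        // The base must be a block of this chain.
        if self.block(base.number) != Some(base) {
            return None;
        }
        // Highest allowed height, never below the base and never above best.
        let limit = max_number.map_or(best.number, |max| max.min(best.number));
        let target = self.block(limit.max(base.number))?;
        Some(FinalityChoice { target, best })
    }
}
```

A voting rule could then receive the whole `FinalityChoice` and restrict `target` further, relying on `target.number <= best.number` holding by construction.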
@andresilva I elaborated on our discussion, and if our requirements here are:
1. target.number <= best.number
2. target is in the same chain as best (according to SelectChain::best_chain)

Can't we just start our finality target search from SelectChain::best_chain and eventually get one of its ancestors if there is a limit to respect? Obviously the lower bound is the passed block (aka the base) as before...
In short, requisite 2 above limits our choices to this chain only: base -> ..... -> best
If I've not overlooked some detail, it probably makes no sense to iterate over all the leaves.
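A minimal sketch of that search, assuming a toy in-memory block tree with parent pointers (the `Tree` type and `finality_target_from_best` are hypothetical, not the Substrate backend API): walk back from best towards the base, take the highest block within the optional limit, and give up if the base is not an ancestor of best.

```rust
use std::collections::HashMap;

pub type Id = u32;

#[derive(Debug, Clone, Copy)]
pub struct Block {
    pub number: u64,
    pub parent: Option<Id>,
}

/// Toy block tree (hypothetical stand-in for the client backend).
pub struct Tree {
    blocks: HashMap<Id, Block>,
}

impl Tree {
    pub fn new() -> Self {
        Tree { blocks: HashMap::new() }
    }

    /// Inserts a block; a parent, if given, must already be in the tree.
    pub fn insert(&mut self, id: Id, parent: Option<Id>) {
        let number = match parent {
            Some(p) => self.blocks.get(&p).expect("parent must be inserted first").number + 1,
            None => 0,
        };
        self.blocks.insert(id, Block { number, parent });
    }
}

/// Walks back from `best` to `base` and returns the highest block on that path
/// whose number is within `max_number`. Returns `None` if `base` is not an
/// ancestor of (or equal to) `best`. If the limit is below the base, the base
/// itself is returned, since it is the lower bound (an assumption of this sketch).
pub fn finality_target_from_best(
    tree: &Tree,
    base: Id,
    best: Id,
    max_number: Option<u64>,
) -> Option<Id> {
    let base_number = tree.blocks.get(&base)?.number;
    let mut current = best;
    let mut target = None;
    loop {
        let block = tree.blocks.get(&current)?;
        if block.number < base_number {
            // Went below the base height without meeting it: different fork.
            return None;
        }
        // The first block we meet within the limit is the highest candidate.
        if target.is_none() && max_number.map_or(true, |max| block.number <= max) {
            target = Some(current);
        }
        if current == base {
            return Some(target.unwrap_or(base));
        }
        current = block.parent?;
    }
}
```

Only the path base -> ... -> best is visited, so the cost is proportional to the distance between the two blocks rather than to the number of leaves.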
target is in the same chain of best (according to the SelectChain::best_chain)
Is this actually achievable?
AFAIK target is set either as the last finalized block or is an overestimate of what could be finalized. This by definition can be conflicting with BABE or other chain selection rules.
We may require that target is in the same chain as best, but it should be done by changing the opinion of best (i.e. "breaking" BABE) rather than changing the definition of target (i.e. "breaking" GRANDPA)
It is achievable when the user decides to use the LongestChain implementation of SelectChain (i.e. the "default" one that ships with Substrate).
Current authoring algorithms are using the same SelectChain that GRANDPA is using here.
They write what their best is in the backend (META column). This info is then fetched using self.backend.blockchain().info() in the implementation of the LongestChain methods.
This strategy is not written in stone, but depends on the SelectChain implementation the user decides to use (e.g. Polkadot, as you know, doesn't use LongestChain as is).
PR https://github.com/paritytech/substrate/pull/13289 is actually using this information to get a finalization target in the same chain as the best
Can't we just start our finality target search from SelectChain::best_chain and eventually get one of its ancestors if there is a limit to respect? Obviously the lower bound is the passed block (aka the base) as before...
The linked PR is still missing the removal of best_chain then. Otherwise we can run into issues when a reorg happens between calling finality_target and best_chain, as already explained by rob.
It may happen that this GRANDPA invariant, which constrains a vote target to be no higher than the current best, is violated.
The bug triggers quite often, so it is worth fixing ASAP.
Analysis:
- block (aka "base") is passed as parameter
- target is selected using select_chain.finality_target(base, None) here. Starting from the leaves, this will pick the block that is farthest from the base (optionally less than a max number; None is passed)
- best is selected using select_chain.best_chain() that, according to the LongestChain implementation here then here, returns the best descendant of the best block (sorry for the recursion...) according to the META info stored by the backend. In practice this contains a descendant of the best according to the BABE primary/secondary block rule (not always corresponding to the longest chain).

I have a locked up DB (i.e. every time I restart the validator it terminates immediately) like the following:
When I start the node the following error is then returned and grandpa terminates:
Potential solution
Obviously we have to prevent GRANDPA from voting for a chain that is not the best according to our node.
The finality_target should choose directly a chain that includes our best, thus should pick the longest chain that: contains the base AND the best