Open liuchengxu opened 1 year ago
Do you have forks? Could it be that you sync some other fork? So let's say you last finalized 512 - 256 and now try to finalize 511 - 256 because you import some fork.
In general you could also just add a simple:
    let new_finalized = block_number - 256;
    if new_finalized > client.info().last_finalized {
        finalize(new_finalized);
    }
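As a slightly fuller sketch of that guard: the helper below is hypothetical (its name and plain `u32` block numbers are assumptions, not Substrate API). It also avoids an underflow while the chain is still shorter than the confirmation depth.

```rust
/// Hypothetical helper, not part of Substrate: returns the block number to
/// finalize after importing `imported_number`, or `None` if nothing new
/// should be finalized.
fn next_finalization_target(imported_number: u32, last_finalized: u32, k: u32) -> Option<u32> {
    // `checked_sub` yields `None` while the chain is shorter than `k`,
    // instead of underflowing.
    let candidate = imported_number.checked_sub(k)?;

    // Only ever move finality forward. A block imported on a fork at a lower
    // height produces a candidate at or below `last_finalized`, which is skipped.
    if candidate > last_finalized {
        Some(candidate)
    } else {
        None
    }
}
```

With a depth of 256, importing block 512 after 255 was finalized yields 256 as the target, while a later fork block at height 511 yields no target at all.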
In general this looks like some bug in your code and not in the Substrate code as long as you can not convince me of the opposite ;)
Thanks @bkchr for the suggestion. Checking new_finalized_block_number > last_finalized_block_number is helpful.
However, I think there is still a chance to make Substrate generally more robust as the raised error message is confusing to me.
I reproduced this issue locally. To illustrate, take K = 3 and assume the blocks are produced in the order 8, 9, 10a, 10b, 11a. Block 8 is now finalized.
       10a
      /
8 -- 9
      \
       10b -- 11a
Produce a new block 11b on top of 10a, the chain reorgs from 11a to 11b,
       10a -- 11b
      /
8 -- 9
      \
       10b -- 11a
the common block is 9 and the enacted blocks are [10a, 11b]. When 10a is imported on top of 9 again, it attempts to finalize block 7 (assuming we don't check new_finalized_block_number > last_finalized_block_number):
            10a -- 11b
           /
7 -- 8 -- 9
           \
            10b -- 11a
last_finalized is 8, block is 7, and the calculated route_from_finalized will have an empty enacted list and the retracted list [8], indicating that block 7 is already finalized as part of the finalized chain. That being said, we can safely do a no-op here if enacted is empty; we may raise a warning message instead of returning an error.
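To make the route in this example concrete, here is a toy model of a tree route over the blocks above. It is only a sketch: `Route`, `tree_route` and `example_chain` are illustrative stand-ins written for this comment, not the `sp_blockchain` implementation, and blocks are identified by plain string labels.

```rust
use std::collections::HashMap;

/// Toy route between two blocks: `retracted` runs from the start block down
/// to (excluding) the common ancestor, `enacted` runs from the common
/// ancestor (excluded) up to the target block.
#[derive(Debug, PartialEq)]
struct Route {
    retracted: Vec<&'static str>,
    enacted: Vec<&'static str>,
}

/// Each entry maps a block label to (parent label, block number).
type Chain = HashMap<&'static str, (&'static str, u32)>;

/// Walks both ends towards their common ancestor, always stepping the side
/// that sits at the greater (or equal) height.
fn tree_route(chain: &Chain, from: &'static str, to: &'static str) -> Route {
    let (mut a, mut b) = (from, to);
    let (mut retracted, mut enacted) = (Vec::new(), Vec::new());
    while a != b {
        if chain[a].1 >= chain[b].1 {
            retracted.push(a);
            a = chain[a].0;
        } else {
            enacted.push(b);
            b = chain[b].0;
        }
    }
    // `enacted` was collected from the target downwards, so flip it.
    enacted.reverse();
    Route { retracted, enacted }
}

/// The block tree from the diagrams above.
fn example_chain() -> Chain {
    HashMap::from([
        ("7", ("6", 7)),
        ("8", ("7", 8)),
        ("9", ("8", 9)),
        ("10a", ("9", 10)),
        ("10b", ("9", 10)),
        ("11a", ("10b", 11)),
        ("11b", ("10a", 11)),
    ])
}
```

In this model the route from 8 (last finalized) to 7 has `retracted == ["8"]` and an empty `enacted`, while the reorg from 11a to 11b retracts [11a, 10b] and enacts [10a, 11b].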
diff --git a/client/service/src/client/client.rs b/client/service/src/client/client.rs
index 91c59cdbac..ae1f0262cf 100644
--- a/client/service/src/client/client.rs
+++ b/client/service/src/client/client.rs
@@ -927,6 +927,14 @@ where
sp_blockchain::tree_route(self.backend.blockchain(), last_finalized, block)?;
if let Some(retracted) = route_from_finalized.retracted().get(0) {
+ if route_from_finalized.enacted().is_empty() {
+ warn!(
+ "Attempted to re-finalize an older finalized block {block:?}, \
+ last_finalized: {last_finalized}",
+ );
+ return Ok(())
+ }
+
warn!(
"Safety violation: attempted to revert finalized block {:?} which is not in the \
same chain as last finalized {:?}",
I can send a PR if it makes sense.
Improving the error message is fine. However, I'm not sure about not returning an error. I see your reasoning, but we print a message as someone misused the api. So, if we print a message we should also return an error?
CC @davxy @andresilva
For sure the error message could be improved, as it is not super clear. Maybe with something like: "Attempting to finalize block {hash} {number} which is not in the same chain as last finalized {last-fin-hash} {last-fin-number}."
That said, in theory calling this function against something that has already been finalized (i.e. an ancestor of info.finalized_hash, or in other words route_from_finalized.enacted().is_empty() == true) should be fine.
We are already doing something similar just above (here) if we re-finalize the last finalized block.
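The three situations being discussed can be told apart from the two halves of the route alone. The sketch below is illustrative only: `FinalityCheck` and `classify` are hypothetical names, block numbers stand in for real block identifiers, and it describes the proposed behaviour rather than what the client currently does.

```rust
/// Hypothetical classification of a finalization request, based on the route
/// from the last finalized block to the requested block.
#[derive(Debug, PartialEq)]
enum FinalityCheck {
    /// The requested block extends the finalized chain.
    Advance,
    /// The requested block is the last finalized block or one of its ancestors.
    AlreadyFinalized,
    /// The requested block sits on a fork that excludes the last finalized block.
    NotInFinalizedChain,
}

fn classify(retracted: &[u32], enacted: &[u32]) -> FinalityCheck {
    match (retracted.is_empty(), enacted.is_empty()) {
        // Nothing to retract and something to enact: a descendant.
        (true, false) => FinalityCheck::Advance,
        // Nothing to enact: the block itself (both empty) or an ancestor.
        (_, true) => FinalityCheck::AlreadyFinalized,
        // Both non-empty: finalized blocks would have to be reverted.
        (false, false) => FinalityCheck::NotInFinalizedChain,
    }
}
```

Only the last case matches the "Safety violation" warning. The middle case is the one proposed as a no-op.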
BUT there are some assumptions around and this modification may have some undesired side effects at the moment. All code paths should be analyzed.
For example, here we are setting this block as new_best (if import_existing is true).
And even if this has no consequences in the backend (to be checked), we end up having ImportSummary::is_new_best == true here.
This may have consequences or not. The code path should be carefully analyzed.
(Actually your code doesn't call apply_finality_with_block_hash through this path, but via finalize_block, but the reasoning above still applies.)
In conclusion, I think that an innocent tweak like this is correct but may have repercussions somewhere else, and thus it should eventually be carefully checked.
I'm not sure how long it will take to apply this tweak. To help the community avoid being trapped here, we can at least improve the docs of finalize_block to mention that the caller is responsible for not re-finalizing an older finalized block, even if it's intuitively harmless.
I've had a better look and apply_finality_with_block_hash is called from only 2 places:
1. import_block (if import params finalized = true and import_existing = true)
2. finalize_block
If we return Ok in case route_from_finalized.enacted().is_empty() (in other words, finalizing something already finalized):
In both cases we don't push anything new into the ClientImportOperation, so per se it is a no-op.
BUT
In case 1, even if the call is successful, this will then fail in the backend when we try to commit the operation (Error::NotInFinalizedChain => "Potential long-range attack: block not in finalized chain").
In case 2 the call ends up being successful and looks harmless. But, as I said, we have to keep an eye on the places where this call is performed so as not to disrupt some assumption. For example, in grandpa this may have consequences that should eventually be analyzed:
https://github.com/paritytech/substrate/blob/297b3948f4a0f7f6504d4b654e16cb5d9201e523/client/consensus/grandpa/src/environment.rs#L1422-L1423
(e.g. update_best_justification just below doesn't appear to perform any check, and the previous best justification may end up being overwritten)
@andresilva any consideration about this and the proposal of not failing?
Description of bug
The error message indicates it was trying to revert the same finalized block, which does not make sense to me. Let me know if I can provide more info.
BTW, we finalize the block at K-depth. Specifically, when a new block at height N is imported, we attempt to finalize the block at height N-256.
Steps to reproduce
This happens on Subspace gemini-3d testnet, I guess running the fully-synced domain nodes on gemini-3d may reproduce it, but it's not trivial as the testnet is already over 1M blocks :(