Open 0xHansLee opened 2 days ago
Unfortunately, cosmos-sdk prints no log when an error occurs while completing an unbonding. It just skips the failed unbonding and continues with the remaining ones. As a result, it was hard to figure out what caused this incident in CompleteUnbonding. Instead, I listed the potential points of error in CompleteUnbonding.
Before listing the possible error points in CompleteUnbonding, I will briefly explain the workflow of CompleteUnbonding.
First, the unbonding delegation is retrieved using the delegator address and validator address. The fetched unbonding delegation contains an array of UnbondingDelegationEntry structures, representing the delegations that the delegator has unbonded from a specific validator. Each entry has a unique ID that increments by 1.
Next, the process iterates over the entries that have matured and are not held in any external module. The unbonding is processed in a for loop. Entries that are completed are removed, and the unbonding index is deleted. The unbonding index is a mapping between the entry ID and the unbonding key, which is composed of the delegator address and validator address. Afterward, the undelegation is processed from the NotBondedPoolName module to the respective account. If no entries remain, the unbonding delegation is deleted. If entries remain, the unbonding delegation is updated.
The potential error points are as follows:
1. GetUnbondingDelegation: Fetch the unbonding delegation.
2. BondDenom: Retrieve bond denom from params.
3. StringToBytes(delegatorAddress) in AddressCodec: Convert the string address to a byte array.
4. DeleteUnbondingIndex: Delete the index for the entry’s UnbondingId from the store.
5. UndelegateCoinsFromModuleToAccount: Errors that occur within the UndelegateCoins function are returned.
6. RemoveUnbondingDelegation: When no unbonding entries remain.
7. SetUnbondingDelegation: When unbonding entries remain.
Description and context
During the Sep 26 network halt incident, we discovered that there may be a mismatch between what we expected Cosmos to unbond and what Cosmos actually unbonded.
So we need to investigate and list all the scenarios in which CompleteUnbonding doesn't unbond an unbonding entry. This can help us identify the root cause of the network halt.
Experienced behavior
Some unbonding did not complete properly, which resulted in the spendable amount being less than the unbonding amount and prevented the withdrawal from occurring properly.
Expected behavior
If the unbonding is not complete, the withdrawal corresponding to that unbonding should not be processed.
Solution recommendation
When withdrawing in the EndBlock of the evmstaking module, double check that the corresponding unbonding is complete and only process the withdrawal for completed unbonding.