pendulum-chain / spacewalk

Apache License 2.0
34 stars 7 forks source link

Change checks for unknown validators in Stellar relay pallet #544

Closed ebma closed 5 days ago

ebma commented 1 month ago

Context

Error:

2024-08-01T08:39:02.994580Z ERROR vault::requests::execution: Failed to execute Redeem request #0x1c67…64ab because of error: RuntimeError(SubxtRuntimeError(Runtime(Module(ModuleError { pallet: "StellarRelay", error: "EnvelopeSignedByUnknownValidator", description: [], error_data: ModuleErrorData { pallet_index: 68, error: [5, 0, 0, 0] } }))))

The stellar relay pallet currently rejects the whole 'proof' if one of the passed SCP envelopes was signed by an unknown validator or if any other related check fails, see here. However, this check is a little too strict as it might happen that a set of envelopes can still be used to build a valid proof, even if some of the contained envelopes are 'invalid'.

TODO

Instead of returning an error, we should rather change the logic so that the validate_envelopes function checks each envelope for validity but only returns the valid envelopes. These valid envelopes can then be passed to the check_for_valid_quorum_set function to continue with the existing validity checks.

ebma commented 1 month ago

@pendulum-chain/product this ticket loosens some requirements on the Stellar relay pallet. This is necessary because we are currently seeing some proofs failing on Amplitude which are likely to be fixed by this change. We will try to get this change included in the next runtime upgrade for polkadot-v1.1.0.

prayagd commented 1 month ago

Hey team! Please add your planning poker estimate with Zenhub @b-yap @bogdanS98 @ebma @gianfra-t @TorstenStueber

TorstenStueber commented 1 month ago

@ebma do you have an example of such an "invalid" validator? Do we understand why that happens?

ebma commented 1 month ago

@TorstenStueber it currently happens on the liquidated Amplitude KSM/XLM vault. There is still a pending redeem request that can be executed but it is not able to submit the proof because it gets rejected by that error. We already tried switching from the SatoshiPay archive to the SDF archive but it didn't make a difference. This time, I didn't go through the whole set of SCP messages in that slot but assumed that for some reason, the archived messages on that Slot contain one message that is from an unknown validator and the rest are valid.

I can investigate it in more detail to perfectly pin it down, but I think the changes described in this ticket are nice to have anyway because the requirements are unnecessarily strict. Or do you think we shouldn't change it?

TorstenStueber commented 1 month ago

No, I was just curious. I wondered whether it is related to some partial Tier 1 quorum definition change again that we were not aware of.

Reasoning: this pending ticket about Tier 1 automation still requires more thought but I postponed that in favor of prioritizing the prototype. I wanted to understand whether it could become more important again.

ebma commented 1 month ago

I checked the slot (46549248) in more detail again and realized that it's still the same problem that I documented already a while ago. The respective redeem request was created a very long time ago and thus the slot is very old too. And the archived messages for that slot still contain the old LOBSTR2 validator which is why the stellar-relay pallet cannot derive valid consensus since we already registered the new LOBSTR2 public key on chain. See this Notion page.

TorstenStueber commented 1 month ago

Okay, then this issue here makes a lot of sense. Thanks!

prayagd commented 2 weeks ago

@b-yap moving this to development, as i see a PR linked. Please move back if not expected.