Open Kailai-Wang opened 1 year ago
I think, in general, we should probably adhere to substrate's convention, which is obviously not yet the case:
Include a trusted operation in the sidechain block even if it failed, but add an extrinsic failed event to the system pallet on the sidechain and hence include it in the statediff.
If our design allows that, I would prefer that we return Err()
for failed trusted operations, but the executor (or better a layer in between) catches that and handles the error case.
However, there is one misconception, we do not propagate the transaction pool yet, see https://github.com/integritee-network/worker/issues/616#issuecomment-1022090746. Hence, the nonce-handling issue in the demo_shielding_unshielding.sh
has probably nothing to do with that.
but the executor (or better a layer in between) catches that and handles the error case.
Yea if we manage to do that it would be great.
The problem is the trustedCall
is a superset of (potentially multiple) sgx-runtime-extrinsics, where some extrinsics succeed, some can fail. But statediff is applied on a trustedCall
basis (If I understand it correctly).
we do not propagate the transaction pool yet
With "propagate" do you mean tx pool gossip? Like synchronise the tx pool between peers? But my understanding is:
Err()
will not be added to the sidechain block. Still, the statediff regarding nonce will be included in the sidechain block (as we incremented it before sgx-runtime-extrinsic execution). Parentchain block importer is independently processed in each worker, so whenever another worker finds the same indirect call, and because the corresponding top was not removed, it will try to execute it again and find that the nonce is unexpected.Not sure if I miss something though
I just saw the title change by @brenzi , which I think makes sense.
In fact, we probably need to differentiate the handling between indirect and direct calls:
Err()
for synchronous subscription to the state listener.Ok()
in every case, which is not so ideal.So it would be great if we could have a unified solution for both cases.
I am not convinced that the handling should be different for both cases. Once we gossip the trusted operation pool to fellow validateers they also need to know when a trusted operation has been tried to execute. So I hope that we can just do the same in both cases:
I gave it a second thought, yes I believe in both cases the unsuccessfully executed top hash should be included in the block.
A follow-up question would be: should we remove it from the top pool? The current implementation is:
However, imagine block composition eventually fails, then in step2: a. if we remove it from the pool, it won't be picked up next time as it's already left out b. if we keep it in the pool, it will be re-executed until a sidechain block is successfully produced
I kind of feel imperfect in either way, it's partially because STF execution is broader than mere sgx-runtime transactions, you can add pre/post steps before/after the runtime tx to e.g. communicate with the parachain.
Edit: I believe option a) is still better, which means if it fails to execute this time, then that's it: a failure is a failure. We shouldn't try to execute it repeatedly. If the user wants to retry, then another request is needed.
To summarise, the path should be:
I'm trying to raise the discussion here and I'd appreciate any feedback :)
In our opinion we should always return
Ok()
for the trusted_call execution based on the current implementation, rationale: