Closed ebma closed 8 months ago
@pendulum-chain/product fixing this should also get higher priority though it will probably disappear once the XLM vault gets unbanned again.
Hey team! Please add your planning poker estimate with Zenhub @b-yap @bogdanS98 @ebma @gianfra-t @TorstenStueber
@ebma do you think we should do something like recoverable/unrecoverable error detection? Or just send an incident message and stop that particular test until manual input?
I think everything is fine except that, for some reason, the dispatch error thrown in this case also impacts the execution of the other test cases. It's not easy to see, but the logs in the description are basically the end of the execution. You would expect some messages about 'Successfully sent ... to ...' for the other pending tests, but that's not the case. Everything just stops until the next round of tests starts.
Is it clearer now?
@ebma does this require a runtime upgrade?
@ebma what is this ticket about exactly? Investigating the issue mentioned in the title or implementing a fix?
No it does not require a runtime upgrade and it's about implementing the fix for the problem described in this ticket.
Did we see this problem multiple times? I tried to reproduce it locally with one vault forced to fail with a "Banned" error, but the execution continues for the other vault. Could it be that the vaults did not complete the request, so it only looked like the execution stopped?
@gianfra-t I can't help here, have to wait for the other team members to return.
@gianfra-t did you also get the
TestDispatchError: Dispatch Error
    at VaultService.handleDispatchError (file:///app/spacewalk-testing-service/dist/vault_service/vault.js:103:20)
    at file:///app/spacewalk-testing-service/dist/vault_service/vault.js:29:44
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/api/promise/decorateMethod.js:56:28
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/util/nextTick.js:13:13
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at process.processTimers (node:internal/timers:514:7) {
  section: 'vaultRegistry',
  method: 'VaultBanned',
  extrinsicCalled: 'Issue Request'
}
when reproducing this issue and the other requests still completed successfully? If so, maybe you are right and it just appeared to have stopped while still working as expected.
Yes! I made it fail with that error type and it continued executing the other issues. The test I made was with both vaults on the same chain. It is strange, though: looking at the code, I can't see a mechanism by which this particular dispatch error could cause this, since they are all treated the same. Let me know if you have any other ideas on how we could try to test this.
I think we do release the lock, because although we are returning from the current function, the "inner" promise with the catch().finally() must end somehow. Also, it would be failing for any dispatchError, right? That's what confuses me about this error. But even if it did not release the lock, we would have seen this service stop and not recover on the next "round", and in theory the other network should not be affected because there is one lock for each network.
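To make the lock reasoning above concrete, here is a minimal sketch of a per-network lock that is always released, whether the test resolves or rejects. It uses try/catch/finally, which behaves like a catch().finally() chain. All names (NetworkLock, runWithLock, the network keys) are hypothetical and not taken from the testing service.

```typescript
// Hypothetical sketch: NetworkLock and runWithLock are illustrative names,
// not the actual implementation in the testing service.
class NetworkLock {
  private locked = new Set<string>();

  // Returns false if a test round is already running for this network.
  tryAcquire(network: string): boolean {
    if (this.locked.has(network)) return false;
    this.locked.add(network);
    return true;
  }

  release(network: string): void {
    this.locked.delete(network);
  }

  isLocked(network: string): boolean {
    return this.locked.has(network);
  }
}

async function runWithLock(
  lock: NetworkLock,
  network: string,
  task: () => Promise<void>,
): Promise<string> {
  if (!lock.tryAcquire(network)) return "skipped";
  try {
    await task();
    return "ok";
  } catch (error) {
    // Any dispatch error ends up here, regardless of its section or method.
    const message = error instanceof Error ? error.message : String(error);
    return `failed: ${message}`;
  } finally {
    // Runs on success and on failure, so the next round can acquire the lock.
    lock.release(network);
  }
}
```

With one lock entry per network, a failure on one network leaves the other network's entry untouched, which matches the expectation that the other network should not be affected.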
So I tested again in the production chains, with the current config, while making one of the vaults "fail" with:
if (
  dispatchError ||
  this.vaultId.accountId ==
    "6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s"
) {
  return reject(
    this.handleDispatchError(
      dispatchError,
      systemExtrinsicFailedEvent,
      "Issue Request",
    ),
  );
}
which is equivalent to getting any dispatch error, and I get the following logs:
It would seem that even with an error on one of the Amplitude vaults, the service continues!
I see, thanks for testing this thoroughly again @gianfra-t! I wouldn't know any other way to reproduce the issue I reported so let's consider this fixed for now and close this ticket. If it happens again in the future, we can still reopen it.
At the moment, the testing service stops all tests because of an uncaught error. The problem is that one of the vaults is currently banned because of a canceled redeem request. Thus, the issue request fails. But the testing service does not continue afterward.
See the following logs:
TODO
We need to make sure that the testing service continues with the other tests in case something goes wrong in one test.
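One possible way to get this behavior, sketched under assumptions: wrap every test in its own try/catch so that a rejection is turned into a recorded outcome instead of propagating and stopping the other tests. The names TestCase, TestOutcome, runIsolated and runAllTests are hypothetical and do not come from the testing service.

```typescript
// Hypothetical sketch: these types and functions are illustrative only.
type TestCase = { name: string; run: () => Promise<void> };
type TestOutcome = { name: string; ok: boolean; reason?: string };

// Runs a single test and converts any thrown error into an outcome value.
async function runIsolated(test: TestCase): Promise<TestOutcome> {
  try {
    await test.run();
    return { name: test.name, ok: true };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    return { name: test.name, ok: false, reason };
  }
}

// Because runIsolated never rejects, Promise.all waits for every test
// instead of failing fast on the first error.
async function runAllTests(tests: TestCase[]): Promise<TestOutcome[]> {
  return Promise.all(tests.map(runIsolated));
}
```

A failed outcome could then trigger the incident message for that test only, while the remaining tests complete as usual.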