Testing service stops when encountering dispatch error

ebma commented 9 months ago

At the moment, the testing service stops all tests because of an uncaught error. The problem is that one of the vaults is currently banned because of a canceled redeem request. Thus, the issue request fails. But the testing service does not continue afterward.

See the following logs:

Connecting to node wss://rpc-foucoco.pendulumchain.tech...
Connected to node wss://rpc-foucoco.pendulumchain.tech
Connecting to node wss://rpc-amplitude.pendulumchain.tech...
Connected to node wss://rpc-amplitude.pendulumchain.tech
Testing the issuance of vault 6mMEQttPWJF46Y77pks4fhjNPrTuwmzdawfWK2qJC2bP5hDR { XCM(0) - USDC } on network foucoco
Testing the issuance of vault 6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s { XCM(0) - USDC } on network amplitude
Testing the issuance of vault 6n34uuJz9raainPkGfrVgrmj1Cb3Uxh8zbesCRYCz17GQCgJ { XCM(0) - USDC } on network amplitude
Testing the issuance of vault 6jBbDhy4DR3WEm2kbvEZpyW1xSf3crsNhTnmmVFyLHkBbgVz { XCM(0) - XLM } on network amplitude
Testing the issuance of vault 6iEjSzEPWvBkpFCLY4LHnwSDCYnwjvVvCH8MJkWHTfoEXMox { XCM(0) - BRL } on network amplitude
Testing the issuance of vault 6mpGcpNdmKttMkQG42VGhq1fU2gofz4uXA3XeoPuNtrnqZLM { XCM(0) - TZS } on network amplitude
Requesting issue of 10000000000 for vault 6mMEQttPWJF46Y77pks4fhjNPrTuwmzdawfWK2qJC2bP5hDR { XCM(0) - USDC }
Requesting issue of 10000000000 for vault 6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s { XCM(0) - USDC }
Requesting issue of 10000000000 for vault 6n34uuJz9raainPkGfrVgrmj1Cb3Uxh8zbesCRYCz17GQCgJ { XCM(0) - USDC }
Requesting issue of 10000000000 for vault 6jBbDhy4DR3WEm2kbvEZpyW1xSf3crsNhTnmmVFyLHkBbgVz { XCM(0) - XLM }
Requesting issue of 10000000000 for vault 6iEjSzEPWvBkpFCLY4LHnwSDCYnwjvVvCH8MJkWHTfoEXMox { XCM(0) - BRL }
Requesting issue of 10000000000 for vault 6mpGcpNdmKttMkQG42VGhq1fU2gofz4uXA3XeoPuNtrnqZLM { XCM(0) - TZS }
Server Running on Port 5000
Requested issue of 10000000000 for vault 6mMEQttPWJF46Y77pks4fhjNPrTuwmzdawfWK2qJC2bP5hDR { XCM(0) - USDC } with status Finalized
Successfully posed issue request 0xf4c45aedb7151db0504b0f572209199b6da57e7edb8d867cdc0778dd4e76a880 for vault 6mMEQttPWJF46Y77pks4fhjNPrTuwmzdawfWK2qJC2bP5hDR { XCM(0) - USDC } on network foucoco
Sending 0.01 USDC to GB42WOIOJUQPLEXSPO3HQPGOD4TU4JXSN4D27544NCK6JZWJ2CVGTLFY with memo HUU8sm6zohcjhtsjZ7fLHyUBytme
StellarAccountError: The Stellar account GB42WOIOJUQPLEXSPO3HQPGOD4TU4JXSN4D27544NCK6JZWJ2CVGTLFY does not exist!
    at StellarService.loadAccount (file:///app/spacewalk-testing-service/dist/stellar_service/stellar.js:110:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async StellarService.transfer (file:///app/spacewalk-testing-service/dist/stellar_service/stellar.js:27:9)
    at async Test.testIssuance (file:///app/spacewalk-testing-service/dist/test/test.js:57:9) {
  account: 'GB42WOIOJUQPLEXSPO3HQPGOD4TU4JXSN4D27544NCK6JZWJ2CVGTLFY'
}
Requested issue of 10000000000 for vault 6mpGcpNdmKttMkQG42VGhq1fU2gofz4uXA3XeoPuNtrnqZLM { XCM(0) - TZS } with status Finalized
Successfully posed issue request 0x48c24ab5f7eeb09b49837cde04d54a911d2c0b78afa05e364c5246b379a2ebee for vault 6mpGcpNdmKttMkQG42VGhq1fU2gofz4uXA3XeoPuNtrnqZLM { XCM(0) - TZS } on network amplitude
Sending 0.01 TZS to GD7Y5F3NSZWJZ553PZ5ACAQGDQ4BYKUXYIBIRDECPNO2VBDHUBYCWKLU with memo 5u2BqpZLZZydw8XrutbYMcNuCLWj
Requested issue of 10000000000 for vault 6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s { XCM(0) - USDC } with status Finalized
Successfully posed issue request 0xae42cd0a4aa14a7c1ba4f4a1b712a567abd4609a198f094edc0791b42946cea1 for vault 6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s { XCM(0) - USDC } on network amplitude
Sending 0.01 USDC to GCODYRWTVB5EXKLAU2COTFWFI5AU7FAQ7BWJVRWP4XBCVDB6AMNFUZGR with memo CjF25EMvGB9biy91d3qLMnFysJmY
Requested issue of 10000000000 for vault 6n34uuJz9raainPkGfrVgrmj1Cb3Uxh8zbesCRYCz17GQCgJ { XCM(0) - USDC } with status Finalized
Successfully posed issue request 0xa00e944989676b10a5edc0fbe4f37c57e703bad9a1aad91fe9bb31a0e1dd9378 for vault 6n34uuJz9raainPkGfrVgrmj1Cb3Uxh8zbesCRYCz17GQCgJ { XCM(0) - USDC } on network amplitude
Sending 0.01 USDC to GAH34W7R5EOCVL3Q2PITD6VQ5QASVLY5CFNJ6JQJXR3CYRI7XWF4FT7Y with memo Bmo8HXRZoPCDNiP1rzUfiHQw93Nn
Requested issue of 10000000000 for vault 6jBbDhy4DR3WEm2kbvEZpyW1xSf3crsNhTnmmVFyLHkBbgVz { XCM(0) - XLM } with status Finalized
Requested issue of 10000000000 for vault 6iEjSzEPWvBkpFCLY4LHnwSDCYnwjvVvCH8MJkWHTfoEXMox { XCM(0) - BRL } with status Finalized
Successfully posed issue request 0x6f910e532ed50cd90f38b2d66cc85825832a3cbd6edcf040f74a145bb758e16c for vault 6iEjSzEPWvBkpFCLY4LHnwSDCYnwjvVvCH8MJkWHTfoEXMox { XCM(0) - BRL } on network amplitude
Sending 0.01 BRL to GAKTH3GTKDKGPRXNHBFEQ5R5MVAOFUYDWYHJZUVWJNA22ESERQBQ7IGV with memo 8WWYUZQaxrgPh5Cye4vMDLsMzpf6
TestDispatchError: Dispatch Error
    at VaultService.handleDispatchError (file:///app/spacewalk-testing-service/dist/vault_service/vault.js:103:20)
    at file:///app/spacewalk-testing-service/dist/vault_service/vault.js:29:44
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/api/promise/decorateMethod.js:56:28
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/util/nextTick.js:13:13
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at process.processTimers (node:internal/timers:514:7) {
  section: 'vaultRegistry',
  method: 'VaultBanned',
  extrinsicCalled: 'Issue Request'
}

TODO

We need to make sure that the testing service continues with the other tests in case something goes wrong in one test.

ebma commented 9 months ago

@pendulum-chain/product fixing this should also get higher priority though it will probably disappear once the XLM vault gets unbanned again.

annatekl commented 9 months ago

Hey team! Please add your planning poker estimate with Zenhub @b-yap @bogdanS98 @ebma @gianfra-t @TorstenStueber

gianfra-t commented 9 months ago

@ebma do you think we should do something like recoverable/unrecoverable error detection? Or just send an incident message and stop that particular test until manual input?

ebma commented 9 months ago

I think everything is fine except that for some reason the dispatch error that is thrown in this case is impacting the execution of the other test cases as well. It's not easy to see but the logs in the description are basically the end of the execution. You would except some messages about 'Successfully sent ... to ...' happening for the other pending tests but that's not the case. Everything just stops until the next round of tests starts.

Is it clearer now?

annatekl commented 9 months ago

@ebma does this require a runtime upgrade?

annatekl commented 9 months ago

@ebma what is this ticket about exactly @ebma? Investigating the issue mentioned in the title or implementing a fix?

ebma commented 9 months ago

No it does not require a runtime upgrade and it's about implementing the fix for the problem described in this ticket.

gianfra-t commented 9 months ago

Did we find this problem multiple times? I tried to reproduce this locally with one vault forced to fail with "Banned" error, but the execution continues for the other vault. Could it be that the vaults did not completed the request and so it looked like the execution stopped ?

TorstenStueber commented 9 months ago

@gianfra-t I can't help here, have to wait for the other team members to return.

ebma commented 9 months ago

@gianfra-t did you also get the

TestDispatchError: Dispatch Error
    at VaultService.handleDispatchError (file:///app/spacewalk-testing-service/dist/vault_service/vault.js:103:20)
    at file:///app/spacewalk-testing-service/dist/vault_service/vault.js:29:44
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/api/promise/decorateMethod.js:56:28
    at file:///app/spacewalk-testing-service/node_modules/@polkadot/util/nextTick.js:13:13
    at runNextTicks (node:internal/process/task_queues:60:5)
    at listOnTimeout (node:internal/timers:540:9)
    at process.processTimers (node:internal/timers:514:7) {
  section: 'vaultRegistry',
  method: 'VaultBanned',
  extrinsicCalled: 'Issue Request'
}

when reproducing this issue and the other requests still completed successfully? If so, maybe you are right and it just appeared to have stopped while still working as expected.

gianfra-t commented 9 months ago

Yes! I made it fail with that error type and it continued executing the other issues. The test I made was with both vaults on the same chain. But strange, because also looking at the code I wouldn't know the mechanism by which this could happen with this particular dispatch error since they are all treated the same. Let me know if you have any other ideas on how we could try to test this.

ebma commented 9 months ago

Maybe this was again related to a locking issue? If we return here, are we still releasing the lock in the finally block here? I guess we don't but I'm not 100% sure.

gianfra-t commented 9 months ago

I think we do release the lock, because although we are returning from the current function, the "inner" promise with the catch().finally() must end somehow. Also, it would be failing for any dispatchError right? That's what confuses me about this error. But even if it did not release it, we would have seen that this service stop but didn't recover on the next "round", and in theory the other network should not be affected because there is one lock for each network.

gianfra-t commented 8 months ago

So I tested again in the production chains, with the current config, while making one of the vaults "fail" with:

if (
      dispatchError ||
      this.vaultId.accountId ==
        "6mb5AuwinTp5BJ8hB2A2a71U2ZXu2jc4GBckuvp85Vy5nb1s"
    ) {
              return reject(
                this.handleDispatchError(
                  dispatchError,
                  systemExtrinsicFailedEvent,
                  "Issue Request",
                ),
              );
            }

which is equivalent to getting any dispatch error, and I get the following logs:

It would seem that even with an error on one of the Amplitude vaults, the service continues!

ebma commented 8 months ago

I see, thanks for testing this thoroughly again @gianfra-t! I wouldn't know any other way to reproduce the issue I reported so let's consider this fixed for now and close this ticket. If it happens again in the future, we can still reopen it.

pendulum-chain / spacewalk-testing-service

Testing service stops when encountering dispatch error #6

TODO