mojaloop / project

Repo to track product development issues for the Mojaloop project.

[Helm] GP Tests failing on first run when upgrading from v14.1.1 to v15 due to WS issues between TTK and SDKs #3164

Open mdebarros opened 1 year ago

mdebarros commented 1 year ago

Summary:

GP Tests fail intermittently when upgrading a release from v14.1.1 to v15 due to WS issues between the TTK and the SDK-based Mojaloop Simulators.

Refer to the following GP Test Report.

This only impacts the Mojaloop Simulator's SDK-Scheme-Adapter component (specifically its TEST API, which uses WebSockets), which is why the GP Tests fail their assertions.

The issue appears to be a connectivity problem caused by the "restarting" of the SDK-Scheme-Adapter components during the upgrade process.

NOTE: It should not impact any "live" transactions through the system.

Severity: Low

Priority: Medium

Expected Behavior

GP Tests should pass with 100% assertion checks.

Steps to Reproduce

  1. Deploy Mojaloop v14.1.1
  2. Upgrade Mojaloop v14.1.1 to v15
  3. Execute Helm Tests

Specifications

Notes:


Important Update on 2023-05-23:

This issue can also occur in general, i.e. not only when the environment has been upgraded.

E.g. --> https://mojaloop.slack.com/archives/CG3MAJZ5J/p1684839469352009

mdebarros commented 1 year ago

Investigation

Findings

Looking at the failed Active and inactive participant GP Test Collection, one can see that ALL assertions that require requests or callbacks to be "collected" via the WS notification mechanism between the TTK and the Simulator SDK-Scheme-Adapters fail.

Taking the failing Quote requests as an example, the quoting-service, sim-testfsp1-scheme-adapter, and sim-testfsp2-scheme-adapter logs show that the quotes were in fact successful:

  1. Quote Requests and Callbacks can be seen in the quoting-service and the sim-testfsp1-scheme-adapter logs
  2. Quote request being processed with Callbacks being sent as a response can be seen in the sim-testfsp2-scheme-adapter logs.

Thus I can only conclude that there is an issue with the WS client/server connectivity between the TTK and the Simulator's SDK-Scheme-Adapter.

Artifacts:

Work Around

Restarting the Simulator SDK-Scheme-Adapters resolves the WS connectivity issue between the TTK and the Simulator SDK-Scheme-Adapters, thereby allowing the GP Tests to pass with 100% assertions.

elnyry-sam-k commented 1 year ago

Not high-priority, removing from this Sprint; to be re-prioritized at a later time if there is a repro

mdebarros commented 10 months ago

Important Update on 2023-10-30

A temporary work-around has been attempted in the Mojaloop Helm v15.2.0-rc release, tracked by this issue --> https://github.com/mojaloop/project/issues/3597.

The test cases that exhibit this issue have been temporarily switched to use HTTP API calls instead of WS subscribers.

As a result, the WS issue has been verified: it is most likely caused by stale connections that are not being properly recycled (i.e. reconnected on failure, etc.). This stems from the WS v18.x library changes that were introduced, which mean that Mojaloop Core services will need to include a "health ping" capability to ensure that stale connections are detected and recycled.
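A "health ping" of this kind is commonly implemented as a ping/pong heartbeat: the server pings each connection on a fixed interval and terminates any connection that did not answer the previous ping, so the client side sees the drop and can reconnect. The sketch below illustrates only that sweep logic. It assumes a hypothetical `HeartbeatSocket` interface standing in for a real server-side socket (it is not a Mojaloop, TTK, or `ws` library API), and it leaves out the wiring to a timer and to the socket's pong events.

```typescript
// Minimal heartbeat ("health ping") sketch.
// ASSUMPTION: HeartbeatSocket is a hypothetical interface, not a real Mojaloop/ws API.
interface HeartbeatSocket {
  ping(): void;      // send a ping frame to the peer
  terminate(): void; // forcibly drop the connection
}

class HeartbeatMonitor {
  // true = the socket answered (or was just registered) since the last sweep
  private readonly alive = new Map<HeartbeatSocket, boolean>();

  track(socket: HeartbeatSocket): void {
    this.alive.set(socket, true);
  }

  // Call from the socket's pong handler.
  onPong(socket: HeartbeatSocket): void {
    if (this.alive.has(socket)) this.alive.set(socket, true);
  }

  // Call on a fixed interval. Sockets that did not answer the previous ping are
  // treated as stale: they are terminated and stop being tracked.
  sweep(): HeartbeatSocket[] {
    const terminated: HeartbeatSocket[] = [];
    for (const [socket, isAlive] of this.alive) {
      if (!isAlive) {
        socket.terminate();
        this.alive.delete(socket);
        terminated.push(socket);
        continue;
      }
      this.alive.set(socket, false); // must answer before the next sweep
      socket.ping();
    }
    return terminated;
  }

  get size(): number {
    return this.alive.size;
  }
}

// Usage with fake sockets: only one peer answers its ping.
class FakeSocket implements HeartbeatSocket {
  pings = 0;
  terminated = false;
  ping(): void { this.pings++; }
  terminate(): void { this.terminated = true; }
}

const monitor = new HeartbeatMonitor();
const healthy = new FakeSocket();
const stale = new FakeSocket();
monitor.track(healthy);
monitor.track(stale);

monitor.sweep();         // both sockets are pinged
monitor.onPong(healthy); // only the healthy peer answers
const dropped = monitor.sweep();
```

Terminating rather than gracefully closing a silent connection avoids waiting on a close handshake that a stale peer may never complete; the reconnect itself would remain the client's responsibility.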