ACA-Py tests failing in AATH - Investigate

openwallet-foundation / acapy

Hyperledger Aries Cloud Agent Python (ACA-Py) is a foundation for building decentralized identity applications and services running in non-mobile environments.

https://wiki.hyperledger.org/display/aries

Apache License 2.0

ACA-Py tests failing in AATH - Investigate #2264

Closed swcurran closed 1 year ago

swcurran commented 1 year ago

For the last two runs of AATH, a number of tests are failing that had been working before. Please investigate the runs to determine the source of the failures and fix what is needed to address the problem (ACA-Py, ACA-Py backchannel, the tests, etc.).

usingtechnology commented 1 year ago

looks like a platform issue with github, all those fails (and there is a success in the middle of those runs) are docker image build fails. not that the code can't be built, but that github docker image registry or whatever is struggling.

swcurran commented 1 year ago

I don’t think so. I think it is a problem with the ACA-Py backchannel and the change to the thread ID. Take a look at this page that shows just the assertion error from the test that failed: https://allure.vonx.io/allure-docker-service-ui/projects/acapy-aip10/reports/latest

Looking further, but I’m guessing it is something with that change.

swcurran commented 1 year ago

I’ll run the tests locally to verify that the results are consistent and let you know what I find. If they are, I’ll check a prior commit.

usingtechnology commented 1 year ago

One thing in common with all the latest failed runs is dotnet. I am just looking at the actions in AATH

usingtechnology commented 1 year ago

in docker action: test-harness-findy-javascript-dotnet

Features/BasicMessage/IBasicMessageService.cs(4,50): error CS1514: { expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
Features/BasicMessage/IBasicMessageService.cs(19,2): error CS1513: } expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
The command '/bin/sh -c dotnet publish "DotNet.Backchannel.Master.csproj" -c Release -o /app/publish' returned a non-zero code: 1
Docker image build failed.

Running AATH locally with ./manage build -a acapy -a dotnet gives the same errors:

#15 3.013 Features/BasicMessage/IBasicMessageService.cs(4,50): error CS1514: { expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
#15 3.013 Features/BasicMessage/IBasicMessageService.cs(19,2): error CS1513: } expected [/aries-framework-dotnet/src/Hyperledger.Aries/Hyperledger.Aries.csproj]
------
executor failed running [/bin/sh -c dotnet publish "DotNet.Backchannel.Master.csproj" -c Release -o /app/publish]: exit code: 1
Docker image build failed.
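
To check that the CI and local builds fail on the same compiler diagnostics, the two logs can be compared mechanically. The following is a minimal sketch, assuming the `path(line,col): error CODE: message [project]` layout seen in the logs above and an optional BuildKit prefix such as `#15 3.013`; the names `DIAGNOSTIC` and `parse_diagnostics` are hypothetical helpers, not part of AATH.

```python
import re

# Pattern for compiler diagnostics as they appear in the logs above.
# The leading "#15 3.013 " style prefix is optional and is skipped.
DIAGNOSTIC = re.compile(
    r"^(?:#\d+\s+[\d.]+\s+)?"            # optional build step and timestamp
    r"(?P<file>[^()\s]+)"                # source file path
    r"\((?P<line>\d+),(?P<col>\d+)\):\s+"
    r"error\s+(?P<code>[A-Z]+\d+):\s+"   # e.g. CS1514
    r"(?P<message>.*?)"
    r"(?:\s+\[(?P<project>[^\]]+)\])?$"  # optional [project.csproj] suffix
)


def parse_diagnostics(log_text):
    """Return the set of (file, line, col, code) tuples found in log_text."""
    found = set()
    for raw in log_text.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            found.add(
                (match["file"], int(match["line"]), int(match["col"]), match["code"])
            )
    return found
```

If `parse_diagnostics` returns the same set for the GitHub Actions log and the local log, the failure is in the dotnet source being compiled rather than in the image registry.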

usingtechnology commented 1 year ago

So the dotnet tests/actions have not been successful since they added BasicMessageService - 2 months ago.

I don't know what those Allure dashboard tests are, and they look like a completely different level of detail, so maybe my comments are irrelevant to the actual problem.

WadeBarnes commented 1 year ago

The Allure workflows are used to upload the test results to the Allure servers:

swcurran commented 1 year ago

The place to look is here: https://aries-interop.info for a summary of the tests and links to Allure.

The tests run every second day and I update that page based on those runs, except when the results suddenly change, as happened in the last few days.

If you then navigate into the per-framework page (e.g. clicking on ACA-Py on the main page), you can see the per-runset results, and from there navigate to Allure to see the results from the last 10 runs.

You’ll see, for example, that the ACA-Py to ACA-Py (runset “acapy-aip10”) suddenly started getting failures two test runs ago. Those are the ones I’m interested in — why are the ACA-Py to ACA-Py tests failing? I’ve just tried to run locally the “main” and “0.8.2-rc2” branches and they fail the same way. Trying 0.8.1 as I type. I was sure it was going to be one of the two most recent merges, but evidently not…

About .NET — yes, it has been failing for some time. I’m planning on seeing if we can drop it entirely from AATH.

swcurran commented 1 year ago

Interesting…0.8.1 is passing, 0.8.2-rc2 has the failures. Sigh. I’ll try to narrow it to a merge.
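
Narrowing a known-good release and a known-bad release down to a single merge is a binary search. The following is a minimal sketch, assuming the candidate merges are ordered oldest to newest, the last one is known to fail, and there is a single pass-to-fail transition; `first_bad` and the `is_good` callback are hypothetical names, and in practice `is_good` would wrap a (slow) AATH run for that commit.

```python
def first_bad(candidates, is_good):
    """Return the first entry of candidates for which is_good is False.

    Assumes candidates are ordered oldest to newest, the last entry is
    known bad, and every entry after the first bad one is also bad.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    low, high = 0, len(candidates) - 1   # candidates[high] is known bad
    while low < high:
        mid = (low + high) // 2
        if is_good(candidates[mid]):
            low = mid + 1                # breakage was introduced after mid
        else:
            high = mid                   # mid fails; breakage is at or before mid
    return candidates[low]
```

Each step halves the range, so a handful of merges between two release tags needs only two or three test runs.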

usingtechnology commented 1 year ago

thanks @WadeBarnes and @swcurran for the context, that https://aries-interop.info/ is super helpful with understanding what is going on.

swcurran commented 1 year ago

OK — after some messing around with Docker, I’ve confirmed that #2261 is the change that broke the tests. Doesn’t mean that it is wrong — it could be in the Backchannel.

Process I used was to:

- Update the requirements-main.yml file to point to the particular branch or commit of interest.
- Remove the backchannel in docker: docker image rm acapy-main-agent-backchannel:latest
- Run the runset: ./manage runset acapy-aip10 -r (-r is rebuild)
  - For a single test that passes/fails depending on the branch/tag/commit:
    - ./manage build -a acapy-main; ./manage run -d acapy-main -t @T006-RFC0037 -t @AIP10 -t @minor -t @AcceptanceTest -t @Schema_Health_ID -t @Indy -t @ProofProposal
- Check the errors:
  - Errors happening on aries-cloudagent-python@main and commit aries-cloudagent-python@88769c9a3e6044ca4b22f08d83520f1553c2f97e
  - Errors not happening on aries-cloudagent-python@0.8.2-rc0
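
The commands in the process above can be collected into a small script so that each branch or commit is tested the same way. This is a minimal sketch that only uses the commands quoted in this comment; `repro_commands` and `execute` are hypothetical helper names, editing requirements-main.yml is left as a manual step because its format is not shown here, and removing the image before the single-test build is an assumption carried over from the full-runset steps.

```python
import shlex
import subprocess

BACKCHANNEL_IMAGE = "acapy-main-agent-backchannel:latest"

# Tags for the single failing test named in the process above.
SINGLE_TEST_TAGS = [
    "@T006-RFC0037", "@AIP10", "@minor", "@AcceptanceTest",
    "@Schema_Health_ID", "@Indy", "@ProofProposal",
]


def repro_commands(runset="acapy-aip10", tags=None):
    """Build the command list for a full runset, or for one test if tags is given."""
    commands = [["docker", "image", "rm", BACKCHANNEL_IMAGE]]
    if tags:
        commands.append(["./manage", "build", "-a", "acapy-main"])
        run = ["./manage", "run", "-d", "acapy-main"]
        for tag in tags:
            run += ["-t", tag]
        commands.append(run)
    else:
        commands.append(["./manage", "runset", runset, "-r"])  # -r is rebuild
    return commands


def execute(commands, dry_run=True):
    """Print each command; run it from the AATH checkout only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `execute(repro_commands(tags=SINGLE_TEST_TAGS))` prints the single-test commands without running them; pass `dry_run=False` from the AATH directory to actually run them.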

@usingtechnology — can you please take a look? FYI — with AATH, the logs are at .logs.

I’m sure there are ways to debug, but I don’t know them...

usingtechnology commented 1 year ago

ok, thanks for the process. i'll dig in.

swcurran commented 1 year ago

I’ve been scanning the logs from the passing and failing runs, and I’m not seeing anything. My bet is that the backchannel is expecting the empty ~thread item, but I can’t see it in the logs :-). I assume it is Bob that would be having the problem, but who knows :-).
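
To make the hypothesis concrete: a consumer that requires a ~thread item will break when a message omits it, whereas a tolerant one falls back to the message's own @id. The following is an illustrative sketch only, not the fix that was eventually merged (this thread does not show it); `thread_id` is a hypothetical helper, and the fallback to @id follows the Aries threading convention as I understand it.

```python
def thread_id(message):
    """Return the thread ID of a message given as a dict.

    Uses ~thread.thid when present; otherwise falls back to the message's
    own @id, which covers both a missing and an empty ~thread item.
    """
    thread = message.get("~thread") or {}
    return thread.get("thid") or message["@id"]
```

With this shape, `{"@id": "msg-1"}`, `{"@id": "msg-1", "~thread": {}}`, and `{"@id": "msg-2", "~thread": {"thid": "msg-1"}}` all resolve to the same thread.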

usingtechnology commented 1 year ago

@swcurran - do you have a set of tests that I can run to hit all the other failures? running it wide open takes too long, and there do appear to be irrelevant failures.

swcurran commented 1 year ago

If you run ./manage runset acapy-aip10, all the tests should pass — they had been before this change. Takes a long time, but you can do other things, hopefully (that’s why I have two machines :-) ).

With runset, you can add -b (build) or -r (rebuild) to the end of the command.

usingtechnology commented 1 year ago

Thanks for the runset information. I have added a PR to AATH.

Fix was very simple but time-consuming to track down and regression test. But I know a lot more about AATH now!

swcurran commented 1 year ago

Nice work! Closing this. Thanks.