ros-navigation / navigation2

ROS 2 Navigation Framework and System
https://nav2.org/

Failed uploading test results #3437

Closed by ruffsl 1 year ago

ruffsl commented 1 year ago

Looking at the most recent main/nightly release_test jobs, I've just noticed that the "Uploading test results" step is itself encountering errors. For example:

failed uploading test results: File: opt/overlay_ws/test_results/costmap_queue/Testing/20230224-2235/Test.xml had the following problems:
  * invalid top level element: Site
File: opt/overlay_ws/test_results/dwb_core/Testing/20230228-2350/Test.xml had the following problems:
  * invalid top level element: Site
File: opt/overlay_ws/test_results/dwb_critics/Testing/20230228-2351/Test.xml had the following problems:
  * invalid top level element: Site
...

https://app.circleci.com/pipelines/github/ros-planning/navigation2/8897/workflows/75ea297e-613d-4987-b747-e13574e0f44e/jobs/29152/parallel-runs/0/steps/0-114

We can view the XML files for these test results directly from the job's artifacts tab and see that the testing framework is indeed using the invalid top-level element <Site>:

<Site BuildName="(empty)" BuildStamp="20230224-2235-Experimental" Name="29421f4399be" Generator="ctest-3.22.1" CompilerName="/usr/bin/c++" CompilerVersion="11.3.0" OSName="Linux" Hostname="3341d5d656f0" OSRelease="5.15.0-1030-aws" OSVersion="#34~20.04.1-Ubuntu SMP Tue Jan 24 15:16:46 UTC 2023" OSPlatform="x86_64" Is64Bits="1" VendorString="GenuineIntel" VendorID="Intel Corporation" FamilyID="6" ModelID="85" ProcessorCacheSize="25344" NumberOfLogicalCPU="36" NumberOfPhysicalCPU="18" TotalVirtualMemory="0" TotalPhysicalMemory="70225" LogicalProcessorsPerPhysical="2" ProcessorClockFrequency="3000">
<script/>
<Testing>
<StartDateTime>Feb 24 22:35 UTC</StartDateTime>
<StartTestTime>1677278106</StartTestTime>
<TestList>
<Test>./mbq_test</Test>
<Test>./utest</Test>
<Test>./cppcheck</Test>
<Test>./cpplint</Test>
<Test>./lint_cmake</Test>
<Test>./uncrustify</Test>
<Test>./xmllint</Test>
</TestList>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<Test Status="passed">
...
</Test>
<EndDateTime>Feb 24 22:35 UTC</EndDateTime>
<EndTestTime>1677278107</EndTestTime>
<ElapsedMinutes>0</ElapsedMinutes>
</Testing>
</Site>

File: opt/overlay_ws/test_results/costmap_queue/Testing/20230224-2235/Test.xml

By "invalid", I mean at least as defined by the JUnit XML test metadata schema that CircleCI uses to parse the test results.
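To get a quick list of the affected files, something like the following might help. It is only a minimal sketch: the function name `find_non_junit_reports` is hypothetical, and the set of accepted root elements (`testsuites`, `testsuite`) is my assumption about what a JUnit-style parser expects, not something taken from CircleCI's documentation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: a JUnit-style report has one of these as its top-level element.
JUNIT_ROOTS = {"testsuites", "testsuite"}


def find_non_junit_reports(results_dir):
    """Return (path, root_tag) pairs for XML files whose root is not JUnit-style.

    root_tag is None when the file could not be parsed as XML at all.
    """
    offenders = []
    for path in sorted(Path(results_dir).rglob("*.xml")):
        try:
            root_tag = ET.parse(path).getroot().tag
        except ET.ParseError:
            offenders.append((path, None))
            continue
        if root_tag not in JUNIT_ROOTS:
            offenders.append((path, root_tag))
    return offenders


if __name__ == "__main__":
    # The directory name mirrors the paths in the log above; adjust as needed.
    for path, tag in find_non_junit_reports("opt/overlay_ws/test_results"):
        print(f"{path}: top-level element {tag!r}")
```

Run against the results directory, this should list the CTest-generated Test.xml files (root `Site`) separately from any reports that already have a JUnit-style root.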

Since the release_test job rarely passes 100%, it's hard to tell when this started; it could have gone unnoticed for a while. This issue only produces false negatives for otherwise passing CI jobs, since colcon test-result returns an error code anyway for truly failing tests. Still, it can obscure the pass rate statistics of affected tests or hide the test result output of failing tests.
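In the meantime, the per-test statuses can still be recovered from the CTest files themselves. Below is a rough sketch, assuming the layout shown in the sample above (a `Site` root, a `Testing` child, and `Test` elements carrying a `Status` attribute); the helper name `ctest_status_counts` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def ctest_status_counts(xml_text):
    """Tally the Status attribute of each executed test in a CTest Test.xml."""
    root = ET.fromstring(xml_text)
    if root.tag != "Site":
        raise ValueError(f"expected a CTest <Site> root, got <{root.tag}>")
    testing = root.find("Testing")
    if testing is None:
        return Counter()
    # Only direct <Test> children of <Testing> carry results; the <Test>
    # entries nested inside <TestList> are just names and are skipped here.
    return Counter(t.get("Status", "unknown") for t in testing.findall("Test"))
```

For the costmap_queue file above, this would count the seven `Test` elements with a `Status` attribute and ignore the seven names listed under `TestList`.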

SteveMacenski commented 1 year ago

So the formatting of the autogenerated files from testing is somehow invalid? That seems to me like a colcon or test library issue, not something we did, no?

https://app.circleci.com/pipelines/github/ros-planning/navigation2/8896/workflows/1d0b7a1b-f141-4e58-9f68-bb2badd0e2bd/jobs/29149/steps

This isn't a job that succeeded; there are still failures, and the issue with uploading the test results that you describe occurred. I don't think it's restricted to release tests passing 100%.

I did notice yesterday that the CodeCov badge said "unknown", so I updated the badge to use a token (which I suppose is now the recommended format) and was going to check back in on it today to see if things had propagated and the test metrics had come back to the README. It has, but you now point out that it's not just the end link but something happening internally. I had also just started to notice something odd.

I'm not sure what the steps forward are here.