[Enhancement] Break execution on exception in coverage script and return non-zero status code

neomatrix369 commented 5 years ago

On the back of this conversation - we would like to correct the behaviour of the runner scripts to:

Regarding other missing dependencies, we should not return 0 as the coverage as this is a valid, normal output and we would not pick this up as an error. In the case of the current issue, the report was sent to the client with 0 coverage and it was misleading. The correct behaviour is to throw an error and return a non-zero status code. This will be then picked up by the AWS monitoring

For clarity, when coverage script throws an exception, it should be reported as an error and non-zero exit code be returned. Each runner dependents one or more fixed dependencies to generate a coverage report, in case of it failing to run this report - report error and return non-zero exit code.

julianghionoiu commented 5 years ago

Just to be clear, the only failures that can be ignored are test failures. Everything else should be treated as an error. The coverage script should only return a value if it was able to compute it otherwise, no value and a non-zero exit code.

neomatrix369 commented 5 years ago

Currently, if these occurs on any of the tdl-runner-xxx programs:

[snipped]
+++ set -- -Dorg.gradle.appname=get_coverage_for_challenge.sh -classpath /srv/tdl-runner/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain -p /srv/tdl-runner -q clean test jacocoTestReport --console=plain
+++ uname
++ '[' Linux = Darwin ']'
++ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Dorg.gradle.appname=get_coverage_for_challenge.sh -classpath /srv/tdl-runner/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain -p /srv/tdl-runner -q clean test jacocoTestReport --console=plain

FAILURE: Build failed with an exception.

* Where:
Build file '/srv/tdl-runner/build.gradle' line: 2

* What went wrong:
A problem occurred evaluating root project 'tdl-runner'.
> Plugin with id 'com.bmuschko.clover' not found.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED in 2s

What alert is generated by the AWS system? The exit code returned by the above is 1 and not 0 like the other messages. This one above is caused if the coverage tool isn't able to run due to some changes made to the build script.

neomatrix369 commented 5 years ago

And another one,

[snipped]
files.
CopyFilesToOutputDirectory:
  BeFaster.App.Tests -> /srv/tdl-runner/src/BeFaster.App.Tests/bin/Debug/BeFaster.App.Tests.dll
Done Building Project "/srv/tdl-runner/src/BeFaster.App.Tests/BeFaster.App.Tests.csproj" (default targets).
Done Building Project "/srv/tdl-runner/befaster.sln" (default targets).

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:07.15
+ '[' -e /srv/tdl-runner/__Instrumented ']'
+ '[' -e /srv/tdl-runner/__UnitTestWithAltCover ']'
+ rm -fr /srv/tdl-runner/__UnitTestWithAltCover
++ cd /srv/tdl-runner
++ head -n 1
++ find . -path '*altcover*'
+ FULL_PATH_TO_ALTCOVER=/tools/net45/AltCover.exe
+ cd /srv/tdl-runner
+ mono /srv/tdl-runner//tools/net45/AltCover.exe --opencover --linecover --inputDirectory /srv/tdl-runner/src/BeFaster.App.Tests/bin/Debug --assemblyFilter=Adapter --assemblyFilter=Mono --assemblyFilter=.Recorder --assemblyFilter=Sample --assemblyFilter=nunit --assemblyFilter=Tests --assemblyExcludeFilter=.+.Tests --assemblyExcludeFilter=AltCover.+ --assemblyExcludeFilter=Mono.DllMap.+ --typeFilter=System. --outputDirectory=/srv/tdl-runner/__UnitTestWithAltCover --xmlReport=/srv/tdl-runner/coverage/instrumented-coverage.xml
Cannot open assembly '/srv/tdl-runner//tools/net45/AltCover.exe': No such file or directory.

julianghionoiu commented 5 years ago

AWS propagates the exitCode. We should be able to use this information in the tdl-infra-event project. Here is an AWS event for a successful coverage execution:

"0"
"4ad932d3-a854-73d6-7f46-62dcd462b07d"
"ECS Task State Change"
"aws.ecs"
"577770582757"
"2019-01-25T18:12:37Z"
"eu-west-1"
[
  "arn:aws:ecs:eu-west-1:577770582757:task/94024756-d303-4ce2-abd5-e5044084b421"
]
{
  "clusterArn": "arn:aws:ecs:eu-west-1:577770582757:cluster/dpnt-coverage-live",
  "containers": [
    {
      "containerArn": "arn:aws:ecs:eu-west-1:577770582757:container/7ab9665b-754f-4dc3-b4e8-b44b252ae219",
      "exitCode": 0,
      "lastStatus": "STOPPED",
      "name": "default-container",
      "taskArn": "arn:aws:ecs:eu-west-1:577770582757:task/94024756-d303-4ce2-abd5-e5044084b421",
      "networkInterfaces": [
        {
          "attachmentId": "78b83c4b-30c9-492f-9441-a0eb5215815c",
          "privateIpv4Address": "172.31.16.66"
        }
      ],
      "cpu": "0"
    }
  ],
  "createdAt": "2019-01-25T18:10:55.692Z",
  "launchType": "FARGATE",
  "cpu": "1024",
  "memory": "2048",
  "desiredStatus": "STOPPED",
  "group": "family:dpnt-coverage-live-python",
  "lastStatus": "STOPPED",
  "overrides": {
    "containerOverrides": [
      {
        "environment": [
          {
            "name": "PARTICIPANT_ID",
            "value": "wyuk01"
          },
          {
            "name": "REPO",
            "value": "s3://tdl-official-videos/CHK/wyuk01/sourcecode_20190125T144722.srcs"
          },
          {
            "name": "CHALLENGE_ID",
            "value": "CHK"
          },
          {
            "name": "ROUND_ID",
            "value": "CHK_R4"
          },
          {
            "name": "TAG",
            "value": "CHK_R4/done"
          }
        ],
        "name": "default-container"
      }
    ]
  },
  "attachments": [
    {
      "id": "78b83c4b-30c9-492f-9441-a0eb5215815c",
      "type": "eni",
      "status": "DELETED",
      "details": [
        {
          "name": "subnetId",
          "value": "subnet-cf2386b9"
        },
        {
          "name": "networkInterfaceId",
          "value": "eni-c7f52af1"
        },
        {
          "name": "macAddress",
          "value": "06:a6:0b:64:1b:08"
        },
        {
          "name": "privateIPv4Address",
          "value": "172.31.16.66"
        }
      ]
    }
  ],
  "connectivity": "CONNECTED",
  "connectivityAt": "2019-01-25T18:11:00.29Z",
  "pullStartedAt": "2019-01-25T18:11:08.003Z",
  "startedAt": "2019-01-25T18:12:05.003Z",
  "stoppingAt": "2019-01-25T18:12:24.624Z",
  "stoppedAt": "2019-01-25T18:12:37.658Z",
  "pullStoppedAt": "2019-01-25T18:11:58.003Z",
  "executionStoppedAt": "2019-01-25T18:12:24Z",
  "stoppedReason": "Essential container in task exited",
  "stopCode": "EssentialContainerExited",
  "updatedAt": "2019-01-25T18:12:37.658Z",
  "taskArn": "arn:aws:ecs:eu-west-1:577770582757:task/94024756-d303-4ce2-abd5-e5044084b421",
  "taskDefinitionArn": "arn:aws:ecs:eu-west-1:577770582757:task-definition/dpnt-coverage-live-python:1",
  "version": 5,
  "platformVersion": "1.3.0"
}

neomatrix369 commented 5 years ago

Thanks for that, its still a lot of data to understand which part is useful for the dpnt-infra-events project.

neomatrix369 commented 5 years ago

On a different note, I was thinking of added one or two tests to the dpnt-coverage projects - I feel over time we need to keep track of our scripts, if they are breaking for the right reasons or not. Any changes that prevent this from happening should also be caught before it goes into production.

For such tests, I would need to start with a modified build script, and I plan to use tags to store such changes - so when we run the test we can say:

computeCoverageForChallenge "[language]" "[tag]" "[any challenge]" "non-zero exit code"

or

checkForFailingCoverageResults "[language]" "[tag]" "non-zero exit code"    <== new testing framework function

Which results in the testing framework checking the return code and/or the error log and verifying that the execution has failed for the right reason.

julianghionoiu commented 5 years ago

Hey Mani,

Are you thinking of adding a negative test to the dpnt-coverage project? Sounds good to me. On Sat, 26 Jan 2019 at 18:25, mani notifications@github.com wrote:

On a different note, I was thinking of added one or two tests to the dpnt-coverage projects - I feel over time we need to keep track of our scripts, if they are breaking for the right reasons or not. Any changes that prevent this from happening should also be caught before it goes into production.

— You are receiving this because you modified the open/close state.

Reply to this email directly, view it on GitHub https://github.com/julianghionoiu/tdl-lord-of-runners/issues/15#issuecomment-457844334, or mute the thread https://github.com/notifications/unsubscribe-auth/AE2qKawKT46YgtOtMbDVLy1pEBXXuPGbks5vHIF1gaJpZM4aMhDS .

neomatrix369 commented 5 years ago

On a different note, I was thinking of added one or two tests to the dpnt-coverage projects - I feel over time we need to keep track of our scripts, if they are breaking for the right reasons or not. Any changes that prevent this from happening should also be caught before it goes into production.

For such tests, I would need to start with a modified build script, and I plan to use tags to store such changes - so when we run the test we can say:
computeCoverageForChallenge "[language]" "[tag]" "[any challenge]" "non-zero exit code"
or
checkForFailingCoverageResults "[language]" "[tag]" "non-zero exit code"    <== new testing framework function
Which results in the testing framework checking the return code and/or the error log and verifying that the execution has failed for the right reason.

Won't need to change tdl-runners-xxxx, we can solve it from the world of docker!

neomatrix369 commented 5 years ago

@julianghionoiu if you are happy with the style of the test in https://github.com/julianghionoiu/dpnt-coverage/pull/32 then I'll apply the same for the other languages.

julianghionoiu / tdl-lord-of-runners

[Enhancement] Break execution on exception in coverage script and return non-zero status code #15