redhat-openshift-ecosystem / openshift-preflight

Operator and container preflight certification tests
Apache License 2.0
60 stars 65 forks source link

Random failure of ScorecardBasicSpecCheck test #329

Closed tkrishtop closed 2 years ago

tkrishtop commented 3 years ago

Bug Description

Sometimes ScorecardBasicSpecCheck randomly fails, in ~30% of tests. This fail seems to be preflight-related because operator-sdk scorecard --selector=test=basic-check-spec-test which runs just after always showing "pass". The error on the preflight side is error running tests context deadline exceeded, so maybe it's something timeout-related.

results.json:

    "errors": [
      {
        "name": "ScorecardBasicSpecCheck",
        "elapsed_time": 30130,
        "description": "Check to make sure that all CRs have a spec block.",
        "help": "Check ScorecardBasicSpecCheck encountered an error. Please review the artifacts/operator_bundle_scorecard_BasicSpecCheck.json file for more information."
      },

File artifacts/operator_bundle_scorecard_BasicSpecCheck.json is absent (issue #169 ).

preflight.log:

time="2021-11-04T10:24:50Z" level=info msg="running check: ScorecardBasicSpecCheck"
time="2021-11-04T10:24:50Z" level=debug msg="Running operator-sdk scorecard check for quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
time="2021-11-04T10:24:50Z" level=debug msg="--selector=[test=basic-check-spec-test]"
time="2021-11-04T10:24:50Z" level=trace msg="running scorecard with the following invocation[operator-sdk scorecard --output json --selector=test=basic-check-spec-test --kubeconfig /kubeconfig --namespace preflight-testing --service-account default --config /tmp/scorecard-test-config-1574852032.yaml --verbose /tmp/preflight-4104845048/fs]"
time="2021-11-04T10:25:20Z" level=error msg="stdout: "
time="2021-11-04T10:25:20Z" level=error msg="stderr: time=\"2021-11-04T10:24:50Z\" level=debug msg=\"Debug logging is set\"\nError: error running tests context deadline exceeded\nUsage:\n  operator-sdk scorecard [flags]\n\nFlags:\n  -c, --config string            path to scorecard config file\n  -h, --help                     help for scorecard\n      --kubeconfig string        kubeconfig path\n  -L, --list                     Option to enable listing which tests are run\n  -n, --namespace string         namespace to run the test images in\n  -o, --output string            Output format for results. Valid values: text, json, xunit (default \"text\")\n  -l, --selector string          label selector to determine which tests are run\n  -s, --service-account string   Service account to use for tests (default \"default\")\n  -x, --skip-cleanup             Disable resource cleanup after tests are run\n  -b, --storage-image string     Storage image to be used by the Scorecard pod (default \"docker.io/library/busybox@sha256:c71cb4f7e8ececaffb34037c2637dc86820e4185100e18b4d02d613a9bd772af\")\n  -t, --test-output string       Test output directory. (default \"test-output\")\n  -u, --untar-image string       Untar image to be used by the Scorecard pod (default \"registry.access.redhat.com/ubi8@sha256:910f6bc0b5ae9b555eb91b88d28d568099b060088616eba2867b07ab6ea457c7\")\n  -w, --wait-time duration       seconds to wait for tests to complete. Example: 35s (default 30s)\n\nGlobal Flags:\n      --plugins strings   plugin keys to be used for this subcommand execution\n      --verbose           Enable verbose logging\n\ntime=\"2021-11-04T10:25:20Z\" level=fatal msg=\"error running tests context deadline exceeded\"\n"
time="2021-11-04T10:25:20Z" level=info msg="check completed: ScorecardBasicSpecCheck" ERROR="failed to run operator-sdk scorecard: exit status 1" result="failed to run operator-sdk scorecard: exit status 1"

operator-sdk scorecard output:

{
  "kind": "TestList",
  "apiVersion": "scorecard.operatorframework.io/v1alpha3",
  "items": [
    {
      "kind": "Test",
      "apiVersion": "scorecard.operatorframework.io/v1alpha3",
      "spec": {
        "image": "quay.io/operator-framework/scorecard-test@sha256:66023c1e0f6021407811fb2b1d42aa16af6f41f2c791b2bdeb6ba696d000cf5d",
        "entrypoint": [
          "scorecard-test",
          "basic-check-spec"
        ],
        "labels": {
          "suite": "basic",
          "test": "basic-check-spec-test"
        },
        "storage": {
          "spec": {
            "mountPath": {}
          }
        }
      },
      "status": {
        "results": [
          {
            "name": "basic-check-spec",
            "state": "pass",
            "creationTimestamp": null
          }
        ]
      }
    }
  ]
}

Version and Command Invocation

1.0.3

Steps to Reproduce:

Run preflight operator check for testpmd operator several times in a row:

image: 'quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815',
index_image: 'quay.io/tkrishtop/index-testpmd-operator-bundle:v0.2.9'

Sometimes ScorecardBasicSpecCheck fails (~30%)

and sometimes pass (~70%)

Expected Result

ScorecardBasicSpecCheck should always pass

Actual Result

Sometimes ScorecardBasicSpecCheck fails (~30%)

and sometimes pass (~70%)

tkrishtop commented 2 years ago

Reproduced this issue again with the latest tag b00d46ad3531b3059a2e1ffda3fa3531a669f0ec (=master branch).

Operator version:

image: "quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
index_image: "quay.io/tkrishtop/index-testpmd-operator-bundle:v0.2.9"

Here are all the logs.

Preflight shows ScorecardOlmSuiteCheck test in errors

    "errors": [
      {
        "name": "ScorecardOlmSuiteCheck",
        "elapsed_time": 120145,
        "description": "Operator-sdk scorecard OLM Test Suite Check",
        "help": "Check ScorecardOlmSuiteCheck encountered an error. Please review the /artifacts/operator_bundle_scorecard_OlmSuiteCheck.json file for more information."
      },

but independent operator-sdk scorecard --selector=test=basic-check-spec-test test (I run it right after the preflight test) for the same operator shows that the test is OK

{
  "kind": "TestList",
  "apiVersion": "scorecard.operatorframework.io/v1alpha3",
  "items": [
    {
      "kind": "Test",
      "apiVersion": "scorecard.operatorframework.io/v1alpha3",
      "spec": {
        "image": "quay.io/operator-framework/scorecard-test@sha256:ff3ce0785e706185260a8308c0f40bae950fdba03cbb302fd002dddf3129b189",
        "entrypoint": [
          "scorecard-test",
          "basic-check-spec"
        ],
        "labels": {
          "suite": "basic",
          "test": "basic-check-spec-test"
        },
        "storage": {
          "spec": {
            "mountPath": {}
          }
        }
      },
      "status": {
        "results": [
          {
            "name": "basic-check-spec",
            "state": "pass",
            "creationTimestamp": null
          }
        ]
      }
    }
  ]
}
samira-barouti commented 2 years ago

@tkrishtop From the log, it seems that the scorecard test was not executed within the 120s wait time (elapsed time 120145ms). We merged a PR that increases the timeout to 240ms. Please feel free to reopen the issue if you saw the error again.