preflight fails on "operator-sdk scorecard: unexpected end of JSON input"

jbattiato commented 3 weeks ago

Bug Description

The preflight check operator command fails to read the scorecard output with the following error:

time="2024-10-02T15:57:04+02:00" level=info msg="running scorecard with the following invocation" args="[\"operator-sdk\",\"scorecard\",\"--output\",\"json\",\"--selector=test=basic-check-spec-test\",\"--kubeconfig\",\"/tmp/2804899030\",\"--wait-time\",\"240s\",\"--namespace\",\"default\",\"--service-account\",\"default\",\"--config\",\"/tmp/scorecard-test-config-798981960.yaml\",\"--verbose\",\"/tmp/preflight-3318550770/fs\"]"
time="2024-10-02T15:57:04+02:00" level=info msg="check completed" check=ScorecardBasicSpecCheck err="failed to run operator-sdk scorecard: unexpected end of JSON input" result=ERROR

Version and Command Invocation

1.10.0

time="2024-10-02T12:24:50Z" level=info msg="certification library version" version="1.10.0 <commit: c9048da99aae76ddee5a708edcc94e14c034cd1d>"

Steps to Reproduce:

PFLT_INDEXIMAGE=<index-image> PFLT_ARTIFACTS=preflight_results  KUBECONFIG=${HOME}/.kube/config \
  ./preflight check operator <bundle-image>  --docker-config ${HOME}/.docker/config.json

I'm sorry, I'm working on a closed fork of cloudnative-pg, and I can't share the associated workflow output.

Expected Result

preflight should be able to read the scorecard JSON output since the latter is well formed when running it standalone:

./operator-sdk scorecard <bundle-image> --wait-time 60s --output json > scorecard.json
jq -e . >/dev/null 2>&1 <<<$(cat scorecard.json)
echo $?
0
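
Note that the standalone run above invokes ./operator-sdk from the current directory, while the logged preflight invocation calls operator-sdk by name. A well-formed file from a standalone run therefore does not rule out an empty capture inside preflight: Go's encoding/json reports "unexpected end of JSON input" when asked to decode zero bytes. The following is a minimal sketch of a check for that case, assuming the output was redirected to a file as in the commands above. The helper name has_output is hypothetical, and it does not validate the JSON itself.

```shell
#!/bin/sh
# has_output FILE: succeed only if FILE exists and holds at least one byte.
# (Hypothetical helper for illustration; it only detects an empty or
# missing capture and does not parse the JSON.)
has_output() {
    [ -s "$1" ]
}
```

It could be used as: has_output scorecard.json || echo "scorecard produced no output" >&2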

Actual Result

preflight reports these errors from ScorecardBasicSpecCheck and ScorecardOlmSuiteCheck:

level=info msg="check completed" check=ScorecardBasicSpecCheck err="failed to run operator-sdk scorecard: unexpected end of JSON input" result=ERROR

level=info msg="check completed" check=ScorecardOlmSuiteCheck err="failed to run operator-sdk scorecard: unexpected end of JSON input" result=ERROR

Additional Context

=== EDIT ===

OCP 4.17
Scorecard config

===========

The error itself is not very informative, and I couldn't find any clues about the root cause of the preflight failure.

https://github.com/redhat-openshift-ecosystem/openshift-preflight/blob/c9048da99aae76ddee5a708edcc94e14c034cd1d/internal/operatorsdk/operatorsdk.go#L112

I looked up the "failed to run operator-sdk scorecard: unexpected end of JSON input" error in the issues and found this: https://github.com/redhat-openshift-ecosystem/openshift-preflight/issues/355

I might be wrong, but it does not seem to be related.

Can you help me investigate this issue?

acornett21 commented 3 weeks ago

Hi @jbattiato, what version of operator-sdk are you using? Can you share the scorecard directory? There isn't enough in this issue for us to replicate or troubleshoot this.

jbattiato commented 3 weeks ago

> Hi @jbattiato, what version of operator-sdk are you using? Can you share the scorecard directory? There isn't enough in this issue for us to replicate or troubleshoot this.

I'll try to collect and share with you as much as possible. Thanks for looking into this already!

acornett21 commented 3 weeks ago

If you can at least share the scorecard config.yaml I can plug that into a public bundle and replicate. For clarity I'm talking about this file.

https://github.com/redhat-openshift-ecosystem/certified-operators/blob/main/operators/cloudnative-pg/1.24.0/tests/scorecard/config.yaml

jbattiato commented 3 weeks ago

This preflight check command uses the open source image:

PFLT_ARTIFACTS=preflight_operator_results \
  BUNDLE_IMG=ghcr.io/cloudnative-pg/cloudnative-pg-testing:bundle-cnp-5405 \
  PFLT_INDEXIMAGE=ghcr.io/cloudnative-pg/cloudnative-pg-testing:index-cnp-5405 \
  KUBECONFIG=~/.kube/config \
  bin/preflight check operator ghcr.io/cloudnative-pg/cloudnative-pg-testing:bundle-cnp-5405  --loglevel trace

The scorecard will be shown in the preflight output.

scorecard config:

apiVersion: scorecard.operatorframework.io/v1alpha3
kind: Configuration
metadata:
  name: config
stages:
- parallel: true
  tests:
  - entrypoint:
    - scorecard-test
    - basic-check-spec
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: basic
      test: basic-check-spec-test
  - entrypoint:
    - scorecard-test
    - olm-bundle-validation
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: olm
      test: olm-bundle-validation-test
  - entrypoint:
    - scorecard-test
    - olm-crds-have-validation
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: olm
      test: olm-crds-have-validation-test
  - entrypoint:
    - scorecard-test
    - olm-crds-have-resources
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: olm
      test: olm-crds-have-resources-test
  - entrypoint:
    - scorecard-test
    - olm-spec-descriptors
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: olm
      test: olm-spec-descriptors-test
  - entrypoint:
    - scorecard-test
    - olm-status-descriptors
    image: quay.io/operator-framework/scorecard-test:v1.37.0
    labels:
      suite: olm
      test: olm-status-descriptors-test

acornett21 commented 3 weeks ago

Hi @jbattiato With version 1.10.0 on an OCP 4.16 cluster, the index image and bundle image pass preflight for me. What version of OCP are you testing on?

jbattiato commented 3 weeks ago

> Hi @jbattiato With version 1.10.0 on an OCP 4.16 cluster, the index image and bundle image pass preflight for me. What version of OCP are you testing on?

We are using OCP 4.17. (I'm updating the Additional Context section of the issue with these details.)

acornett21 commented 3 weeks ago

@jbattiato Are you using an RC version of OCP, or the 4.17.0 GA version? I cannot replicate this on the 4.17.0 GA version.

acornett21 commented 3 weeks ago

Hi @jbattiato, this seems to be a host setup issue on the machine where you are running the tooling. operator-sdk needs to be in your $PATH: running "which operator-sdk" should produce a valid result. Let me know if you have any questions.
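
The check described above can be sketched as a small shell function. This is only an illustration: the name require_on_path is hypothetical and not part of preflight or operator-sdk, and it uses command -v, the POSIX counterpart of which.

```shell
#!/bin/sh
# require_on_path NAME: if NAME resolves via the current PATH, print what
# it resolves to (a file path for an external binary) and succeed;
# otherwise print a diagnostic to stderr and fail.
# (Hypothetical helper, not part of preflight or operator-sdk.)
require_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    else
        echo "error: $1 not found in PATH ($PATH)" >&2
        return 1
    fi
}
```

Running require_on_path operator-sdk before invoking preflight would surface a missing binary directly, instead of the less specific JSON error reported in this issue.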

jbattiato commented 3 weeks ago

> @jbattiato Are you using an RC version of OCP, or the 4.17.0 GA version? I cannot replicate this on the 4.17.0 GA version.

We used the 4.17 GA.

> Hi @jbattiato, this seems to be a host setup issue on the machine where you are running the tooling. operator-sdk needs to be in your $PATH: running "which operator-sdk" should produce a valid result. Let me know if you have any questions.

I'll double check this. Many thanks for the suggestion!

sxd commented 3 weeks ago

This is interesting, @acornett21. Is it possible to specify the path of the operator-sdk binary to be used by preflight? It would be very helpful in cases where you want to use a specific version of operator-sdk (our dev case) and not the one deployed globally. Thanks for the reply!

acornett21 commented 3 weeks ago

@sxd Since preflight has to shell out to another binary, I'm not sure if specifying a custom path is possible, but even if it is, I'm not sure this is something that we'd want to implement. We already state in our requirements the version of operator-sdk that we support, and that it must be in the path.

If you want to have a different version of this on your system and not in the PATH, it might be best to run preflight as a container, so preflight has access to the operator-sdk dependency baked into the container. Here is the recipe for that.

sxd commented 3 weeks ago

@acornett21 that's probably the best solution. You're totally right: adding the directory with the binary to PATH makes a lot of sense when running preflight!

Thank you, as usual you're very kind!

acornett21 commented 3 weeks ago

Another option is to put both binaries in the same directory and temporarily export PATH set to that directory.
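
A sketch of that approach, scoped to a single command so the change does not leak into the rest of the shell session. Note one assumption made here: it prepends the directory rather than setting PATH equal to it, so that system utilities stay reachable. The function name with_tool_dir is hypothetical.

```shell
#!/bin/sh
# with_tool_dir DIR CMD [ARGS...]: run CMD with DIR placed first on PATH.
# The function body is a subshell, so the modified PATH applies only to
# CMD and the caller's PATH is left unchanged.
# (Hypothetical helper; prepending instead of replacing PATH is an
# assumption of this sketch.)
with_tool_dir() (
    tool_dir=$1
    shift
    PATH="$tool_dir:$PATH"
    export PATH
    "$@"
)
```

For example, with both binaries placed in a directory of your choosing: with_tool_dir /path/to/tools preflight check operator <bundle-image>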

jbattiato commented 3 weeks ago

Hi @acornett21!

I'd like to thank you for your hints and prompt feedback on this issue! Your suggestions have fixed the scorecard issue and the preflight check is now proceeding!!

Many thanks again! We can close this.

Have a great weekend!

acornett21 commented 3 weeks ago

Hi @jbattiato, no problem, happy to help; this was a weird one. I'd like to provide a better message for this PATH issue, but mocking to enable tests to pass will be an issue. I'll link back to this issue if we can come up with something.
