openshift-pipelines / pipeline-service

SaaS for Tekton Pipelines
Apache License 2.0
20 stars 43 forks source link

Enable deep-inspection for Chains in tekton config #985

Closed yashvardhannanavati closed 2 months ago

yashvardhannanavati commented 3 months ago

This config change enables Chains to inspect task level results making sure the index manifests for an image index get attested too and not just the image index iself.

Resolved CVP-4070

yashvardhannanavati commented 3 months ago

@arewm @enarha @lcarva Could you please take a look?

enarha commented 3 months ago

/ok-to-test

enarha commented 3 months ago

/test test-pipeline-service-deployment-ocp-414

enarha commented 3 months ago

/test test-pipeline-service-deployment-ocp-414

gabemontero commented 3 months ago

@yashvardhannanavati @arewm @lcarva

turns out our smoke test here is failing in our chains portion of the smoke test, which means this config change has some adverse affect

Unfortunately, the attempts at getting debug data has a bug in it.

Here is the log snippet pulled from the results db (the test pipeline in tekton-ci has long since been pruned before I had time to take a peek, and where @enarha did not get a chance to look for you.

Here is log output:

[test-pipeline-service : run-tests] [chains]
[test-pipeline-service : run-tests] + echo OK
[test-pipeline-service : run-tests] + echo
[test-pipeline-service : run-tests] + for case in "${TEST_LIST[@]}"
[test-pipeline-service : run-tests] + case $case in
[test-pipeline-service : run-tests] + echo '[chains]'
[test-pipeline-service : run-tests] + test_chains
[test-pipeline-service : run-tests] + kubectl apply -k /workspace/source/operator/test/manifests/test/tekton-chains -n plnsvc-tests
[test-pipeline-service : run-tests] serviceaccount/chains-test created
[test-pipeline-service : run-tests] rolebinding.rbac.authorization.k8s.io/chains-test-edit-rolebinding created
[test-pipeline-service : run-tests] rolebinding.rbac.authorization.k8s.io/chains-test-scc-rolebinding created
[test-pipeline-service : run-tests] + kubectl get pipelines -n plnsvc-tests -o name
[test-pipeline-service : run-tests] + grep -q pipeline.tekton.dev/simple-copy
[test-pipeline-service : run-tests] pipeline.tekton.dev/simple-copy created
[test-pipeline-service : run-tests] + echo -n '  - Signing secret: '
[test-pipeline-service : run-tests] + kubectl get secret signing-secrets -n openshift-pipelines
[test-pipeline-service : run-tests]   - Signing secret: + echo OK
[test-pipeline-service : run-tests] + echo -n '  - Run pipeline: '
[test-pipeline-service : run-tests] + image_src=quay.io/aptible/alpine:latest
[test-pipeline-service : run-tests] ++ basename quay.io/aptible/alpine:latest
[test-pipeline-service : run-tests] OK
[test-pipeline-service : run-tests]   - Run pipeline: + image_name=alpine:latest
[test-pipeline-service : run-tests] + image_dst=image-registry.openshift-image-registry.svc:5000/plnsvc-tests/alpine:latest
[test-pipeline-service : run-tests] ++ tkn -n plnsvc-tests pipeline start simple-copy --param image-src=quay.io/aptible/alpine:latest --param image-dst=image-registry.openshift-image-registry.svc:5000/plnsvc-tests/alpine:latest --serviceaccount chains-test --workspace name=shared,pvc,claimName=tekton-build
[test-pipeline-service : run-tests] ++ head -1
[test-pipeline-service : run-tests] ++ sed 's:.* ::'
[test-pipeline-service : run-tests] + pipeline_name=simple-copy-run-fdp8l
[test-pipeline-service : run-tests] + wait_for_pipeline pipelineruns/simple-copy-run-fdp8l plnsvc-tests
[test-pipeline-service : run-tests] + kubectl wait --for=condition=succeeded pipelineruns/simple-copy-run-fdp8l -n plnsvc-tests --timeout 300s
[test-pipeline-service : run-tests] error: timed out waiting for the condition on pipelineruns/simple-copy-run-fdp8l
[test-pipeline-service : run-tests] + echo '[ERROR] Pipeline failed to complete successful'
[test-pipeline-service : run-tests] [ERROR] Pipeline failed to complete successful
[test-pipeline-service : run-tests] + kubectl get pipelineruns pipelineruns/simple-copy-run-fdp8l -n plnsvc-tests
[test-pipeline-service : run-tests] error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
[test-pipeline-service : run-tests] command terminated with exit code 1
[test-pipeline-service : run-tests] + exit 1

I will in a few minutes try to create a separate PR which gets the state and logs for the test chains/signing pipeline run (simple-copy-run-fdp8l in the log output above.)

Once I get that in, we'll rebase / retest your PR here, and hopefully you all will be able to discern why the test is failing.

For reference, the chains test starts at https://github.com/openshift-pipelines/pipeline-service/blob/main/operator/test/test.sh#L150

gabemontero commented 3 months ago

debug fix ^^

gabemontero commented 3 months ago

FYI @yashvardhannanavati the various metadata for the test started at https://github.com/openshift-pipelines/pipeline-service/blob/main/operator/test/test.sh#L150 is over in https://github.com/openshift-pipelines/pipeline-service/tree/main/operator/test/manifests/test/tekton-chains

gabemontero commented 3 months ago

also, we'll want to see passing app studio e2e's from https://github.com/redhat-appstudio/infra-deployments/pull/3543 before merging this PR here

gabemontero commented 3 months ago

OK we have some basic debug @yashvardhannanavati ... I'll attach the log here.

I got it by watching the test-pipeline-service-deployment-ocp-414 pipelinerun over at https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/pipelines/ns/tekton-ci/pipeline-runs

I'm assuming this will make sense to you all more than me, but with this config change, the test pipeline step at https://github.com/openshift-pipelines/pipeline-service/blob/main/operator/test/manifests/test/tekton-chains/simple-copy-pipeline.yaml#L49-L73 fails to create the pod.

[ERROR] Pipeline failed to complete successful
+ kubectl get pipelineruns/simple-copy-run-9vjp6 -n plnsvc-tests
+ echo /dev/stdout
+ prefix=pipelineruns/
NAME                    SUCCEEDED   REASON   STARTTIME   COMPLETIONTIME
simple-copy-run-9vjp6   False       Failed   5m          4m46s
/dev/stdout
+ string_to_prune=pipelineruns/simple-copy-run-9vjp6
+ tkn_param=simple-copy-run-9vjp6
+ tkn pr logs simple-copy-run-9vjp6 -n plnsvc-tests
task image-copy has failed: Failed to create pod due to config error
/dev/stdout
Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 0

does this part of the test:

echo "Bypass image push temporarily"
 echo "foobar"  > "$(results.IMAGE_URL.path)"
 echo "foobar"  > "$(results.IMAGE_DIGEST.path)"

violate the deep inspection? If so, then yeah you will need to update the test and/or the sample pipelinerun etc. so that we can properly pass the tests at https://github.com/openshift-pipelines/pipeline-service/blob/main/operator/test/test.sh#L189-L264

If you need more debug, you can add kubectl calls to https://github.com/openshift-pipelines/pipeline-service/blob/main/operator/test/test.sh#L89-L103 as part of your PR here to get events from the test namespace or the chains controller logs etc when the CI runs again.

Or, if debugging through our CI proves to onerous, you bring up your own OCP cluster and deploy your branch to it. The instructions for that are

gabemontero commented 3 months ago

full test logs test-pipeline-service.log

enarha commented 3 months ago

/test test-pipeline-service-deployment-ocp-414

enarha commented 3 months ago

I'm trying to manually verify the change and while the upgrade test here succeeds, deploying the change over the baseline, in ArgoCD I get

one or more objects failed to apply, reason: error when patching "/dev/shm/1568775355": admission webhook "webhook.operator.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: cannot unmarshal string into Go struct field Chain.spec.chain.artifacts.pipelinerun.enable-deep-inspection of type bool. Retrying attempt #6 at 1:01PM.

Changing the value to true (no quotes), make the deployment succeed and the tests pass as well. I remember that value originally was set as bool and later changed to string. Why?

gabemontero commented 3 months ago

There was and issue with using true vs “true” for the field that the operator webhook complains about

Long term the operator should be more flexible

Short term,

Look for my pr in infra deployment that sets the field with true but not in quotes

On Sun, Apr 14, 2024 at 9:01 AM Emil Natan @.***> wrote:

I'm trying to manually verify the change and while the upgrade test here succeeds, deploying the change over the baseline, in ArgoCD I get

one or more objects failed to apply, reason: error when patching "/dev/shm/1568775355": admission webhook "webhook.operator.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: cannot unmarshal string into Go struct field Chain.spec.chain.artifacts.pipelinerun.enable-deep-inspection of type bool. Retrying attempt #6 at 1:01PM.

— Reply to this email directly, view it on GitHub https://github.com/openshift-pipelines/pipeline-service/pull/985#issuecomment-2054054826, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3NU5FNM7Q4PRGTRZJDQXTY5J43PAVCNFSM6AAAAABFVQYBO2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJUGA2TIOBSGY . You are receiving this because you commented.Message ID: @.***>

gabemontero commented 2 months ago

this was handled in https://github.com/redhat-appstudio/infra-deployments/pull/3580

closing