Closed gabemontero closed 6 months ago
I think I'm hitting some flakes in the 2 tests.
Attempting a manual upgrade / downgrade out of https://github.com/gabemontero/pipeline-service/tree/man-upgrade with dev_setup.sh
I've also pushed a fix to log gathering on test failures in this PR, though I forget whether changes to the test scripts are picked up in PRs.
Baseline passed again; another flake in upgrade, but I think it's HCP / ROSA related again @xinredhat
step-run-plnsvc-setup
+ touch /workspace/workdir/source/destroy-cluster.txt
+ echo 'Execute dev_setup.sh script to set up pipeline-service ...'
Execute dev_setup.sh script to set up pipeline-service ...
+ for _ in {1..3}
+ kubectl -n default exec pod/ci-runner -- sh -c '/workspace/sidecar/bin/plnsvc_setup.sh https://github.com/openshift-pipelines/pipeline-service main'
Error from server: error dialing backend: EOF
+ echo 'Failed to execute dev_setup.sh script, retrying ...'
Failed to execute dev_setup.sh script, retrying ...
+ sleep 5
+ for _ in {1..3}
+ kubectl -n default exec pod/ci-runner -- sh -c '/workspace/sidecar/bin/plnsvc_setup.sh https://github.com/openshift-pipelines/pipeline-service main'
Error from server: error dialing backend: EOF
Failed to execute dev_setup.sh script, retrying ...
+ echo 'Failed to execute dev_setup.sh script, retrying ...'
+ sleep 5
+ for _ in {1..3}
+ kubectl -n default exec pod/ci-runner -- sh -c '/workspace/sidecar/bin/plnsvc_setup.sh https://github.com/openshift-pipelines/pipeline-service main'
Error from server: error dialing backend: EOF
Failed to execute dev_setup.sh script, retrying ...
+ echo 'Failed to execute dev_setup.sh script, retrying ...'
+ sleep 5
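For reference, the retry pattern that step uses can be sketched as a small shell function. This is only a sketch: the `flaky` command below is a stand-in for the real `kubectl exec`, and the sleep is shortened from the 5s used in CI.

```shell
#!/usr/bin/env sh
# Retry a command up to 3 times, pausing between attempts.
# Returns non-zero only if every attempt fails.
retry() {
  for _ in 1 2 3; do
    if "$@"; then
      return 0
    fi
    echo "Failed to execute command, retrying ..." >&2
    sleep 1  # the CI step sleeps 5s; shortened here
  done
  return 1
}

# Stand-in for the flaky kubectl exec: fails twice, then succeeds.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
flaky() {
  count=$(($(cat "$attempts_file") + 1))
  echo "$count" > "$attempts_file"
  [ "$count" -ge 3 ]
}

retry flaky && echo "succeeded after $(cat "$attempts_file") attempts"
```

With three attempts and the "Error from server: error dialing backend: EOF" failure persisting across all of them, the step exhausts its retries exactly as the transcript above shows.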
/retest
manual testing with repeated upgrade / downgrade looks good, including my diagnostic logs wrt canceled context vs. deadline exceeded
will give the upgrade test one more try, then merging
analyzing the latest failure ...
So I see this pattern of a CreateResult failing, followed by an UpdateLog call, in the upgrade test, which I don't see either in my manual testing of an upgrade or in the install-from-scratch test here in CI:
{"level":"error","ts":1710358372.681495,"caller":"zap/options.go:212","msg":"finished unary call with code Unknown","grpc.auth_disabled":false,"grpc.start_time":"2024-03-13T19:32:51Z","system":"grpc","span.kind":"server","grpc.service":"tekton.results.v1alpha2.Results","grpc.method":"CreateResult","peer.address":"10.128.0.65:35622","grpc.user":"system:serviceaccount:tekton-results:tekton-results-watcher","grpc.issuer":"https://rh-oidc.s3.us-east-1.amazonaws.com/273tbj71skqksgqafoe5aotsuc44blp4","error":"ERROR: duplicate key value violates unique constraint \"results_by_name\" (SQLSTATE 23505)","grpc.code":"Unknown","grpc.time_duration_in_ms":999,"stacktrace":"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.DefaultMessageProducer\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/options.go:212\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.UnaryServerInterceptor.func1\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/server_interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.UnaryServerInterceptor.func1\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/tags/interceptors.go:23\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1\n\t/opt/app-root/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34\ngithub.com/tektoncd/results/proto/v1alpha2/results_go_proto._Results_CreateResult_Handler\n\t/opt/app-root/src/proto/v1alpha2/results_go_proto/api_grpc.pb.go:258\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1372\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1783\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\t/opt/app-root/src/vendor/google.golang.org/grpc/server.go:1016"}
{"level":"info","ts":1710358372.881164,"caller":"zap/options.go:212","msg":"finished unary call with code OK","grpc.auth_disabled":false,"grpc.start_time":"2024-03-13T19:32:52Z","system":"grpc","span.kind":"server","grpc.service":"tekton.results.v1alpha2.Results","grpc.method":"GetRecord","peer.address":"10.128.0.65:35622","grpc.user":"system:serviceaccount:tekton-results:tekton-results-watcher","grpc.issuer":"https://rh-oidc.s3.us-east-1.amazonaws.com/273tbj71skqksgqafoe5aotsuc44blp4","grpc.code":"OK","grpc.time_duration_in_ms":798}
{"level":"info","ts":1710358373.0968993,"caller":"zap/options.go:212","msg":"finished streaming call with code OK","grpc.auth_disabled":false,"grpc.start_time":"2024-03-13T19:32:51Z","system":"grpc","span.kind":"server","grpc.service":"tekton.results.v1alpha2.Logs","grpc.method":"UpdateLog","peer.address":"10.128.0.65:35622","grpc.user":"system:serviceaccount:tekton-results:tekton-results-watcher","grpc.issuer":"https://rh-oidc.s3.us-east-1.amazonaws.com/273tbj71skqksgqafoe5aotsuc44blp4","grpc.code":"OK","grpc.time_duration_in_ms":1759}
And then in the test, fetches of records for the new pipelinerun work, but fetching the logs in the script does not.
Feels like an issue in the CI test, maybe PostgreSQL and S3 getting out of sync during the upgrades/downgrades.
It also does not help that the log name, i.e. 4627a656-6756-345c-a3c1-2be541fd5a26, is not mentioned in the UpdateLog call.
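One quick way to confirm that pattern from the API server logs is to filter the structured JSON entries down to just the CreateResult / UpdateLog calls. This is only a sketch: the `sample` below is a trimmed copy of the entries quoted above, and the kubectl line in the comment (namespace and deployment name) is an assumption about where the real stream would come from.

```shell
# A trimmed copy of the three log entries quoted above; in CI the real
# stream would come from something like
#   kubectl -n tekton-results logs deployment/tekton-results-api
# (namespace and deployment name are assumptions).
sample='{"level":"error","grpc.method":"CreateResult","grpc.code":"Unknown","error":"duplicate key value violates unique constraint \"results_by_name\" (SQLSTATE 23505)"}
{"level":"info","grpc.method":"GetRecord","grpc.code":"OK"}
{"level":"info","grpc.method":"UpdateLog","grpc.code":"OK"}'

# Keep only the CreateResult / UpdateLog entries, to spot a failed
# create followed by a successful UpdateLog for the same watcher.
matches=$(printf '%s\n' "$sample" | grep -E '"grpc.method":"(CreateResult|UpdateLog)"')
printf '%s\n' "$matches"
```

Grepping like this drops the noisy GetRecord traffic and makes the failed-CreateResult-then-OK-UpdateLog sequence easy to eyeball across a long log.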
Going to add a little more debug, then discuss with team in standup.
well, and now it passed
merging
Replaces https://github.com/openshift-pipelines/pipeline-service/pull/966, whose rebase somehow got into a bad state.