openshift-pipelines / pipeline-service

SaaS for Tekton Pipelines
Apache License 2.0
20 stars 43 forks source link

Revert "add config for enabling tekton results dbssl" #974

Closed enarha closed 4 months ago

enarha commented 4 months ago

Reverts openshift-pipelines/pipeline-service#906

We are reverting so we can unblock the OSP nightly versions and promote them to staging. We'll revert the revert afterwards.

avinal commented 4 months ago

/retest Looks good to me, we can merge when th tests are green.

gabemontero commented 4 months ago

Also note, if we hit some of the known upgrade tests flakes, we won't block this PR on those.

enarha commented 4 months ago

/test test-pipeline-service-upgrade-ocp-414

enarha commented 4 months ago

The upgrade test is failing with

{"level":"error","ts":"2024-03-18T15:22:38.886Z","logger":"watcher","caller":"controller/controller.go:566","msg":"Reconcile error","knative.dev/traceid":"5a8be153-1f78-42da-9775-8c7041f61afe","knative.dev/key":"plnsvc-tests/tekton-chains-metrics-vpk5g","duration":"14.485559ms","error":"error upserting record: rpc error: code = Unknown desc = GetResult(plnsvc-tests/results/9494b441-5206-46c0-95d5-31edb202d461): rpc error: code = Unknown desc = failed to connect to `host=postgres-postgresql.tekton-results.svc.cluster.local user=tekton database=tekton_results`: tls error (x509: certificate signed by unknown authority)","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:491"}
+ exit 1                                                                        
command terminated with exit code 1                                             
+ exit 1

This seems related to the change we reverted. I'm checking if the feature branch is clean from that configuration.

gabemontero commented 4 months ago

The upgrade test is failing with

{"level":"error","ts":"2024-03-18T15:22:38.886Z","logger":"watcher","caller":"controller/controller.go:566","msg":"Reconcile error","knative.dev/traceid":"5a8be153-1f78-42da-9775-8c7041f61afe","knative.dev/key":"plnsvc-tests/tekton-chains-metrics-vpk5g","duration":"14.485559ms","error":"error upserting record: rpc error: code = Unknown desc = GetResult(plnsvc-tests/results/9494b441-5206-46c0-95d5-31edb202d461): rpc error: code = Unknown desc = failed to connect to `host=postgres-postgresql.tekton-results.svc.cluster.local user=tekton database=tekton_results`: tls error (x509: certificate signed by unknown authority)","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/opt/app-root/src/vendor/knative.dev/pkg/controller/controller.go:491"}
+ exit 1                                                                        
command terminated with exit code 1                                             
+ exit 1

This seems related to the change we reverted. I'm checking if the feature branch is clean from that configuration.

Yep sure seems related. That said, as we have not merged this into stage/prod/rhtap e2e's yet, sorting through upgrade / downgrade could be an item to sort out when we re-introduce this change.

While @enarha performs the feature branch examination he mentioned, I'm going to provision my own 4-14 cluster and use dev_setup.shp to manually go from main branch level, to this commit's level, and back again, so try and repro or validate that way.

enarha commented 4 months ago

The feature branch looks good to me. It looks like a remainder of the initial postgres configuration enforcing SSL verification coming from the base install and then after the "upgrade" including the revert, we are trying to use the self signed certificate which the DB rejects. I'm not sure how to verify that any further with the logs I have. Given that the feature branch looks good, I'm proposing we merge this. @gabemontero WDYT?

enarha commented 4 months ago

OK. It actually makes sense because we change the deployment script and remove the generation of the certificate https://github.com/openshift-pipelines/pipeline-service/pull/974/files#diff-57a0f2524c627d202acbdb2a0be295463a3dbc78878f11758d34709a2e9fc7d1L33 so this is always expected to fail.

gabemontero commented 4 months ago

OK. It actually makes sense because we change the deployment script and remove the generation of the certificate https://github.com/openshift-pipelines/pipeline-service/pull/974/files#diff-57a0f2524c627d202acbdb2a0be295463a3dbc78878f11758d34709a2e9fc7d1L33 so this is always expected to fail.

OK I understand the remove generation bit you highlighted @enarha ^^ but I'm missing from your prior comment https://github.com/openshift-pipelines/pipeline-service/pull/974#issuecomment-2004342446 why we are still trying to use the self signed cert after the "upgrade" i.e. revert. Can you clarify?

Just trying to make sure that if we had to revert a future re-introduction of @avinal 's change in say stage we don't end up in a broken state.

gabemontero commented 4 months ago

the tl;dr: the bitnami postgresql helm chart is not predictable wrt recycling postgresl as part of upgrade/downgrade testing

now, afawk, that helm chart is not used for RDS and stage / prod, so the flakiness here is unrelated

that said, we may need to do some upgrade only testing when @avinal re-introduces ssl

gabemontero commented 4 months ago

the tl;dr: the bitnami postgresql helm chart is not predictable wrt recycling postgresl as part of upgrade/downgrade testing

now, afawk, that helm chart is not used for RDS and stage / prod, so the flakiness here is unrelated

that said, we may need to do some upgrade only testing when @avinal re-introduces ssl

actually looking at https://github.com/redhat-appstudio/infra-deployments/tree/main/components/pipeline-service/staging/stone-stg-m01/resources @enarha @avinal maybe the bitnami helm chart is used for stage/prod ???

something to drill down in stand up

enarha commented 4 months ago

@gabemontero Why do you think it's used? I do not see it being used. Grep for bitnami in the infra-deployments repo does not return any matches.

gabemontero commented 4 months ago

On Mon, Mar 18, 2024 at 2:34 PM Emil Natan @.***> wrote:

@gabemontero https://github.com/gabemontero Why do you think it's used? I do not see it being used. Grep for bitnami in the infra-deployments repo does not return any matches.

because I am not finding the alternative anywhere I am wondering if the postgres def is pulled from the pipeline service repo

I have heard our RDS described as postgresql enabled or something like that

Let’s get the precise config path explained to us and go from there

— Reply to this email directly, view it on GitHub https://github.com/openshift-pipelines/pipeline-service/pull/974#issuecomment-2004657614, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3NU5DQZ35DB2Z2QDBLDMDYY4XUXAVCNFSM6AAAAABE3RVYWKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBUGY2TONRRGQ . You are receiving this because you were mentioned.Message ID: @.***>

enarha commented 4 months ago

It looks like Terraform is used for managing the RDS. I found configuration like this for each cluster components/gitops/staging/stone-stage-p01/gitops-service-postgres-rds-config-path.yaml.

gabemontero commented 4 months ago

It looks like Terraform is used for managing the RDS. I found configuration like this for each cluster components/gitops/staging/stone-stage-p01/gitops-service-postgres-rds-config-path.yaml.

Good find. So how do the changes to https://github.com/openshift-pipelines/pipeline-service/blob/main/developer/openshift/gitops/argocd/pipeline-service-storage/postgres.yaml get mapped to that terraform config?

Or in other words, seems like equivalent changes need to be made to the terraform config @enarha @avinal to get the equivalent effect we are doing in developer mode to the postgresql DB, no?

avinal commented 4 months ago

Long thread here 😅 , I will try to address all the queries.