Open · kbristow opened 1 year ago
Did you ever create a PVC to interact with Postgres manually? That could cause different file permissions, since your storage class's reclaim policy is Retain.
If the data is only for testing, try:
I have tried the above, but the same issue occurs. I am happy to keep using the fix I described in my issue; I wanted to raise it as a potential problem that others using Tekton Results may run into.
I'm a little curious what environment difference makes you hit this problem. Could you run

```
ls -l /bitnami/postgresql
```

to check who owns that directory? It would be weird if the directory belonged to root; the deployment configuration never uses root privileges. In that case it could be the default EBS file permissions.
Another approach would be modifying the StorageClass mountOptions:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
mountOptions:
  - uid=1001
  - gid=1001
```
Looks like root ownership is indeed the case and is the root cause of the issue:

```
$ ls -l /bitnami/postgresql
total 16
drwx------ 2 root root 16384 Jul 13 13:43 lost+found
```
I guess that is how EBS volumes are permissioned by default. Is that something you want to cater for in your default release manifest? Whilst I probably won't be using the Postgres created via the Results release manifest, for users who want to try Results it may be worthwhile putting something in to handle this permission issue, just in case.
Either way, I'm happy to close the issue from my side if there is nothing further you want me to test.
The default Results release does implicitly require the right file permissions on the volume.
> Whilst I probably won't be using the Postgres created via the Results release manifest, for users who want to try Results it may be worthwhile putting something in to handle this permission issue, just in case.
Yes, agreed. Especially for users who want to try it out in an environment provided by a cloud provider, the default file permission strategy varies depending on which storage they use.
It would be appreciated if you could make a PR for it: document this potential permission issue and handle the permissions (of course you could let me do it if you don't want to).
We can close this issue after merging that PR.
I am not going to be around until next Wednesday, so I'm happy for you to do the PR. To add: once you mentioned that Postgres runs as user 1001, I realised I could just set spec.template.spec.securityContext.fsGroup: 1001 on the Postgres StatefulSet, which also resolves my issue. That seems like a better solution and probably doesn't need any documentation changes either. Thoughts?
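For reference, a minimal sketch of that change as a strategic-merge patch, assuming the default Results release names the StatefulSet tekton-results-postgres and installs it in the tekton-pipelines namespace (adjust both to your install):

```yaml
# patch-postgres-fsgroup.yaml
# Apply with (names are assumptions, adjust as needed):
#   kubectl -n tekton-pipelines patch statefulset tekton-results-postgres \
#     --patch-file patch-postgres-fsgroup.yaml
spec:
  template:
    spec:
      securityContext:
        # The kubelet makes mounted volumes group-owned by GID 1001, matching the
        # non-root user the Bitnami Postgres image runs as, so the EBS-provisioned
        # data directory becomes writable.
        fsGroup: 1001
```

Compared with a chown init container, this avoids running anything as root, since the kubelet applies the group ownership when it mounts the volume.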
Ok let me do it.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.
The same happens for the logs PV:
{"level":"error","ts":1711353330.4722874,"caller":"zap/options.go:212","msg":"finished streaming call with code Unknown","grpc.auth_disabled":false,"grpc.start_time":"2024-03-25T07:55:29Z","system":"grpc","span.kind":"server","grpc.service":"tekton.results.v1alpha2.Logs","grpc.method":"UpdateLog","peer.address":"10.248.106.207:48550","grpc.user":"system:serviceaccount:tekton-pipelines:tekton-results-watcher","grpc.issuer":"https://kubernetes.default.svc.cluster.local","error":"failed to create directory /logs/yournamespace/4c90c662-6e12-3c8a-b6ef-4e8f3eb8b23f, mkdir /logs/yournamespace: permission denied","grpc.code":"Unknown","grpc.time_duration_in_ms":951,"stacktrace":"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.DefaultMessageProducer\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/options.go:212\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:61\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/grpc@v1.60.1/server.go:1016"}
> I am not going to be around until next Wednesday, so I'm happy for you to do the PR. To add: once you mentioned that Postgres runs as user 1001, I realised I could just set spec.template.spec.securityContext.fsGroup: 1001 on the Postgres StatefulSet, which also resolves my issue. That seems like a better solution and probably doesn't need any documentation changes either. Thoughts?
Can we get this implemented, please?
Expected Behavior
Postgres created as part of the release manifests would start up successfully.
Actual Behavior
The Postgres pod does not become healthy. The pod fails and enters CrashLoopBackOff with the following error in the logs:
This appears similar to this issue: https://github.com/bitnami/charts/issues/1210
Investigating the recommendation there, I added an init container that looks as follows, which resolves the issue:
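The exact init container isn't reproduced in this thread, but a typical volume-permissions init container for the Bitnami Postgres image (the pattern discussed in the linked bitnami/charts issue) looks roughly like the sketch below; the image and volume name are assumptions:

```yaml
initContainers:
  - name: volume-permissions
    # Any small image with chown will do; busybox is used here for illustration.
    image: busybox:1.36
    command:
      - sh
      - -ec
      - chown -R 1001:1001 /bitnami/postgresql
    securityContext:
      # Runs as root once, only to fix ownership of the freshly provisioned volume.
      runAsUser: 0
    volumeMounts:
      - name: postgredb                # must match the StatefulSet's data volume/claim name
        mountPath: /bitnami/postgresql
```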
I took the above from the output of running the below (with slight modifications):
Note that I am using EKS with EBS volumes and the storage class below, in case that is useful:
Steps to Reproduce the Problem
Additional Info
Kubernetes version:
Output of kubectl version: