Closed harrythecode closed 5 months ago
Seems like you're confusing two different scenarios here.
When we're talking about Persistent Volumes, they come empty and they are writable. If you deploy the example you referred to as is, a new volume is provisioned through a PVC, the volume gets mounted to /var/lib/grafana, and it all works just fine.
The snippet (with fsGroup) you shared is contradictory, because it looks like you're not using persistent volumes. When you deploy a basic example (quoted below), any changes in Grafana are not persistent (they're gone once the respective pod is gone). To make sure Grafana has enough permissions to store its data, an emptyDir volume is automatically mounted to /var/lib/grafana. Again, everything works just fine.
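For context, the non-persistent default behaves as if the pod spec contained something like the following (a sketch of the effective pod spec, not the operator's literal output):

```
# Sketch: without a PVC, Grafana's data dir is backed by an emptyDir,
# which is created empty on pod start and deleted with the pod.
containers:
  - name: grafana
    image: grafana/grafana
    volumeMounts:
      - name: grafana-data
        mountPath: /var/lib/grafana
volumes:
  - name: grafana-data
    emptyDir: {}
```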
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  labels:
    dashboards: "grafana"
spec:
  config:
    log:
      mode: "console"
    auth:
      disable_login_form: "false"
    security:
      admin_user: root
      admin_password: secret
uid 472 is used in the default grafana image:
But grafana-operator automatically adds runAsNonRoot: true to the pod securityContext, which changes the default uid to something else; the exact id is likely to be different in your case:
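For reference, a pod-level securityContext carrying that flag looks like this (a minimal sketch; the exact set of fields the operator injects may differ):

```
# Sketch of a pod-level securityContext with runAsNonRoot.
# runAsNonRoot makes the kubelet refuse to start any container whose
# effective uid is 0; it does not by itself pin which non-root uid
# the process ends up running as.
securityContext:
  runAsNonRoot: true
```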
Assuming you're using persistent volumes (unlike in the snippet you shared), I guess you deployed Grafana with a modified securityContext (runAsNonRoot: false or something else; directly or through mutating webhooks), Grafana created some files, then you redeployed it with another set of settings, and now Grafana fails to write data to the pre-existing files, because they had been created with different permissions.
When you specify a custom fsGroup, Kubernetes changes ownership and permissions of the volume's files upon pod start (docs).
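As a sketch, the relevant pod-level fields look like this (fsGroupChangePolicy is optional and has been GA since Kubernetes v1.23):

```
# Pod-level securityContext: Kubernetes recursively chowns/chmods the
# volume to this group when the pod starts.
securityContext:
  fsGroup: 472
  # OnRootMismatch skips the recursive ownership change when the
  # volume's root already matches, which speeds up pod start on
  # large volumes. Default is "Always".
  fsGroupChangePolicy: "OnRootMismatch"
```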
I don't think we need to update the example. If you redeploy grafana with a brand new volume using an up-to-date version of the operator, it should all just work.
I'm also seeing what @harrythecode is reporting. This time I made sure that I had no PV/PVCs prior to deploying the Grafana object (and yes, I see a PV getting created, so I'm certain the persistentVolumeClaim option in the manifest is taking effect), so a volume with stale files and wrong permissions is unlikely to be the cause.
Currently running v5.6.3 of the operator.
@caguiclajmg @harrythecode It'd be helpful if you could share more information about your environment:
- Grafana manifest;
- Deployment manifest;
- Pod manifest;
- PersistentVolumeClaim manifest.
Also, if it's something that can be reproduced in a local (kind, microk8s, ...) or cloud provider environment, then full instructions would be helpful.
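For anyone gathering these, the four resources can be dumped with kubectl. The resource names below are assumptions based on the operator's defaults (grafana, grafana-deployment, grafana-pvc) and the monitoring namespace used later in this thread; adjust them to your cluster:

```shell
# Dump the four resources requested above (names are assumptions).
kubectl get grafana grafana -n monitoring -o yaml
kubectl get deployment grafana-deployment -n monitoring -o yaml
kubectl get pod -n monitoring -l app=grafana -o yaml
kubectl get pvc grafana-pvc -n monitoring -o yaml
```

Remember to redact secrets, ARNs, and domains before posting the output.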
Hi,
I ran into the same issue just today, when I tried to use a persistent volume claim for Grafana in a Kubernetes cluster (Amazon EKS). With the options from the example YAML I experienced the same "missing file permissions" error. (I also started from scratch, no existing PVCs and so on.) I played around a bit and also found this GitHub issue, which helped me in debugging.
When I add:
spec.deployment.spec.template.spec.securityContext.fsGroup: 10001
It seems to work. Maybe this helps in digging into the root cause of this issue. :)
I removed some individual stuff from the YAML I use, but with the following yaml it seems to run at the moment:
kind: Grafana
metadata:
  name: grafana
  namespace: monitoring
  labels:
    dashboards: "grafana"
spec:
  persistentVolumeClaim:
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  config:
    log:
      mode: "console"
    [...]
  deployment:
    spec:
      template:
        spec:
          serviceAccountName: secrets-csi-sa-monitoring
          securityContext:
            fsGroup: 10001
          containers:
            - name: grafana
              securityContext:
                allowPrivilegeEscalation: true
                readOnlyRootFilesystem: false
              readinessProbe:
                failureThreshold: 3
                [...]
              image: grafana/grafana:10.4.0
              [...]
          volumes:
            - name: grafana-data
              persistentVolumeClaim:
                claimName: grafana-pvc
            [...]
@bavarian-ng Thanks for reporting this. Unfortunately, it's not enough to share only the Grafana CR here, as the end pod spec can be influenced by various webhooks. Please take a look at my comment above, which describes which resources can give us a better understanding of what you experience in your cluster.
@weisdd, sorry, here's the requested info:
Kubernetes / OpenShift?: Kubernetes
Version: v1.27
Details on underlying storage that you're using: AWS GP2
# kubectl describe storageclass gp2
Name: gp2
IsDefaultClass: Yes
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/aws-ebs
Parameters: fsType=ext4,type=gp2
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
- Full Grafana manifest;
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: {{ .Values.metadata.app_name }}
  namespace: {{ .Values.metadata.namespace }}
  labels:
    dashboards: "grafana"
spec:
  persistentVolumeClaim:
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  config:
    log:
      mode: "console"
    auth:
      disable_login_form: "false"
    auth.google:
      enabled: "true"
      scopes: https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
      auth_url: https://accounts.google.com/o/oauth2/auth
      token_url: https://oauth2.googleapis.com/token
      allowed_domains: <REDACTED OUR DOMAIN>
      allow_sign_up: "true"
    server:
      root_url: https://{{ .Values.ingress.servicename }}.{{ .Values.metadata.stage }}.{{ .Values.ingress.dns_zone }}
      serve_from_sub_path: "true"
    users:
      auto_assign_org_role: "Editor"
  deployment:
    spec:
      template:
        spec:
          serviceAccountName: secrets-csi-sa-monitoring
          securityContext:
            fsGroup: 10001
          containers:
            - name: grafana
              securityContext:
                allowPrivilegeEscalation: true
                readOnlyRootFilesystem: false
              readinessProbe:
                failureThreshold: 3
              env:
                - name: GF_AUTH_GOOGLE_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: grafana-google-sso
                      key: client_id
                - name: GF_AUTH_GOOGLE_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: grafana-google-sso
                      key: client_secret
                - name: GF_INSTALL_PLUGINS
                  value: grafana-oncall-app
                - name: GF_SECURITY_ADMIN_USER
                  valueFrom:
                    secretKeyRef:
                      name: grafana-admin-creds
                      key: adminuser
                - name: GF_SECURITY_ADMIN_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: grafana-admin-creds
                      key: adminpassword
              image: grafana/grafana:10.4.0
              volumeMounts:
                - name: secret-store-grafana
                  mountPath: "/mnt/secrets"
                - name: secret-store-grafana-sso
                  mountPath: "/mnt/sso-secrets/"
                - name: plugin-config
                  mountPath: "/etc/grafana/provisioning/plugins/"
              resources:
                limits:
                  memory: {{ .Values.deployment.containers.grafana.resources.limits.memory }}
                requests:
                  cpu: {{ .Values.deployment.containers.grafana.resources.requests.cpu }}
                  memory: {{ .Values.deployment.containers.grafana.resources.requests.memory }}
          volumes:
            - name: grafana-data
              persistentVolumeClaim:
                claimName: grafana-pvc
            - name: plugin-config
              configMap:
                name: grafana-oncall-plugin-config
            - name: plugin-folder
              hostPath:
                path: /etc/grafana/provisioning/plugins
                type: DirectoryOrCreate
            - name: secret-store-grafana
              csi:
                driver: secrets-store.csi.k8s.io
                readOnly: true
                volumeAttributes:
                  secretProviderClass: "spc-grafana"
            - name: secret-store-grafana-sso
              csi:
                driver: secrets-store.csi.k8s.io
                readOnly: true
                volumeAttributes:
                  secretProviderClass: "spc-grafana-sso"
  service:
    spec:
      type: NodePort
      ports:
        - protocol: TCP
          port: 3000
  ingress:
    metadata:
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: instance
        alb.ingress.kubernetes.io/load-balancer-name: central-loadbalancer-{{ .Values.metadata.stage }}
        alb.ingress.kubernetes.io/backend-protocol: HTTP
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}, {"HTTPS":443}]'
        alb.ingress.kubernetes.io/ssl-redirect: '443'
        alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
        alb.ingress.kubernetes.io/certificate-arn: {{ .Values.ingress.certificate_arn }}
        alb.ingress.kubernetes.io/subnets: 'steelmntn-{{ .Values.metadata.stage }}-vpc-public-eu-central-1a,steelmntn-{{ .Values.metadata.stage }}-vpc-public-eu-central-1b,steelmntn-{{ .Values.metadata.stage }}-vpc-public-eu-central-1c'
        alb.ingress.kubernetes.io/group.name: steelmountain-ingress-group
        external-dns.alpha.kubernetes.io/hostname: {{ .Values.ingress.servicename }}.{{ .Values.metadata.stage }}.{{ .Values.ingress.dns_zone }}
        external-dns.alpha.kubernetes.io/ttl: "60"
        external-dns.alpha.kubernetes.io/ingress-hostname-source: annotation-only
    spec:
      ingressClassName: alb
      rules:
        - host: {{ .Values.ingress.servicename }}.{{ .Values.metadata.stage }}.{{ .Values.ingress.dns_zone }}
          http:
            paths:
              - backend:
                  service:
                    name: {{ .Values.service.name }}
                    port:
                      number: 3000
                path: /
                pathType: Prefix
      tls:
        - hosts:
            - {{ .Values.ingress.servicename }}.{{ .Values.metadata.stage }}.{{ .Values.ingress.dns_zone }}
- Full Deployment manifest; (resulting deployment created by the operator:)
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
  creationTimestamp: '2024-03-11T14:41:22Z'
  generation: 1
  name: grafana-deployment
  namespace: monitoring
  ownerReferences:
- Full Pod manifest; (resulting manifest deployed by the operator)
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: '2024-03-11T14:41:22Z'
  generateName: grafana-deployment-7bcdb6d464-
  labels:
    app: grafana
    pod-template-hash: 7bcdb6d464
  name: grafana-deployment-7bcdb6d464-pdgdt
  namespace: monitoring
  ownerReferences:
- Full PersistentVolumeClaim manifest.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
    volume.kubernetes.io/selected-node: <NODE NAME REDACTED>
    volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
  creationTimestamp: '2024-03-11T14:41:22Z'
  finalizers:
    - kubernetes.io/pvc-protection
  name: grafana-pvc
  namespace: monitoring
  ownerReferences:
    - apiVersion: grafana.integreatly.org/v1beta1
      kind: Grafana
      name: grafana
      uid: 67785905-eb4c-4922-a68a-fe4821aebfb0
  resourceVersion: '90847469'
  uid: 0a718f39-5eb2-4a23-9d29-b630be8539f6
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp2
  volumeMode: Filesystem
  volumeName: pvc-0a718f39-5eb2-4a23-9d29-b630be8539f6
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound
This issue hasn't been updated for a while; marking it as stale. Please respond within the next 7 days to remove this label.
any update on this one?
Was there any answer provided after the relevant information was posted? I'm also facing the same issue on a brand-new Grafana deployment on GKE.
What's the problem?
While trying the example for using a Persistent Volume with grafana-operator, which is documented here: https://grafana.github.io/grafana-operator/docs/examples/persistent_volume/readme/
I encountered an error:
This happens because the new config in the Deployment, meant to override the original config, was actually causing the permission issues that were fixed in this issue: https://github.com/grafana/grafana-operator/issues/300
However, I managed to resolve it by setting securityContext - fsGroup: 472 again, as shown below. This issue is related to https://github.com/grafana/grafana-operator/issues/1418#issuecomment-1953609940
What I want
Could we update the documentation example to include the securityContext - fsGroup?
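A sketch of what the updated example could look like, assuming the field paths of the v1beta1 Grafana CRD shown earlier in this thread (472 is the uid/gid used by the default grafana image):

```
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
spec:
  persistentVolumeClaim:
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  deployment:
    spec:
      template:
        spec:
          securityContext:
            # 472 matches the uid/gid shipped in the official grafana
            # image, so the fsGroup-driven chown on pod start leaves the
            # PV writable by the Grafana process.
            fsGroup: 472
```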