As a quick fix: have you tried running chmod g-rw /data/acme.json in the pod?
Hello @Jawastew no, I didn't try it yet. I switched back to traefik/stable https://github.com/helm/charts/tree/master/stable/traefik
I tried https://github.com/containous/traefik-helm-chart/issues/164#issuecomment-620588820. Problem is that when the volume is mounted by a new pod, it's mounted with 660 permissions again...
❯ kc exec -it traefik-69fc795fd-vg2qj /bin/sh
/ $ ls -alh /data/acme.json
-rw-rw---- 1 65532 65532 0 Apr 29 07:21 /data/acme.json
/ $ echo "test" > /data/acme.json
/ $ cat /data/acme.json
test
/ $ chmod g-rw /data/acme.json
/ $ ls -alh /data/acme.json
-rw------- 1 65532 65532 5 Apr 29 08:38 /data/acme.json
/ $ exit
❯ kc delete pod traefik-69fc795fd-vg2qj
pod "traefik-69fc795fd-vg2qj" deleted
❯ kc exec -it traefik-69fc795fd-7mgjf /bin/sh
/ $ ls -alh /data/acme.json
-rw-rw---- 1 65532 65532 5 Apr 29 08:38 /data/acme.json
/ $ cat /data/acme.json
test
The same issue on my side with acme.json permissions reset to 660 on each mount of a GKE persistent volume.
Ran into the same problem in a GKE cluster. Turns out it has to do with the pod security context. Specifically those lines in the traefik pod yaml:
securityContext:
fsGroup: 65532
When I removed those lines from the manifest, I was able to chmod the acme.json back to 600 and it stayed that way throughout multiple restarts.
I added the following to my values.yaml to remove the security context:
# Fix for acme.json file being changed to 660 from 600
podSecurityContext:
fsGroup: null
I have seen the same kind of behaviour sometimes (can't remember how to reproduce it) on deployments that were on bare metal. In those cases, I was just deleting the acme.json and restarting the Traefik pod, and the permissions would persist as 600.
I am still missing something, as this behavior always happens on GKE (with the native GKE storage class) but only sometimes on bare metal (with an NFS storage class).
Hopefully this can unblock some people until we figure out what the real problem is.
Getting the same behavior on EKS, and it seems like this is how fsGroup is designed to work in k8s. I think that fsGroup should be removed from the default values unless there's a specific use-case for it.
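(For reference: fsGroup is part of the pod-level securityContext; on volume types that support ownership management, kubelet re-owns the volume to that GID and adds group rw bits on attach, which matches the 600-to-660 reset seen here. A minimal sketch of where it lives in a pod spec, mirroring this chart's default of 65532:)
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo
spec:
  securityContext:
    fsGroup: 65532          # kubelet re-owns supported volumes to gid 65532 on mount
  containers:
    - name: app
      image: busybox:1.31.1
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data  # files here end up group-owned by 65532 and group-writable
  volumes:
    - name: data
      emptyDir: {}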
Is it possible to change the error "permissions 660 for /data/acme.json are too open, please use 600" to a warning?
In Azure Kubernetes Service (AKS) it is a common problem that disks are mounted with unexpected permissions. If you use file storage as a volume claim, which is much cheaper, you cannot even change the permissions of the mounted volume.
I just found out that deleting fsGroup from the template will mount the directory as root, which doesn't let traefik's process write to the mounted directory.
It seems like the only way to be certain here is to use something like the following as a workaround:
initContainers:
- name: take-data-dir-ownership
image: alpine:3.6
command:
- chown
- -R
- 65532:65532 # chown takes owner:group (not a file mode); traefik runs as uid/gid 65532
- /data
volumeMounts:
- name: data
mountPath: /data
But maybe there's a way to control permissions with which k8s mounts volumes?
Somehow this does the trick entirely, and the file is created with 0600 and later mounted with 0600:
persistence:
enabled: true
accessMode: ReadWriteOnce
size: 128Mi
# storageClass: ""
path: /data
annotations: {
"pv.beta.kubernetes.io/gid": "65532"
}
podSecurityContext: null
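For anyone trying this: pv.beta.kubernetes.io/gid is documented as a PersistentVolume annotation that adds the given GID to the pod's supplemental groups, so whether it takes effect when set through the chart's persistence.annotations (which land on the PVC) may depend on your provisioner copying it over. A quick check of the resulting mode after a restart, reusing the exec style from this thread:
kubectl exec deploy/traefik -- ls -alh /data/acme.json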
I have the same issue: resolvers are skipped because the saved certificate file's permissions are 660 instead of 600. Neither the annotation trick nor chmod-ing the file works for me. My temporary workaround is to first disable persistence (everything works), then update the helm chart with persistence enabled. Though it will break on the next pod delete/restart.
Helm chart traefik-8.1.4, on a bare metal k3s kubernetes with longhorn as storage provider.
Re: the pv.beta.kubernetes.io/gid annotation with podSecurityContext: null suggestion above:
Doesn't seem to do the trick here; still getting the same error message.
I'm also seeing the same behavior running traefik v2.2.1 on EKS (v1.16). Neither the PVC gid annotation nor setting podSecurityContext to null worked for me.
On my cluster running on AKS I had to log in to the traefik pod and delete the acme.json (and restart the pod) to make it work.
I have the same problem and doing what @jeff1985 mentioned above works for me, the downside is that every time that the pod goes down for some reason, I will need to delete the file and restart the pod.
What I've actually been doing is:
persistence:
enabled: true
accessMode: ReadWriteOnce
size: 128Mi
# storageClass: ""
path: /data
annotations: {
"pv.beta.kubernetes.io/gid": "65532"
}
then chmod 0600 acme.json in the pod, add podSecurityContext: null to the values, and apply the new chart. Idk why, but doing it like that kept proper permissions on acme.json after pod recreation (i.e. kubectl delete pod {current traefik pod}).
@sashasimkin that unfortunately didn't work for me 😢
@davilag this works for me!
persistence:
enabled: true
size: 1Gi
annotations: {
"pv.beta.kubernetes.io/gid": "65532"
}
podSecurityContext:
fsGroup: 65532
Try it and let me know if you have any questions.
None of the solutions work for me except for initContainers.
Hello, I have the same issue; the following works for me:
If Traefik is running execute:
kubectl exec deploy/traefik -n default -- chmod 600 /data/acme.json
Edit these lines:
persistence:
  enabled: true
  size: 1G
  path: /data
  annotations: {
    "pv.beta.kubernetes.io/gid": "65532"
  }
podSecurityContext: null
then upgrade the helm chart with the values:
helm upgrade traefik traefik/traefik --values traefik-values.yml
Additional information: Provider: Scaleway Elements Kapsule. Storage: Scaleway Elements Block Storage SSD, storageClass: default. Kubernetes version: 1.18.5. Traefik version: 2.2.1.
I think checking the permissions of the acme.json file should be a feature in traefik that you can disable. This creates so many problems.
I think a quickfix should be given for this issue until a fix is found. Very annoying problem.
Version 8.9.2 of this Traefik Helm chart works for me, v8.10.0 and 8.11.0 both have this issue with the "permissions 660" error message. Couldn't fix this behavior using init containers or different security context group. I'm using a bare metal k8s.
Still running into this issue. Now using chart version 9.1.0
It turns out, if you look in the values.yaml file they commented specifically about this issue.
Does anybody have a working configuration with init containers that fixes this for good? I ran into this on my AKS cluster, and after playing around with the solutions in this issue I eventually came up with a working state, but I'm afraid to update the release because in the past that resulted in a broken state again.
I just added the initContainer setup that is mentioned in the Helm chart values.yaml, as indicated by @fearoffish and it looks to be working fine.
I confirm that after adding the volume-permissions init container to my values I have not suffered from permission issues again.
Here's my added configuration:
deployment:
initContainers:
# The "volume-permissions" init container is required if you run into permission issues.
# Related issue: https://github.com/containous/traefik/issues/6972
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "chmod -Rv 600 /data/*"]
volumeMounts:
- name: data
mountPath: /data
[EDIT (much later)] I recently had to deploy Traefik in a bare-metal cluster and ran into the exact same permission issue and can once again CONFIRM that adding this Init Container configuration does work. The trick is to change everything inside the folder, not the folder itself.
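To spell out why touching only the folder contents matters (a generic Unix point, not chart-specific): a directory needs its execute bit for a process to enter it, so a recursive 600 on /data itself would lock the non-root traefik user out:
chmod -Rv 600 /data/*   # files inside become rw-------, /data keeps its mode
chmod -Rv 600 /data     # also strips x from /data itself; uid 65532 can then no longer reach /data/acme.json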
Edit: Ok, I forgot to make the volume persistent. I did that; now I will check with a new domain whether there are no more errors. Edit 2: seems to work. My bad!
Anyone running into the problem with the initContainers? If I use the values in the comments, the init container can't complete because it can't find the /data directory in the other container. The volume-permissions container's logs say: chmod: /data/*: No such file or directory. Any advice?
Full copy of my values.yml:
# Default values for Traefik
image:
name: traefik
tag: 2.3.1
pullPolicy: IfNotPresent
#
# Configure the deployment
#
deployment:
enabled: true
# Number of pods of the deployment
replicas: 1
# Additional deployment annotations (e.g. for jaeger-operator sidecar injection)
annotations: {}
# Additional pod annotations (e.g. for mesh injection or prometheus scraping)
podAnnotations: {}
# Additional containers (e.g. for metric offloading sidecars)
additionalContainers:
[]
# https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host
# - name: socat-proxy
# image: alpine/socat:1.0.5
# args: ["-s", "-u", "udp-recv:8125", "unix-sendto:/socket/socket"]
# volumeMounts:
# - name: dsdsocket
# mountPath: /socket
# Additional volumes available for use with initContainers and additionalContainers
additionalVolumes:
[]
# - name: dsdsocket
# hostPath:
# path: /var/run/statsd-exporter
# Additional initContainers (e.g. for setting file permission as shown below)
initContainers:
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "chmod -Rv 600 /data/*"]
volumeMounts:
- name: data
mountPath: /data
# The "volume-permissions" init container is required if you run into permission issues.
# Related issue: https://github.com/traefik/traefik/issues/6972
# Custom pod DNS policy. Apply if `hostNetwork: true`
# dnsPolicy: ClusterFirstWithHostNet
# Pod disruption budget
podDisruptionBudget:
enabled: false
# maxUnavailable: 1
# minAvailable: 0
# Use ingressClass. Ignored if Traefik version < 2.3 / kubernetes < 1.18.x
ingressClass:
# true is not unit-testable yet, pending https://github.com/rancher/helm-unittest/pull/12
enabled: false
isDefaultClass: false
# Activate Pilot integration
pilot:
enabled: false
token: ""
# Enable experimental features
experimental:
plugins:
enabled: false
# Create an IngressRoute for the dashboard
ingressRoute:
dashboard:
enabled: true
# Additional ingressRoute annotations (e.g. for kubernetes.io/ingress.class)
annotations: {}
# Additional ingressRoute labels (e.g. for filtering IngressRoute by custom labels)
labels: {}
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
#
# Configure providers
#
providers:
kubernetesCRD:
enabled: true
kubernetesIngress:
enabled: true
# IP used for Kubernetes Ingress endpoints
publishedService:
enabled: false
# Published Kubernetes Service to copy status from. Format: namespace/servicename
# By default this Traefik service
# pathOverride: ""
#
# Add volumes to the traefik pod. The volume name will be passed to tpl.
# This can be used to mount a cert pair or a configmap that holds a config.toml file.
# After the volume has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--providers.file.filename=/config/dynamic.toml"
volumes: []
# - name: public-cert
# mountPath: "/certs"
# type: secret
# - name: '{{ printf "%s-configs" .Release.Name }}'
# mountPath: "/config"
# type: configMap
# Logs
# https://docs.traefik.io/observability/logs/
logs:
# Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
general:
# By default, the logs use a text format (common), but you can
# also ask for the json format in the format option
# format: json
# By default, the level is set to ERROR. Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
level: ERROR
access:
# To enable access logs
enabled: false
# By default, logs are written using the Common Log Format (CLF).
# To write logs in JSON, use json in the format option.
# If the given format is unsupported, the default (CLF) is used instead.
# format: json
# To write the logs in an asynchronous fashion, specify a bufferingSize option.
# This option represents the number of log lines Traefik will keep in memory before writing
# them to the selected output. In some cases, this option can greatly help performances.
# bufferingSize: 100
# Filtering https://docs.traefik.io/observability/access-logs/#filtering
filters:
{}
# statuscodes: "200,300-302"
# retryattempts: true
# minduration: 10ms
# Fields
# https://docs.traefik.io/observability/access-logs/#limiting-the-fieldsincluding-headers
fields:
general:
defaultmode: keep
names:
{}
# Examples:
# ClientUsername: drop
headers:
defaultmode: drop
names:
{}
# Examples:
# User-Agent: redact
# Authorization: drop
# Content-Type: keep
globalArguments:
- "--global.checknewversion"
- "--global.sendanonymoususage"
#
# Configure Traefik static configuration
# Additional arguments to be passed at Traefik's binary
# All available options available on https://docs.traefik.io/reference/static-configuration/cli/
## Use curly braces to pass values: `helm install --set="additionalArguments={--providers.kubernetesingress.ingressclass=traefik-internal,--log.level=DEBUG}"`
additionalArguments:
- "--certificatesresolvers.letsencrypt.acme.email=letsencrypt@ikbendirk.nl"
- "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
#- "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-v02.api.letsencrypt.org/directory"
- "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
- "--certificatesResolvers.letsencrypt.acme.tlschallenge=true"
- "--api.insecure=true"
- "--accesslog=true"
- "--log.level=INFO"
# - "--providers.kubernetesingress.ingressclass=traefik-internal"
# - "--log.level=DEBUG"
# Environment variables to be passed to Traefik's binary
env: []
# - name: SOME_VAR
# value: some-var-value
# - name: SOME_VAR_FROM_CONFIG_MAP
# valueFrom:
# configMapRef:
# name: configmap-name
# key: config-key
# - name: SOME_SECRET
# valueFrom:
# secretKeyRef:
# name: secret-name
# key: secret-key
envFrom: []
# - configMapRef:
# name: config-map-name
# - secretRef:
# name: secret-name
# Configure ports
ports:
# The name of this one can't be changed as it is used for the readiness and
# liveness probes, but you can adjust its config to your liking
traefik:
port: 9000
# Use hostPort if set.
# hostPort: 9000
#
# Use hostIP if set. If not set, Kubernetes will default to 0.0.0.0, which
# means it's listening on all your interfaces and all your IPs. You may want
# to set this value if you need traefik to listen on specific interface
# only.
# hostIP: 192.168.100.10
# Defines whether the port is exposed if service.type is LoadBalancer or
# NodePort.
#
# You SHOULD NOT expose the traefik port on production deployments.
# If you want to access it from outside of your cluster,
# use `kubectl port-forward` or create a secure ingress
expose: false
# The exposed port for this service
exposedPort: 9000
# The port protocol (TCP/UDP)
protocol: TCP
web:
port: 8000
# hostPort: 8000
expose: true
exposedPort: 80
# The port protocol (TCP/UDP)
protocol: TCP
# Use nodeport if set. This is useful if you have configured Traefik in a
# LoadBalancer
# nodePort: 32080
# Port Redirections
# Added in 2.2, you can make permanent redirects via entrypoints.
# https://docs.traefik.io/routing/entrypoints/#redirection
# redirectTo: websecure
websecure:
port: 8443
# hostPort: 8443
expose: true
exposedPort: 443
# The port protocol (TCP/UDP)
protocol: TCP
# nodePort: 32443
# Set TLS at the entrypoint
# https://doc.traefik.io/traefik/routing/entrypoints/#tls
tls:
enabled: false
# this is the name of a TLSOption definition
options: ""
certResolver: ""
domains: []
# - main: example.com
# sans:
# - foo.example.com
# - bar.example.com
# TLS Options are created as TLSOption CRDs
# https://doc.traefik.io/traefik/https/tls/#tls-options
# Example:
# tlsOptions:
# default:
# sniStrict: true
# preferServerCipherSuites: true
# foobar:
# curvePreferences:
# - CurveP521
# - CurveP384
tlsOptions: {}
# Options for the main traefik service, where the entrypoints traffic comes
# from.
service:
enabled: true
type: LoadBalancer
# Additional annotations (e.g. for cloud provider specific config)
annotations: {}
# Additional service labels (e.g. for filtering Service by custom labels)
labels: {}
# Additional entries here will be added to the service spec. Cannot contain
# type, selector or ports entries.
spec:
{}
# externalTrafficPolicy: Cluster
# loadBalancerIP: "1.2.3.4"
# clusterIP: "2.3.4.5"
loadBalancerSourceRanges:
[]
# - 192.168.0.1/32
# - 172.16.0.0/16
externalIPs:
[]
# - 1.2.3.4
## Create HorizontalPodAutoscaler object.
##
autoscaling:
enabled: false
# minReplicas: 1
# maxReplicas: 10
# metrics:
# - type: Resource
# resource:
# name: cpu
# targetAverageUtilization: 60
# - type: Resource
# resource:
# name: memory
# targetAverageUtilization: 60
# Enable persistence using Persistent Volume Claims
# ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
# After the pvc has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--certificatesresolvers.le.acme.storage=/data/acme.json"
# It will persist TLS certificates.
persistence:
enabled: false
# existingClaim: ""
accessMode: ReadWriteOnce
size: 128Mi
# storageClass: ""
path: /data
annotations: {}
# subPath: "" # only mount a subpath of the Volume into the pod
# If hostNetwork is true, runs traefik in the host network namespace
# To prevent unschedulable pods due to port collisions, if hostNetwork=true
# and replicas>1, a pod anti-affinity is recommended and will be set if the
# affinity is left as default.
hostNetwork: false
# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
enabled: true
# If set to false, installs ClusterRole and ClusterRoleBinding so Traefik can be used across namespaces.
# If set to true, installs namespace-specific Role and RoleBinding and requires provider configuration be set to that same namespace
namespaced: false
# Enable to create a PodSecurityPolicy and assign it to the Service Account via RoleBinding or ClusterRoleBinding
podSecurityPolicy:
enabled: false
# The service account the pods will use to interact with the Kubernetes API
serviceAccount:
# If set, an existing service account is used
# If not set, a service account is created automatically using the fullname template
name: ""
# Additional serviceAccount annotations (e.g. for oidc authentication)
serviceAccountAnnotations: {}
resources:
{}
# requests:
# cpu: "100m"
# memory: "50Mi"
# limits:
# cpu: "300m"
# memory: "150Mi"
affinity: {}
# # This example pod anti-affinity forces the scheduler to put traefik pods
# # on nodes where no other traefik pods are scheduled.
# # It should be used when hostNetwork: true to prevent port conflicts
# podAntiAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# - labelSelector:
# matchExpressions:
# - key: app
# operator: In
# values:
# - {{ template "traefik.name" . }}
# topologyKey: failure-domain.beta.kubernetes.io/zone
nodeSelector: {}
tolerations: []
# Pods can have priority.
# Priority indicates the importance of a Pod relative to other Pods.
priorityClassName: ""
# Set the container security context
# To run the container with ports below 1024 this will need to be adjust to run as root
securityContext:
capabilities:
drop: [ALL]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
podSecurityContext:
fsGroup: 65532
@DirkWolthuis I used find /data/ -exec chmod 600 {} \;
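A small refinement on that find, restricting it to regular files so it never chmods /data itself or lost+found (both of which cause the failures described elsewhere in this thread); a sketch of the same idea:
find /data -type f -exec chmod 600 {} +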
Using preemptible nodes in GKE, the init container approach is still not working properly. Deploying for the first time it works and keeps /data/* with the correct permissions, but if the container migrates to another node (because the original node was forced to preempt) the permissions are lost.
I'm facing this issue with longhorn storage. @SantoDE, can you mark this issue as a bug?
Very hacky, but what worked for me was: log into the pod (kubectl exec -it <podname> /bin/sh), update the permissions on the acme.json file (chmod g-rw /certs/acme.json), and then restart the docker container (not the pod):
kubectl exec POD_NAME -c traefik /sbin/reboot
This is on GKE:
❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.13-gke.2001", GitCommit:"00c919adfe4adf308bcd7c02838f2a1b60482f02", GitTreeState:"clean", BuildDate:"2020-11-06T18:24:02Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
Not sure if this helps, but most of the tips above didn't work for me (including the initContainer). I ended up writing a StorageClass which seems to work well (using AKS).
i.e., first, create a StorageClass azure-acme-file.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: azure-acme-file
# Use Azure file, not block storage: in an error scenario, this will be accessed by multiple virtual machines at the same time, thus requiring multi-attach capabilities (ReadWriteMany)
provisioner: kubernetes.io/azure-file
# Allow our user 65532 to read, write and -important- execute (i.e. access / list) the directory,
# and read and write files in that directory. Traefik will require an 0600 for the file.
mountOptions:
- dir_mode=0700
- file_mode=0600
- uid=65532
- gid=65532
parameters:
skuName: Standard_LRS # no fancy requirements on this volume
storageAccount: yourAzureStorageAccount # needs to match an existing Azure storage account
kubectl apply -f azure-acme-file.yaml
Then, in my traefik.yaml:
persistence:
enabled: true
accessMode: ReadWriteMany
size: 50Mi
storageClass: "azure-acme-file"
path: /data
additionalArguments:
- "--certificatesresolvers.cloudflare.acme.storage=/data/acme.json"
Re: the azure-file StorageClass with dir_mode/file_mode mountOptions above:
@cmenge, just tried going this way, but the result is the same: on pod re-creation, permissions were reset to 0660.
Re: having to delete the file and restart the pod every time the pod goes down:
Why enable persistence then?
Can confirm that this issue is also present when you define a storageclass for AWS EBS volumes and attach an EBS volume to store the acme.json on this mount.
When starting the container it gives the error:
Error: container has runAsNonRoot and image will run as root
When enabling the pod to run as root, we get either the error:
"the router traefik-traefik-dashboard-bla@kubernetescrd uses a non-existent resolver: letsencrypt"
or:
The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: permissions 660 for /certs/acme.json are too open, please use 600
When we use an init container to chmod 600 the /certs/acme.json, we get permission errors.
Nearly one year later I tried again with Traefik 2.4 this time.
time="2021-03-11T12:56:58Z" level=info msg="Configuration loaded from flags."
2021/03/11 12:56:58 traefik.go:76: command traefik error: error while building entryPoint web: error preparing server: error opening listener: listen tcp :80: bind: permission denied
time="2021-03-11T12:56:58Z" level=error msg="The ACME resolver \"le\" is skipped from the resolvers list because: unable to get ACME account: open /data/acme.json: permission denied"
So there is also a permission denied error for port 80 binding.
There is a ticket mentioned in the new values.yaml file but doing this also doesn't help:
initContainers:
# The "volume-permissions" init container is required if you run into permission issues.
# Related issue: https://github.com/traefik/traefik/issues/6972
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "chmod -Rv 600 /data/*"]
volumeMounts:
- name: data
mountPath: /data
When changing the security settings to run as root, the init container will fail too:
chmod: /data/*: No such file or directory
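That particular message is shell globbing, not permissions: in (busybox) sh an unmatched glob is passed through literally, so on an empty volume chmod receives the literal string /data/*. A tolerant variant, along the lines of the touch-based commands further down this thread:
sh -c 'touch /data/acme.json && chmod -v 600 /data/acme.json'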
# Set the container security context
# To run the container with ports below 1024 this will need to be adjust to run as root
securityContext:
capabilities:
drop: [ALL]
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: false # <<< this one changed to false
runAsUser: 65532
podSecurityContext:
fsGroup: 65532
After that change, it looks a bit different on startup:
kubectl -n traefik logs -f traefik-xxx-m5gpj volume-permissions ( doesn't exist)
mode of '/data/lost+found' changed to 0600 (rw-------)
So the new hack with initContainer works with file permissions. BUT still the same permission denied errors on /data/acme.json and port 80 binding.
I adjusted the init container so the result looks like this now:
-rw------- 1 65532 65532 0 Mar 11 13:22 acme.json
drw------- 2 root root 16.0K Mar 11 13:07 lost+found
My adjustment (I create the acme.json file if it doesn't exist and set proper permissions):
initContainers:
# The "volume-permissions" init container is required if you run into permission issues.
# Related issue: https://github.com/traefik/traefik/issues/6972
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "touch /data/acme.json ; chown 65532:65532 /data/acme.json ; chmod -Rv 600 /data/*"]
volumeMounts:
- name: data
mountPath: /data
Finally the container starts without problem:
kubectl -n traefik logs -f traefik-xxx-kpcjs ( doesn't exist)
time="2021-03-11T13:22:14Z" level=info msg="Configuration loaded from flags."
^C
BUT: as you can see, the acme.json is empty. It doesn't get filled. Is it a problem of my cli arguments/wrong configuration?
additionalArguments:
- "--log.level=ERROR"
- "--ping=true"
- "--api=true"
#- "--entrypoints.web.address=:80"
- "--entrypoints.web.http.redirections.entryPoint.to=websecure"
- "--entrypoints.web.http.redirections.entryPoint.scheme=https"
- "--entrypoints.web.http.redirections.entrypoint.permanent=true"
#- "--entrypoints.websecure.address=:443"
- "--entryPoints.web.forwardedHeaders.insecure"
- "--entryPoints.websecure.forwardedHeaders.insecure"
- "--certificatesresolvers.le.acme.httpchallenge=true"
- "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
#- "--certificatesresolvers.lale.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
- "--certificatesresolvers.le.acme.email=sag@ich.net"
- "--certificatesresolvers.le.acme.storage=/data/acme.json"
# Force browser to load HTTPS version of website with STS headers:
- "traefik.frontend.headers.forceSTSHeader=true"
- "traefik.frontend.headers.STSSeconds=315360000"
- "traefik.frontend.headers.STSIncludeSubdomains=true"
- "traefik.frontend.headers.STSPreload=true"
Hello Everyone,
Regarding the invalid permission, the solution is to use the initContainers support that was added to the official Traefik Helm Chart. That feature has to be enabled manually, because we believe that permission reconciliation on the existing filesystem should be enabled on purpose by the operator, who is aware of what is happening on the filesystem.
The issue is related to the underlying storage provider and might be related to the umask settings.
It is fixed by enabling the initContainers section in values.yaml: https://github.com/traefik/traefik-helm-chart/blob/4fd1ea77f0a3444dafadc9247b01d7b732ca3828/traefik/values.yaml#L40
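For illustration of the umask angle (a generic shell demo, not Traefik's actual code path): a newly created file gets mode 666 masked by the process umask, so a umask of 007 yields exactly the 660 from this issue, while 077 yields 600:
umask 007; touch /tmp/demo-660; ls -l /tmp/demo-660   # -rw-rw----
umask 077; touch /tmp/demo-600; ls -l /tmp/demo-600   # -rw-------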
I have been struggling with this issue for a few days now, and something about the init-container in the manifest values.yaml does not work with my setup.
I'm using k3s with the bundled traefik disabled at install, and am trying to install traefik2 into a mostly fresh cluster. I have modified some of the values.yaml locally and am passing those values in the helm install command line.
helm install traefik traefik/traefik --namespace=kube-system --values=traefik-values.yaml
It breaks down here:
time="2021-04-20T22:35:21Z" level=info msg="Configuration loaded from flags."
time="2021-04-20T22:35:21Z" level=error msg="The ACME resolver \"cloudflare\" is skipped from the resolvers list because: unable to get ACME account: open /data/acme.json: permission denied"
Not sure where to look for this init-container failing in this fashion.
@TheNakedZealot I can't figure out your problem, but I have the same infra (k3s, traefik, Cloudflare) and I can suggest you look at my setup here
I have a similar issue using Azure Container Instance and file share mounts. There isn't a way to adjust mount options in ACI, which means that as far as I understand it, there is no available workaround for the ACI+Azure File Share scenario.
I agree with previous commenters who requested this be downgraded to a warning, or configurable, or something. I am completely blocked from using LE because of this issue.
None of the above-mentioned solutions worked for me until I decided to uninstall traefik, update the helm charts, reinstall it, and use the initContainer solution with a slight modification that changes the ownership of acme.json to the traefik user.
initContainers:
- name: volume-permissions
image: busybox:1.31.1
command: ["sh", "-c", "touch /data/acme.json && chmod -Rv 600 /data/* && chown 65532:65532 /data/acme.json"]
volumeMounts:
- name: data
mountPath: /data
Hope this helps someone, as I've been banging my head on the desk for quite a while to get this working.
@stacklikemind thank you, you saved my day
PS: The official workaround didn't work for me: https://github.com/traefik/traefik-helm-chart/blob/ff25058/traefik/values.yaml#L46-L54
but the updated command from @stacklikemind works
Hello,
Thanks @stacklikemind: I updated the official workaround.
I'm thinking about closing this issue, since on the helm chart side there is nothing more we can do: we currently have no way to set the umask on a Kubernetes pod.
Adding deployment - initContainers in my values.yaml gives me a strange error:
Defaulted container "traefik" out of: traefik, volume-permissions (init)
Has anyone come across how to get around with this issue? Thanks!
Same here...
Don't remember exactly how I fixed this problem, but I did, and the problem was with one of the following (e.g. temporarily opening permissions with chmod -R a+rwx /path). Note: although 777 is not the solution, at least you get a different error message, and then there's progress...
Adding deployment - initContainers in my values.yaml gives me a strange error:
Defaulted container "traefik" out of: traefik, volume-permissions (init)
Has anyone come across how to get around with this issue? Thanks!
That's not an error; by adding an initContainer, you are adding a second container to the deployment/pod. When interacting with the pod you either have to specify which container you want to work with, or you let kubectl pick one for you, and this message is telling you that it just did that for you.
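For completeness, picking a container explicitly looks like this (pod name illustrative):
kubectl logs traefik-xxx-kpcjs -c volume-permissions      # logs of the init container
kubectl exec -it traefik-xxx-kpcjs -c traefik -- /bin/sh  # shell in the main container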
If the volume filesystem is ext4, the recursive chmod doesn't work: the init container will fail because of lost+found:
mode of '/data/acme.json' changed to 0600 (rw-------)
chmod: /data/lost+found: Operation not permitted
chmod: /data/lost+found: Operation not permitted
Also to make the init container work unprivileged (depends on cluster security rules) you need to set the securityContext. I eventually got it to work like this:
deployment:
initContainers:
- name: volume-permissions
image: busybox:latest
command: ["sh", "-c", "chmod -v 600 /data/acme.json"]
securityContext:
runAsGroup: 65532
runAsUser: 65532
volumeMounts:
- name: data
mountPath: /data
However, before setting up the initContainer, I confirmed that after I manually did chmod 0600 to fix the problem and then ran kubectl -n traefik rollout restart deployment/traefik to restart the pod, somehow acme.json was getting changed back to 660. I also tested deleting the volume and PVC, uninstalling the Helm chart and starting completely fresh, and acme.json was originally 600 but later changed to 660.
There is nothing as far as I can see in the helm chart or entrypoint script that would change the permissions of the acme.json file (unless you set an initContainer) so it must be the traefik binary changing it to 660 under certain conditions.
@jakubhajek even if it were a matter of umask settings, that comes from /etc/profile in the traefik image and should be fixed in the traefik image (why not put it in /entrypoint.sh ?)
But Traefik creates the acme.json file in the first place so it should simply create it with the right permissions. I don't see how it can be related to the underlying storage volume or umask. Maybe there is some golang quirk that is changing the file mode when it gets opened or something?
@rptaylor Does this work for you by unsetting fsGroup on PodSecurityContext (--set podSecurityContext.fsGroup=null)? I mean, does it remove, for your setup, the need to add an initContainer?
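i.e. something like the following, with the release and repo names used earlier in the thread:
helm upgrade traefik traefik/traefik --set podSecurityContext.fsGroup=null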
Hello, I've tried this chart now. So far so good, but when I look at the logs I see the error from the subject of this ticket.
How do I fix it? I've installed traefik into the kube-system namespace and have two nodes; that's why I assign it with a nodeSelector to the master.
My values.yaml looks like this: