traefik / traefik-helm-chart

Traefik Proxy Helm Chart
https://traefik.io
Apache License 2.0

The ACME resolver \"le\" is skipped from the resolvers list because: unable to get ACME account: permissions 660 for /data/acme.json are too open, please use 600 #164

Closed. renepardon closed this issue 2 years ago.

renepardon commented 4 years ago

Hello, I've tried this chart now. So far so good, but when I look at the logs I see the error from the subject of this ticket:

The ACME resolver \"le\" is skipped from the resolvers list because: unable to get ACME account: permissions 660 for /data/acme.json are too open, please use 600
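For context, Traefik refuses to load the ACME storage file whenever any group or other permission bits are set, so anything that widens the mode past 0600 triggers this error. A quick shell illustration of the two modes involved (a sketch, not Traefik's actual check):

```shell
# Any mode other than 600 on acme.json makes Traefik skip the resolver.
f=$(mktemp)

chmod 660 "$f"
stat -c '%a' "$f"    # 660: group bits set, resolver is skipped

chmod 600 "$f"
stat -c '%a' "$f"    # 600: accepted

rm -f "$f"
```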

How can I fix it? I've installed Traefik into the kube-system namespace and have two nodes; that's why I pin it to the "master" node with a nodeSelector.

My values.yaml looks like this:

```yaml
# Default values for Traefik
image:
  name: traefik
  tag: 2.2.0

#
# Configure the deployment
#
deployment:
  enabled: true
  # Number of pods of the deployment
  replicas: 1
  # Additional deployment annotations (e.g. for jaeger-operator sidecar injection)
  annotations: {}
  # Additional pod annotations (e.g. for mesh injection or prometheus scraping)
  podAnnotations: {}

# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true
    # Additional ingressRoute annotations (e.g. for kubernetes.io/ingress.class)
    annotations: {}
    # Additional ingressRoute labels (e.g. for filtering IngressRoute by custom labels)
    labels: {}

rollingUpdate:
  maxUnavailable: 1
  maxSurge: 1

#
# Add volumes to the traefik pod.
# This can be used to mount a cert pair or a configmap that holds a config.toml file.
# After the volume has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--providers.file.filename=/config/dynamic.toml"
volumes: []
# - name: public-cert
#   mountPath: "/certs"
#   type: secret
# - name: configs
#   mountPath: "/config"
#   type: configMap

globalArguments:
  - "--global.checknewversion"
  - "--global.sendanonymoususage"

#
# Configure Traefik static configuration
# Additional arguments to be passed at Traefik's binary
# All available options available on https://docs.traefik.io/reference/static-configuration/cli/
## Use curly braces to pass values: `helm install --set="additionalArguments={--providers.kubernetesingress,--global.checknewversion=true}"`
additionalArguments:
  - "--certificatesresolvers.le.acme.storage=/data/acme.json"
#  - "--providers.kubernetesingress"

# Environment variables to be passed to Traefik's binary
env: []
# - name: SOME_VAR
#   value: some-var-value
# - name: SOME_VAR_FROM_CONFIG_MAP
#   valueFrom:
#     configMapRef:
#       name: configmap-name
#       key: config-key
# - name: SOME_SECRET
#   valueFrom:
#     secretKeyRef:
#       name: secret-name
#       key: secret-key

envFrom: []
# - configMapRef:
#     name: config-map-name
# - secretRef:
#     name: secret-name

# Configure ports
ports:
  # The name of this one can't be changed as it is used for the readiness and
  # liveness probes, but you can adjust its config to your liking
  traefik:
    port: 9000
    # Use hostPort if set.
    # hostPort: 9000
    # Defines whether the port is exposed if service.type is LoadBalancer or
    # NodePort.
    #
    # You SHOULD NOT expose the traefik port on production deployments.
    # If you want to access it from outside of your cluster,
    # use `kubectl proxy` or create a secure ingress
    expose: false
    # The exposed port for this service
    exposedPort: 9000
  web:
    port: 8000
    # hostPort: 8000
    expose: true
    exposedPort: 80
    # Use nodeport if set. This is useful if you have configured Traefik in a
    # LoadBalancer
    # nodePort: 32080
  websecure:
    port: 8443
    # hostPort: 8443
    expose: true
    exposedPort: 443
    # nodePort: 32443

# Options for the main traefik service, where the entrypoints traffic comes
# from.
service:
  enabled: true
  type: LoadBalancer
  # Additional annotations (e.g. for cloud provider specific config)
  annotations: {}
  # Additional entries here will be added to the service spec. Cannot contain
  # type, selector or ports entries.
  spec: {}
  # externalTrafficPolicy: Cluster
  # loadBalancerIP: "1.2.3.4"
  # clusterIP: "2.3.4.5"
  loadBalancerSourceRanges: []
  # - 192.168.0.1/32
  # - 172.16.0.0/16
  externalIPs: []
  # - 1.2.3.4

## Create HorizontalPodAutoscaler object.
##
autoscaling:
  enabled: false
# minReplicas: 1
# maxReplicas: 10
# metrics:
# - type: Resource
#   resource:
#     name: cpu
#     targetAverageUtilization: 60
# - type: Resource
#   resource:
#     name: memory
#     targetAverageUtilization: 60

# Enable persistence using Persistent Volume Claims
# ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
# After the pvc has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--certificatesresolvers.le.acme.storage=/data/acme.json"
# It will persist TLS certificates.
persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 128Mi
  storageClass: "csi-cinder-classic"
  path: /data
  annotations: {}
  # subPath: "" # only mount a subpath of the Volume into the pod

# If hostNetwork is true, runs traefik in the host network namespace
# To prevent unschedulable pods due to port collisions, if hostNetwork=true
# and replicas>1, a pod anti-affinity is recommended and will be set if the
# affinity is left as default.
hostNetwork: false

# Additional serviceAccount annotations (e.g. for oidc authentication)
serviceAccountAnnotations: {}

resources: {}
# requests:
#   cpu: "100m"
#   memory: "50Mi"
# limits:
#   cpu: "300m"
#   memory: "150Mi"

affinity: {}
# # This example pod anti-affinity forces the scheduler to put traefik pods
# # on nodes where no other traefik pods are scheduled.
# # It should be used when hostNetwork: true to prevent port conflicts
#   podAntiAffinity:
#     requiredDuringSchedulingIgnoredDuringExecution:
#     - labelSelector:
#         matchExpressions:
#         - key: app
#           operator: In
#           values:
#           - {{ template "traefik.name" . }}
#       topologyKey: failure-domain.beta.kubernetes.io/zone

nodeSelector:
  kubernetes.io/hostname: "somecustomname-master"
tolerations: []

# Pods can have priority.
# Priority indicates the importance of a Pod relative to other Pods.
priorityClassName: ""

# Set the container security context
# To run the container with ports below 1024 this will need to be adjusted to run as root
securityContext:
  capabilities:
    drop: [ALL]
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532

podSecurityContext:
  fsGroup: 65532
```

Jawastew commented 4 years ago

As a quick fix: have you tried running chmod g-rw /data/acme.json in the pod?

renepardon commented 4 years ago

Hello @Jawastew, no, I haven't tried that yet. I switched back to traefik/stable: https://github.com/helm/charts/tree/master/stable/traefik

SimonTheLeg commented 4 years ago

https://github.com/containous/traefik-helm-chart/issues/164#issuecomment-620588820

I tried that. The problem is that when the volume is mounted by a new pod, it is mounted with 660 permissions again...

❯ kc exec -it traefik-69fc795fd-vg2qj /bin/sh
/ $ ls -alh /data/acme.json
-rw-rw----    1 65532    65532          0 Apr 29 07:21 /data/acme.json
/ $ echo "test" > /data/acme.json
/ $ cat /data/acme.json
test
/ $ chmod g-rw /data/acme.json
/ $ ls -alh /data/acme.json
-rw-------    1 65532    65532          5 Apr 29 08:38 /data/acme.json
/ $ exit

❯ kc delete pod traefik-69fc795fd-vg2qj
pod "traefik-69fc795fd-vg2qj" deleted

❯ kc exec -it traefik-69fc795fd-7mgjf /bin/sh
/ $ ls -alh /data/acme.json
-rw-rw----    1 65532    65532          5 Apr 29 08:38 /data/acme.json
/ $ cat /data/acme.json
test

aroq commented 4 years ago

The same issue on my side with acme.json permissions reset to 660 on each mount of a GKE persistent volume.

steviecash commented 4 years ago

Ran into the same problem in a GKE cluster. It turns out it has to do with the pod security context, specifically these lines in the Traefik pod YAML:

securityContext:
  fsGroup: 65532

When I removed those lines from the manifest, I was able to chmod the acme.json back to 600 and it stayed that way throughout multiple restarts.

I added the following to my values.yaml to remove the security context:

# Fix for acme.json file being changed to 660 from 600
podSecurityContext:
  fsGroup: null

I have seen the same kind of behaviour sometimes (I can't remember how to reproduce it) on bare-metal deployments. In those cases, I would just delete the acme.json, restart the Traefik pod, and the permissions would persist as 600.

I am still missing something, as this behavior always happens on GKE (with the native GKE storage class) but only sometimes on bare metal (with an NFS storage class).

Hopefully this can unblock some people until we figure out what the real problem is.

sashasimkin commented 4 years ago

Getting the same behavior on EKS, and it seems like this is how fsGroup is designed to work in k8s.

I think that fsGroup should be removed from the default values unless there's a specific use-case for that.
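That matches the documented fsGroup behavior: on each mount, kubelet recursively sets the volume's group to the fsGroup value and ORs in group read/write bits, so a file saved as 0600 reappears as 0660. The permission half of that pass can be reproduced with plain shell (a sketch of the effect, not kubelet's code; the chgrp part is omitted since it needs privileges):

```shell
vol=$(mktemp -d)              # stand-in for the mounted /data volume
touch "$vol/acme.json"
chmod 600 "$vol/acme.json"    # what Traefik wants

# what the fsGroup ownership pass effectively does on each mount:
find "$vol" -exec chmod g+rw {} +

stat -c '%a' "$vol/acme.json" # 660 again, despite the earlier chmod 600
rm -rf "$vol"
```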

v-braun commented 4 years ago

Is it possible to turn the "permissions 660 for /data/acme.json are too open, please use 600" error into a warning?

In Azure Kubernetes Service (AKS) it is a common problem that disks are mounted with unexpected permissions. If you use file storage as a volume claim, which is much cheaper, you cannot even change the permissions of the mounted volume.

sashasimkin commented 4 years ago

I just found out that deleting fsGroup from the template mounts the directory as root, which doesn't let Traefik's process write to the mounted directory.

It seems like the only way to be certain here is to use something like this as a workaround:

initContainers:
- name: take-data-dir-ownership
  image: alpine:3.6
  command:
  - chown
  - -R
  - 65532:65532  # chown takes owner:group, so use Traefik's UID/GID 65532 here, not the file mode 600
  - /data
  volumeMounts:
  - name: data
    mountPath: /data

But maybe there's a way to control permissions with which k8s mounts volumes?
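There is such a knob in later Kubernetes releases: `fsGroupChangePolicy` (alpha in v1.18, beta in v1.20) tells kubelet to skip the recursive ownership and permission pass when the volume root already matches fsGroup. A sketch of the pod-level setting, assuming the chart exposed the field (at the time of this thread it did not):

```yaml
securityContext:
  fsGroup: 65532
  # Skip the recursive chown/chmod when the volume root already
  # belongs to fsGroup, instead of rewriting every file on each mount.
  fsGroupChangePolicy: "OnRootMismatch"
```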

sashasimkin commented 4 years ago

Somehow this does the trick entirely: the file is created with 0600 and later mounted with 0600:

persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 128Mi
  # storageClass: ""
  path: /data
  annotations: {
    "pv.beta.kubernetes.io/gid": "65532"
  }

podSecurityContext: null

fbonalair commented 4 years ago

I have the same issue: resolvers are skipped because the saved certificate's permissions are 660 instead of 600. Neither the annotation trick nor chmod'ing the file works for me. My temporary workaround is to first disable persistence, at which point everything works, and then update the Helm chart with persistence enabled. Though it will break on the next pod delete/restart.

Helm chart traefik-8.1.4, on a bare metal k3s kubernetes with longhorn as storage provider.

aodj commented 4 years ago

Somehow this does the trick entirely: the file is created with 0600 and later mounted with 0600:

persistence:
  enabled: true
  accessMode: ReadWriteOnce
  size: 128Mi
  # storageClass: ""
  path: /data
  annotations: {
    "pv.beta.kubernetes.io/gid": "65532"
  }

podSecurityContext: null

Doesn't seem to do the trick here; still getting the same error message.

mojochao commented 4 years ago

I'm also seeing the same behavior running traefik v2.2.1 on EKS (v1.16). Neither annotation of PVC with fsGroup nor setting podSecurityContext to null worked for me.

jeff1985 commented 4 years ago

On my cluster running on AKS, I had to log in to the Traefik pod and delete the acme.json (and restart the pod) to make it work.

davilag commented 4 years ago

I have the same problem, and doing what @jeff1985 mentioned above works for me; the downside is that every time the pod goes down for some reason, I need to delete the file and restart the pod.

sashasimkin commented 4 years ago

What I've actually been doing is:

  1. Apply the chart with the following:

     persistence:
       enabled: true
       accessMode: ReadWriteOnce
       size: 128Mi
       # storageClass: ""
       path: /data
       annotations: {
         "pv.beta.kubernetes.io/gid": "65532"
       }

  2. See the PV mounted and acme.json created.
  3. chmod 0600 acme.json.
  4. Add podSecurityContext: null to the values and apply the new chart.

I don't know why, but doing it like that kept the proper permissions on acme.json after pod recreation (i.e. kubectl delete pod {current traefik pod}).

davilag commented 4 years ago

@sashasimkin that unfortunately didn't work for me 😢

pablocaceresz commented 4 years ago

@davilag this works for me!

persistence:
  enabled: true
  size: 1Gi
  annotations: {
    "pv.beta.kubernetes.io/gid": "65532"
  }
podSecurityContext:
  fsGroup: 65532

Give it a try and let me know if you have any questions.

Mhs-220 commented 4 years ago

None of the solutions work for me except for initContainers.

ch9hn commented 4 years ago

Hello, I have the same issue; the following works for me.

If Traefik is running, execute:

kubectl exec deploy/traefik -n default -- chmod 600 /data/acme.json

Edit these lines:

persistence:
  enabled: true
  size: 1G
  path: /data
  annotations: {
    "pv.beta.kubernetes.io/gid": "65532"
  }

podSecurityContext: null

Then upgrade the Helm chart with the values:

helm upgrade traefik traefik/traefik --values traefik-values.yml

Additional information:
Provider: Scaleway Elements Kapsule
Storage: Scaleway Elements Block Storage SSD, storageClass: default
Kubernetes version: 1.18.5
Traefik version: 2.2.1

woodcockjosh commented 4 years ago

I think checking the permissions of the acme.json file should be a feature in traefik that you can disable. This creates so many problems.

yra-wtag commented 4 years ago

I think a quick fix should be provided for this issue until a proper fix is found. Very annoying problem.

talex-de commented 4 years ago

Version 8.9.2 of this Traefik Helm chart works for me; v8.10.0 and v8.11.0 both have this issue with the "permissions 660" error message. I couldn't fix this behavior using init containers or a different security context group. I'm using a bare-metal k8s.

aodj commented 4 years ago

Still running into this issue. Now using chart version 9.1.0

fearoffish commented 4 years ago

It turns out that if you look in the values.yaml file, they commented specifically about this issue:

https://github.com/containous/traefik-helm-chart/blob/401c8cdf690cbbc765a935c7279566a13b79a082/traefik/values.yaml#L22

riker09 commented 4 years ago

Does anybody have a working configuration with init containers that fixes this for good? I ran into this on my AKS cluster, and after playing around with the solutions in this issue I eventually reached a working state, but I'm afraid to update the release because in the past that resulted in a broken state again.

aodj commented 4 years ago

I just added the initContainer setup that is mentioned in the Helm chart's values.yaml, as indicated by @fearoffish, and it looks to be working fine.

riker09 commented 4 years ago

I can confirm that after adding the volume-permissions init container to my values I have not suffered from permission issues again.

Here's my added configuration:


deployment:
  initContainers:
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/containous/traefik/issues/6972
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "chmod -Rv 600 /data/*"]
      volumeMounts:
        - name: data
          mountPath: /data

[EDIT (much later)] I recently had to deploy Traefik in a bare-metal cluster, ran into the exact same permission issue, and can once again CONFIRM that adding this init container configuration works. The trick is to chmod everything inside the folder, not the folder itself.
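The reason the folder itself must keep its own mode is that directories need the execute bit for traversal: chmod 600 on /data would lock the traefik user (65532) out of its own volume, while 600 on the files inside is exactly what the ACME check wants. In shell terms (a generic illustration on a scratch directory, nothing chart-specific):

```shell
d=$(mktemp -d)                 # stand-in for /data
touch "$d/acme.json"

chmod 700 "$d"                 # directory: owner needs rwx to enter and list it
chmod 600 "$d/acme.json"       # file: rw only, which satisfies Traefik

stat -c '%a' "$d" "$d/acme.json"   # 700, then 600
rm -rf "$d"
```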

DirkWolthuis commented 4 years ago

Edit 2: seems to work. My bad!
Edit: OK, I forgot to make the volume persistent. I did that; now I will check with a new domain whether there are any more errors.

Is anyone else running into problems with the initContainers? If I use the values from the comments above, the init container can't complete because it can't find the /data directory in the other container. The volume-permissions container's logs say: chmod: /data/*: No such file or directory. Any advice?

Full copy of my values.yml:

# Default values for Traefik
image:
  name: traefik
  tag: 2.3.1
  pullPolicy: IfNotPresent

#
# Configure the deployment
#
deployment:
  enabled: true
  # Number of pods of the deployment
  replicas: 1
  # Additional deployment annotations (e.g. for jaeger-operator sidecar injection)
  annotations: {}
  # Additional pod annotations (e.g. for mesh injection or prometheus scraping)
  podAnnotations: {}
  # Additional containers (e.g. for metric offloading sidecars)
  additionalContainers:
    []
    # https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host
    # - name: socat-proxy
    # image: alpine/socat:1.0.5
    # args: ["-s", "-u", "udp-recv:8125", "unix-sendto:/socket/socket"]
    # volumeMounts:
    #   - name: dsdsocket
    #     mountPath: /socket
  # Additional volumes available for use with initContainers and additionalContainers
  additionalVolumes:
    []
    # - name: dsdsocket
    #   hostPath:
    #   path: /var/run/statsd-exporter
  # Additional initContainers (e.g. for setting file permission as shown below)
  initContainers:
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "chmod -Rv 600 /data/*"]
      volumeMounts:
        - name: data
          mountPath: /data
  # The "volume-permissions" init container is required if you run into permission issues.
  # Related issue: https://github.com/traefik/traefik/issues/6972
  # Custom pod DNS policy. Apply if `hostNetwork: true`
  # dnsPolicy: ClusterFirstWithHostNet

# Pod disruption budget
podDisruptionBudget:
  enabled: false
  # maxUnavailable: 1
  # minAvailable: 0

# Use ingressClass. Ignored if Traefik version < 2.3 / kubernetes < 1.18.x
ingressClass:
  # true is not unit-testable yet, pending https://github.com/rancher/helm-unittest/pull/12
  enabled: false
  isDefaultClass: false

# Activate Pilot integration
pilot:
  enabled: false
  token: ""

# Enable experimental features
experimental:
  plugins:
    enabled: false

# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true
    # Additional ingressRoute annotations (e.g. for kubernetes.io/ingress.class)
    annotations: {}
    # Additional ingressRoute labels (e.g. for filtering IngressRoute by custom labels)
    labels: {}

rollingUpdate:
  maxUnavailable: 1
  maxSurge: 1

#
# Configure providers
#
providers:
  kubernetesCRD:
    enabled: true
  kubernetesIngress:
    enabled: true
    # IP used for Kubernetes Ingress endpoints
    publishedService:
      enabled: false
      # Published Kubernetes Service to copy status from. Format: namespace/servicename
      # By default this Traefik service
      # pathOverride: ""

#
# Add volumes to the traefik pod. The volume name will be passed to tpl.
# This can be used to mount a cert pair or a configmap that holds a config.toml file.
# After the volume has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--providers.file.filename=/config/dynamic.toml"
volumes: []
# - name: public-cert
#   mountPath: "/certs"
#   type: secret
# - name: '{{ printf "%s-configs" .Release.Name }}'
#   mountPath: "/config"
#   type: configMap

# Logs
# https://docs.traefik.io/observability/logs/
logs:
  # Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
  general:
    # By default, the logs use a text format (common), but you can
    # also ask for the json format in the format option
    # format: json
    # By default, the level is set to ERROR. Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
    level: ERROR
  access:
    # To enable access logs
    enabled: false
    # By default, logs are written using the Common Log Format (CLF).
    # To write logs in JSON, use json in the format option.
    # If the given format is unsupported, the default (CLF) is used instead.
    # format: json
    # To write the logs in an asynchronous fashion, specify a bufferingSize option.
    # This option represents the number of log lines Traefik will keep in memory before writing
    # them to the selected output. In some cases, this option can greatly help performances.
    # bufferingSize: 100
    # Filtering https://docs.traefik.io/observability/access-logs/#filtering
    filters:
      {}
      # statuscodes: "200,300-302"
      # retryattempts: true
      # minduration: 10ms
    # Fields
    # https://docs.traefik.io/observability/access-logs/#limiting-the-fieldsincluding-headers
    fields:
      general:
        defaultmode: keep
        names:
          {}
          # Examples:
          # ClientUsername: drop
      headers:
        defaultmode: drop
        names:
          {}
          # Examples:
          # User-Agent: redact
          # Authorization: drop
          # Content-Type: keep

globalArguments:
  - "--global.checknewversion"
  - "--global.sendanonymoususage"

#
# Configure Traefik static configuration
# Additional arguments to be passed at Traefik's binary
# All available options available on https://docs.traefik.io/reference/static-configuration/cli/
## Use curly braces to pass values: `helm install --set="additionalArguments={--providers.kubernetesingress.ingressclass=traefik-internal,--log.level=DEBUG}"`
additionalArguments:
  - "--certificatesresolvers.letsencrypt.acme.email=letsencrypt@ikbendirk.nl"
  - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
  #- "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-v02.api.letsencrypt.org/directory"
  - "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
  - "--certificatesResolvers.letsencrypt.acme.tlschallenge=true"
  - "--api.insecure=true"
  - "--accesslog=true"
  - "--log.level=INFO"
#  - "--providers.kubernetesingress.ingressclass=traefik-internal"
#  - "--log.level=DEBUG"

# Environment variables to be passed to Traefik's binary
env: []
# - name: SOME_VAR
#   value: some-var-value
# - name: SOME_VAR_FROM_CONFIG_MAP
#   valueFrom:
#     configMapRef:
#       name: configmap-name
#       key: config-key
# - name: SOME_SECRET
#   valueFrom:
#     secretKeyRef:
#       name: secret-name
#       key: secret-key

envFrom: []
# - configMapRef:
#     name: config-map-name
# - secretRef:
#     name: secret-name

# Configure ports
ports:
  # The name of this one can't be changed as it is used for the readiness and
  # liveness probes, but you can adjust its config to your liking
  traefik:
    port: 9000
    # Use hostPort if set.
    # hostPort: 9000
    #
    # Use hostIP if set. If not set, Kubernetes will default to 0.0.0.0, which
    # means it's listening on all your interfaces and all your IPs. You may want
    # to set this value if you need traefik to listen on specific interface
    # only.
    # hostIP: 192.168.100.10

    # Defines whether the port is exposed if service.type is LoadBalancer or
    # NodePort.
    #
    # You SHOULD NOT expose the traefik port on production deployments.
    # If you want to access it from outside of your cluster,
    # use `kubectl port-forward` or create a secure ingress
    expose: false
    # The exposed port for this service
    exposedPort: 9000
    # The port protocol (TCP/UDP)
    protocol: TCP
  web:
    port: 8000
    # hostPort: 8000
    expose: true
    exposedPort: 80
    # The port protocol (TCP/UDP)
    protocol: TCP
    # Use nodeport if set. This is useful if you have configured Traefik in a
    # LoadBalancer
    # nodePort: 32080
    # Port Redirections
    # Added in 2.2, you can make permanent redirects via entrypoints.
    # https://docs.traefik.io/routing/entrypoints/#redirection
    # redirectTo: websecure
  websecure:
    port: 8443
    # hostPort: 8443
    expose: true
    exposedPort: 443
    # The port protocol (TCP/UDP)
    protocol: TCP
    # nodePort: 32443
    # Set TLS at the entrypoint
    # https://doc.traefik.io/traefik/routing/entrypoints/#tls
    tls:
      enabled: false
      # this is the name of a TLSOption definition
      options: ""
      certResolver: ""
      domains: []
      # - main: example.com
      #   sans:
      #     - foo.example.com
      #     - bar.example.com

# TLS Options are created as TLSOption CRDs
# https://doc.traefik.io/traefik/https/tls/#tls-options
# Example:
# tlsOptions:
#   default:
#     sniStrict: true
#     preferServerCipherSuites: true
#   foobar:
#     curvePreferences:
#       - CurveP521
#       - CurveP384
tlsOptions: {}

# Options for the main traefik service, where the entrypoints traffic comes
# from.
service:
  enabled: true
  type: LoadBalancer
  # Additional annotations (e.g. for cloud provider specific config)
  annotations: {}
  # Additional service labels (e.g. for filtering Service by custom labels)
  labels: {}
  # Additional entries here will be added to the service spec. Cannot contain
  # type, selector or ports entries.
  spec:
    {}
    # externalTrafficPolicy: Cluster
    # loadBalancerIP: "1.2.3.4"
    # clusterIP: "2.3.4.5"
  loadBalancerSourceRanges:
    []
    # - 192.168.0.1/32
    # - 172.16.0.0/16
  externalIPs:
    []
    # - 1.2.3.4

## Create HorizontalPodAutoscaler object.
##
autoscaling:
  enabled: false
#   minReplicas: 1
#   maxReplicas: 10
#   metrics:
#   - type: Resource
#     resource:
#       name: cpu
#       targetAverageUtilization: 60
#   - type: Resource
#     resource:
#       name: memory
#       targetAverageUtilization: 60

# Enable persistence using Persistent Volume Claims
# ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
# After the pvc has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--certificatesresolvers.le.acme.storage=/data/acme.json"
# It will persist TLS certificates.
persistence:
  enabled: false

  #  existingClaim: ""
  accessMode: ReadWriteOnce
  size: 128Mi
  # storageClass: ""
  path: /data
  annotations: {}
  # subPath: "" # only mount a subpath of the Volume into the pod

# If hostNetwork is true, runs traefik in the host network namespace
# To prevent unschedulable pods due to port collisions, if hostNetwork=true
# and replicas>1, a pod anti-affinity is recommended and will be set if the
# affinity is left as default.
hostNetwork: false

# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
  enabled: true

  # If set to false, installs ClusterRole and ClusterRoleBinding so Traefik can be used across namespaces.
  # If set to true, installs namespace-specific Role and RoleBinding and requires provider configuration be set to that same namespace
  namespaced: false

# Enable to create a PodSecurityPolicy and assign it to the Service Account via RoleBinding or ClusterRoleBinding
podSecurityPolicy:
  enabled: false

# The service account the pods will use to interact with the Kubernetes API
serviceAccount:
  # If set, an existing service account is used
  # If not set, a service account is created automatically using the fullname template
  name: ""

# Additional serviceAccount annotations (e.g. for oidc authentication)
serviceAccountAnnotations: {}

resources:
  {}
  # requests:
  #   cpu: "100m"
  #   memory: "50Mi"
  # limits:
  #   cpu: "300m"
  #   memory: "150Mi"
affinity: {}
# # This example pod anti-affinity forces the scheduler to put traefik pods
# # on nodes where no other traefik pods are scheduled.
# # It should be used when hostNetwork: true to prevent port conflicts
#   podAntiAffinity:
#     requiredDuringSchedulingIgnoredDuringExecution:
#     - labelSelector:
#         matchExpressions:
#         - key: app
#           operator: In
#           values:
#           - {{ template "traefik.name" . }}
#       topologyKey: failure-domain.beta.kubernetes.io/zone
nodeSelector: {}
tolerations: []

# Pods can have priority.
# Priority indicates the importance of a Pod relative to other Pods.
priorityClassName: ""

# Set the container security context
# To run the container with ports below 1024 this will need to be adjusted to run as root
securityContext:
  capabilities:
    drop: [ALL]
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532

podSecurityContext:
  fsGroup: 65532

woodcockjosh commented 4 years ago

@DirkWolthuis I used find /data/ -exec chmod 600 {} \;
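One caveat with that command: find without a type filter matches /data itself as well, and 600 on a directory drops the execute bit needed to traverse it. Restricting the chmod by file type avoids that (a hedged variant, not taken verbatim from the thread, demonstrated on a scratch directory standing in for /data):

```shell
DATA=$(mktemp -d)              # stand-in for the real /data volume
mkdir "$DATA/sub"
touch "$DATA/acme.json" "$DATA/sub/other.json"
chmod 660 "$DATA/acme.json"

# 600 for regular files only; directories keep 700 so they stay traversable
find "$DATA" -type f -exec chmod 600 {} +
find "$DATA" -type d -exec chmod 700 {} +

stat -c '%a' "$DATA" "$DATA/acme.json"   # 700, then 600
rm -rf "$DATA"
```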

yriveiro commented 4 years ago

Using preemptible nodes in GKE, the init container approach still doesn't work properly.

Deploying for the first time works and keeps /data/* with the correct permissions, but if the container migrates to another node (because the original node was forced to preempt), the permissions are lost.

lexfrei commented 4 years ago

I'm facing this issue with longhorn storage. @SantoDE, can you mark this issue as a bug?

jeffywu commented 3 years ago

Very hacky, but what worked for me: log into the pod (kubectl exec -it <podname> /bin/sh), update the permissions on the acme.json file (chmod g-rw /certs/acme.json), and then restart the container (not the pod): kubectl exec POD_NAME -c traefik /sbin/reboot

This is on GKE:

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.13-gke.2001", GitCommit:"00c919adfe4adf308bcd7c02838f2a1b60482f02", GitTreeState:"clean", BuildDate:"2020-11-06T18:24:02Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

cmenge commented 3 years ago

Not sure if this helps, but most of the tips above didn't work for me (including the initContainer). I ended up writing a StorageClass, which seems to work well (using AKS).

First, create a StorageClass, azure-acme-file.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-acme-file
# Use Azure file, not block storage: in an error scenario, this will be accessed by multiple virtual machines at the same time, thus requiring multi-attach capabilities (ReadWriteMany)
provisioner: kubernetes.io/azure-file
# Allow our user 65532 to read, write and -important- execute (i.e. access / list) the directory,
# and read and write files in that directory. Traefik will require an 0600 for the file.
mountOptions:
  - dir_mode=0700
  - file_mode=0600
  - uid=65532
  - gid=65532
parameters:
  skuName: Standard_LRS # no fancy requirements on this volume
  storageAccount: yourAzureStorageAccount  # needs to match an existing Azure storage account

kubectl apply -f azure-acme-file.yaml

Then, in my traefik.yaml:

persistence:
  enabled: true
  accessMode: ReadWriteMany
  size: 50Mi
  storageClass: "azure-acme-file"
  path: /data

additionalArguments:
  - "--certificatesresolvers.cloudflare.acme.storage=/data/acme.json"

pa-yourserveradmin-com commented 3 years ago

Not sure if this helps, but most of the tips above didn't work for me (including the initContainer). I ended up writing a StorageClass which seems to work well (using AKS).

i.e., first, create a StorageClass azure-acme-file.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-acme-file
# Use Azure file, not block storage: in an error scenario, this will be accessed by multiple virtual machines at the same time, thus requiring multi-attach capabilities (ReadWriteMany)
provisioner: kubernetes.io/azure-file
# Allow our user 65532 to read, write and -important- execute (i.e. access / list) the directory,
# and read and write files in that directory. Traefik will require an 0600 for the file.
mountOptions:
  - dir_mode=0700
  - file_mode=0600
  - uid=65532
  - gid=65532
parameters:
  skuName: Standard_LRS # no fancy requirements on this volume
  storageAccount: yourAzureStorageAccount  # needs to match an existing Azure storage account

kubectl apply -f azure-acme-file.yaml

Then, in my traefik.yaml:

persistence:
  enabled: true
  accessMode: ReadWriteMany
  size: 50Mi
  storageClass: "azure-acme-file"
  path: /data
  additionalArguments:
    - "--certificatesresolvers.cloudflare.acme.storage=/data/acme.json"

@cmenge, I just tried this approach, but the result is the same: on pod re-creation, the permissions were reset to 0660.

a-nldisr commented 3 years ago

I have the same problem. Doing what @jeff1985 mentioned above works for me; the downside is that every time the pod goes down for some reason, I need to delete the file and restart the pod.

Why enable persistence then?

I can confirm this issue is also present when you define a StorageClass for AWS EBS volumes and attach an EBS volume to store acme.json on that mount. When starting the container it gives the error: Error: container has runAsNonRoot and image will run as root. When enabling the pod to run as root, we either get the error "the router traefik-traefik-dashboard-bla@kubernetescrd uses a non-existent resolver: letsencrypt" or:

The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: permissions 660 for /certs/acme.json are too open, please use 600

When we use an init container to chmod 600 /certs/acme.json, we get permission errors.

renepardon commented 3 years ago

Nearly one year later I tried again with Traefik 2.4 this time.

time="2021-03-11T12:56:58Z" level=info msg="Configuration loaded from flags."
2021/03/11 12:56:58 traefik.go:76: command traefik error: error while building entryPoint web: error preparing server: error opening listener: listen tcp :80: bind: permission denied
time="2021-03-11T12:56:58Z" level=error msg="The ACME resolver \"le\" is skipped from the resolvers list because: unable to get ACME account: open /data/acme.json: permission denied"

So there is also a permission denied error for port 80 binding.

There is an issue linked in the new values.yaml file, but doing this also doesn't help:

  initContainers:
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/traefik/traefik/issues/6972
  - name: volume-permissions
    image: busybox:1.31.1
    command: ["sh", "-c", "chmod -Rv 600 /data/*"]
    volumeMounts:
      - name: data
        mountPath: /data

When changing the security settings to run as root, the init container fails too:

chmod: /data/*: No such file or directory
# Set the container security context
# To run the container with ports below 1024 this will need to be adjust to run as root
securityContext:
  capabilities:
    drop: [ALL]
  readOnlyRootFilesystem: true
  runAsGroup: 65532
  runAsNonRoot: false # <<< this one changed to false
  runAsUser: 65532

podSecurityContext:
  fsGroup: 65532

After that change, it looks a bit different on startup:

kubectl -n traefik logs -f traefik-xxx-m5gpj volume-permissions
mode of '/data/lost+found' changed to 0600 (rw-------)

So the new initContainer hack fixes the file permissions, BUT I still get the same permission denied errors on /data/acme.json and on the port 80 binding.

renepardon commented 3 years ago

I adjusted the init container so the result looks like this now:

-rw-------    1 65532    65532          0 Mar 11 13:22 acme.json
drw-------    2 root     root       16.0K Mar 11 13:07 lost+found

My adjustment (I create the acme.json file if it doesn't exist and set proper permissions):

  initContainers:
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/traefik/traefik/issues/6972
  - name: volume-permissions
    image: busybox:1.31.1
    command: ["sh", "-c", "touch /data/acme.json ; chown 65532:65532 /data/acme.json ; chmod -Rv 600 /data/*"]
    volumeMounts:
      - name: data
        mountPath: /data

Finally the container starts without problem:

kubectl -n traefik logs -f traefik-xxx-kpcjs
time="2021-03-11T13:22:14Z" level=info msg="Configuration loaded from flags."
^C

BUT, as you can see, acme.json is empty; it never gets filled. Is this a problem with my CLI arguments/configuration?

additionalArguments:
- "--log.level=ERROR"
- "--ping=true"
- "--api=true"
#- "--entrypoints.web.address=:80"
- "--entrypoints.web.http.redirections.entryPoint.to=websecure"
- "--entrypoints.web.http.redirections.entryPoint.scheme=https"
- "--entrypoints.web.http.redirections.entrypoint.permanent=true"
#- "--entrypoints.websecure.address=:443"
- "--entryPoints.web.forwardedHeaders.insecure"
- "--entryPoints.websecure.forwardedHeaders.insecure"
- "--certificatesresolvers.le.acme.httpchallenge=true"
- "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
#- "--certificatesresolvers.lale.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
- "--certificatesresolvers.le.acme.email=sag@ich.net"
- "--certificatesresolvers.le.acme.storage=/data/acme.json"
# Force browser to load HTTPS version of website with STS headers:
- "traefik.frontend.headers.forceSTSHeader=true"
- "traefik.frontend.headers.STSSeconds=315360000"
- "traefik.frontend.headers.STSIncludeSubdomains=true"
- "traefik.frontend.headers.STSPreload=true"
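Editor's note: the last four `traefik.frontend.headers.*` entries above are Traefik v1 label syntax, not valid v2 CLI flags, so they cannot work as additionalArguments. In Traefik v2 on Kubernetes, HSTS headers are configured through a headers Middleware instead; a hedged sketch (the resource name is illustrative):

```yaml
# Sketch: Traefik v2 equivalent of the v1 frontend STS labels above,
# attached to routers via a Middleware CRD rather than CLI arguments.
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: hsts-headers
spec:
  headers:
    forceSTSHeader: true
    stsSeconds: 315360000
    stsIncludeSubdomains: true
    stsPreload: true
```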
jakubhajek commented 3 years ago

Hello Everyone,

Regarding the invalid permissions, the solution is to use the initContainers support that was added to the official Traefik Helm Chart. That feature has to be enabled manually because we believe that permission reconciliation on an existing filesystem should be enabled deliberately, by an operator who is aware of what is happening on that filesystem. The issue is related to the underlying storage provider and might be related to the umask settings.

It is fixed by enabling the initContainers section in values.yaml: https://github.com/traefik/traefik-helm-chart/blob/4fd1ea77f0a3444dafadc9247b01d7b732ca3828/traefik/values.yaml#L40

shrinedogg commented 3 years ago

I have been struggling with this issue for a few days now, and something about the init-container in the manifest values.yaml does not work with my setup.

I'm using k3s with Traefik disabled at install, and am trying to install Traefik 2 into a mostly fresh cluster. I have modified some of the values.yaml locally and am passing those values on the helm install command line.

helm install traefik traefik/traefik --namespace=kube-system --values=traefik-values.yaml

It breaks down here:

  1. Either I comment out the init-container, and the pod comes up without permissions:
time="2021-04-20T22:35:21Z" level=info msg="Configuration loaded from flags."
time="2021-04-20T22:35:21Z" level=error msg="The ACME resolver \"cloudflare\" is skipped from the resolvers list because: unable to get ACME account: open /data/acme.json: permission denied"

Or

  2. I uncomment the init-container, and the pod gets stuck initializing due to a failed init-container.

(screenshots of the pod status and the failing init-container omitted)

Not sure where to look for this init-container failing in this fashion.

lexfrei commented 3 years ago

@TheNakedZealot I can't figure out your problem, but I have the same infra (k3s, traefik, Cloudflare) and I can suggest you look at my setup here

aggieben commented 2 years ago

I have a similar issue using Azure Container Instance and file share mounts. There isn't a way to adjust mount options in ACI, which means that as far as I understand it, there is no available workaround for the ACI+Azure File Share scenario.

I agree with previous commenters who requested this be downgraded to a warning, or configurable, or something. I am completely blocked from using LE because of this issue.

stacklikemind commented 2 years ago

Nothing from the above-mentioned solutions worked for me until I decided to uninstall Traefik, update the Helm charts, reinstall it, and use the initContainer solution with a slight modification to change the ownership of acme.json to the Traefik user.

  initContainers:
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "touch /data/acme.json && chmod -Rv 600 /data/* && chown 65532:65532 /data/acme.json"]
      volumeMounts:
        - name: data
          mountPath: /data

Hope this helps someone, as I've been banging my head on the desk for quite a while to get this working.

mgerasimchuk commented 2 years ago

@stacklikemind thank you, you saved my day

PS: The official workaround didn't work for me: https://github.com/traefik/traefik-helm-chart/blob/ff25058/traefik/values.yaml#L46-L54

but the updated command from @stacklikemind works.

mloiseleur commented 2 years ago

Hello,

Thanks @stacklikemind: I updated the official workaround.

I'm thinking about closing this issue, since on the Helm chart side there is nothing more we can do: we currently have no way to set the umask on a Kubernetes pod.

prasannjeet commented 2 years ago

Adding deployment - initContainers in my values.yaml gives me a strange error: Defaulted container "traefik" out of: traefik, volume-permissions (init)

Has anyone come across how to get around with this issue? Thanks!

AnthonyDeniau commented 1 year ago

Adding deployment - initContainers in my values.yaml gives me a strange error: Defaulted container "traefik" out of: traefik, volume-permissions (init)

Has anyone come across how to get around with this issue? Thanks!

Same here...

prasannjeet commented 1 year ago

Adding deployment - initContainers in my values.yaml gives me a strange error: Defaulted container "traefik" out of: traefik, volume-permissions (init) Has anyone come across how to get around with this issue? Thanks!

Same here...

I don't remember exactly how I fixed this problem, but it was one of the following:

  1. Ensure your PVCs are working and accessible. If it's NFS, make sure the containers can access those files and folders.
  2. Try changing the volume permissions. The accepted permission on the volume is 600, but if that doesn't work, try changing it to 777 (chmod -R a+rwx /path).

Note: 777 is not the solution, but at least you get a different error message, and that's progress...

djpbessems commented 1 year ago

Adding deployment - initContainers in my values.yaml gives me a strange error: Defaulted container "traefik" out of: traefik, volume-permissions (init)

Has anyone come across how to get around with this issue? Thanks!

That's not an error; by adding an initContainer, you are adding a second container to the deployment/pod. When interacting with the pod you either have to specify which container you want to work with, or you let kubectl pick one for you, and this message is telling you that it just did that for you.

rptaylor commented 1 year ago

If the volume filesystem is ext4, the recursive chmod doesn't work; the init container fails because of lost+found:

mode of '/data/acme.json' changed to 0600 (rw-------)
chmod: /data/lost+found: Operation not permitted
chmod: /data/lost+found: Operation not permitted

Also, to make the init container run unprivileged (depending on cluster security rules), you need to set its securityContext. I eventually got it to work like this:

deployment:
  initContainers:
    - name: volume-permissions
      image: busybox:latest
      command: ["sh", "-c", "chmod -v 600 /data/acme.json"]
      securityContext:
        runAsGroup: 65532
        runAsUser: 65532
      volumeMounts:
        - name: data
          mountPath: /data

However, before setting up the initContainer, I confirmed the following: after I manually ran chmod 0600 to fix the problem and then ran kubectl -n traefik rollout restart deployment/traefik to restart the pod, acme.json was somehow changed back to 660. I also tested deleting the volume and PVC, uninstalling the Helm chart, and starting completely fresh: acme.json was originally 600 but later changed to 660. As far as I can see, there is nothing in the Helm chart or the entrypoint script that would change the permissions of acme.json (unless you set an initContainer), so it must be the traefik binary changing it to 660 under certain conditions.

@jakubhajek even if it were a matter of umask settings, the umask comes from /etc/profile in the traefik image and should be fixed in the traefik image (why not set it in /entrypoint.sh?).

But Traefik creates the acme.json file in the first place, so it should simply create it with the right permissions. I don't see how this can be related to the underlying storage volume or umask. Maybe there is some Go quirk that changes the file mode when the file gets opened?

mloiseleur commented 1 year ago

@rptaylor Does this work for you when unsetting fsGroup on the PodSecurityContext?

--set podSecurityContext.fsGroup=null

I mean, does it remove the need to add an initContainer in your setup?
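Editor's note on that suggestion: as I understand it, when a pod sets fsGroup, the kubelet recursively re-owns the volume contents on mount and adds group read/write to files, which would turn a 0600 acme.json into 0660 on every restart; this matches the behavior rptaylor observed. The suggested change expressed in values.yaml (a sketch, using the same podSecurityContext key shown earlier in this thread):

```yaml
# Sketch: unset fsGroup so the kubelet stops adding group rw to the
# volume contents on each mount (which is what turns 0600 into 0660).
podSecurityContext:
  fsGroup: null
```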