traefik / traefik-helm-chart

Traefik Proxy Helm Chart
https://traefik.io
Apache License 2.0

Installing Traefik via helm on an RKE2 cluster does not give Traefik access to any resources #870

Closed · ic4-y closed this 1 year ago

ic4-y commented 1 year ago

Welcome!

What version of the Traefik's Helm Chart are you using?

23.1.0

What version of Traefik are you using?

v2.10.1

What did you do?

I installed Traefik via helm in my bare-metal RKE2 cluster running version 1.24.14+rke2r1, like so:

helm upgrade --install traefik traefik/traefik --namespace traefik --create-namespace --values values.yaml
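(For completeness, this assumes the traefik chart repository had already been added, e.g.:)

helm repo add traefik https://traefik.github.io/charts
helm repo update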

Since Traefik was not picking up any Ingress objects, I went on a debug mission. It turned out that the traefik serviceaccount appears to not have any of the permissions it needs. The question is why that happens.
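For anyone debugging something similar, the serviceaccount's effective permissions can be checked directly with kubectl auth can-i (a minimal sketch; the serviceaccount name matches the one appearing in the logs below):

kubectl auth can-i list ingressclasses.networking.k8s.io --as=system:serviceaccount:traefik:traefik
kubectl auth can-i list ingressroutes.traefik.containo.us --as=system:serviceaccount:traefik:traefik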

What did you see instead?

When checking the logs

k logs -n traefik -l app.kubernetes.io/name=traefik

The following comes up (and more of the same):

W0626 15:28:38.149274       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.IngressClass: ingressclasses.networking.k8s.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope
E0626 15:28:38.149345       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: ingressclasses.networking.k8s.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope
W0626 15:28:46.342356       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1alpha1.IngressRouteTCP: ingressroutetcps.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressroutetcps" in API group "traefik.containo.us" at the cluster scope
E0626 15:28:46.342453       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.IngressRouteTCP: failed to list *v1alpha1.IngressRouteTCP: ingressroutetcps.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressroutetcps" in API group "traefik.containo.us" at the cluster scope
W0626 15:28:47.850082       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1alpha1.ServersTransport: serverstransports.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransports" in API group "traefik.io" at the cluster scope
E0626 15:28:47.850166       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.ServersTransport: failed to list *v1alpha1.ServersTransport: serverstransports.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransports" in API group "traefik.io" at the cluster scope
W0626 15:28:48.158193       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1alpha1.IngressRouteUDP: ingressrouteudps.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressrouteudps" in API group "traefik.containo.us" at the cluster scope
E0626 15:28:48.158256       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.IngressRouteUDP: failed to list *v1alpha1.IngressRouteUDP: ingressrouteudps.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "ingressrouteudps" in API group "traefik.containo.us" at the cluster scope
W0626 15:28:50.467960       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1alpha1.TLSStore: tlsstores.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "tlsstores" in API group "traefik.containo.us" at the cluster scope
E0626 15:28:50.468030       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.TLSStore: failed to list *v1alpha1.TLSStore: tlsstores.traefik.containo.us is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "tlsstores" in API group "traefik.containo.us" at the cluster scope

In other words, Traefik, or rather the traefik serviceaccount, appears to be unable to access any of the resources it needs.

However, when escalating the traefik serviceaccount to cluster-admin for debugging purposes, the issue goes away. Hence the permissions are clearly not set up as they are supposed to be.

Doing this:

kubectl create clusterrolebinding traefik-cluster-admin --clusterrole=cluster-admin --serviceaccount=traefik:traefik

then gets me this in the logs:

time="2023-06-26T15:29:47Z" level=debug msg="Added outgoing tracing middleware api@internal" entryPointName=traefik routerName=traefik-traefik-dashboard-d012b7f875133eeab4e5@kubernetescrd middlewareName=tracing middlewareType=TracingForwarder
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=traefik middlewareName=traefik-internal-recovery middlewareType=Recovery
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=metrics middlewareName=metrics-entrypoint middlewareType=Metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=traefik middlewareName=metrics-entrypoint middlewareType=Metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=web middlewareName=metrics-entrypoint middlewareType=Metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=websecure middlewareName=metrics-entrypoint middlewareType=Metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" middlewareName=metrics-entrypoint middlewareType=Metrics entryPointName=metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=traefik middlewareName=metrics-entrypoint middlewareType=Metrics
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" middlewareType=Metrics entryPointName=web middlewareName=metrics-entrypoint
time="2023-06-26T15:29:47Z" level=debug msg="Creating middleware" entryPointName=websecure middlewareName=metrics-entrypoint middlewareType=Metrics

To make sure I had the RBAC permissions set correctly, I also manually applied the manifests from the Traefik documentation, which look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traefik-ingress-controller

rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.io
      - traefik.containo.us
    resources:
      - middlewares
      - middlewaretcps
      - ingressroutes
      - traefikservices
      - ingressroutetcps
      - ingressrouteudps
      - tlsoptions
      - tlsstores
      - serverstransports
    verbs:
      - get
      - list
      - watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traefik-ingress-controller

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: traefik

But that did not help either.
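One thing worth noting for anyone reading along: the documentation manifests bind the ClusterRole to a serviceaccount named traefik-ingress-controller, while the serviceaccount created by the chart here is named traefik (as the log messages show), so the binding subject would need adjusting for these manifests to have any effect. The actual subjects of a binding can be checked like this (a minimal sketch):

kubectl get clusterrolebinding traefik-ingress-controller -o jsonpath='{.subjects}'
kubectl get serviceaccounts -n traefik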

What is your environment & configuration?

Bare metal RKE2 cluster version 1.24.14+rke2r1.

My rather lengthy values.yaml file is attached below. It is mostly copied from the helm chart defaults. The fact that Traefik is deployed as a ClusterIP Service with an externalIP rather than a LoadBalancer is intentional, and it works nicely for the time being. But maybe I introduced some misconfiguration along the way?

# Default values for Traefik
image:
  registry: docker.io
  repository: traefik
  # defaults to appVersion
  tag: ""
  pullPolicy: IfNotPresent

#
# Configure integration with Traefik Hub
#
hub:
  ## Enabling Hub will:
  # * enable Traefik Hub integration on Traefik
  # * add `traefikhub-tunl` endpoint
  # * enable Prometheus metrics with addRoutersLabels
  # * enable allowExternalNameServices on KubernetesIngress provider
  # * enable allowCrossNamespace on KubernetesCRD provider
  # * add an internal (ClusterIP) Service, dedicated for Traefik Hub
  enabled: false
  ## Default port can be changed
  # tunnelPort: 9901
  ## TLS is optional. Insecure is mutually exclusive with any other options
  # tls:
  #   insecure: false
  #   ca: "/path/to/ca.pem"
  #   cert: "/path/to/cert.pem"
  #   key: "/path/to/key.pem"

#
# Configure the deployment
#
deployment:
  enabled: true
  # Can be either Deployment or DaemonSet
  kind: Deployment
  # Number of pods of the deployment (only applies when kind == Deployment)
  replicas: 1
  # Number of old revisions to retain to allow rollback (if not set, the Kubernetes default is 10)
  # revisionHistoryLimit: 1
  # Amount of time (in seconds) before Kubernetes will send the SIGKILL signal if Traefik does not shut down
  terminationGracePeriodSeconds: 60
  # The minimum number of seconds Traefik needs to be up and running before the DaemonSet/Deployment controller considers it available
  minReadySeconds: 0
  # Additional deployment annotations (e.g. for jaeger-operator sidecar injection)
  annotations: {}
  # Additional deployment labels (e.g. for filtering deployment by custom labels)
  labels: {}
  # Additional pod annotations (e.g. for mesh injection or prometheus scraping)
  podAnnotations: {}
  # Additional Pod labels (e.g. for filtering Pod by custom labels)
  podLabels: {}
  # Additional containers (e.g. for metric offloading sidecars)
  additionalContainers:
    []
    # https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host
    # - name: socat-proxy
    # image: alpine/socat:1.0.5
    # args: ["-s", "-u", "udp-recv:8125", "unix-sendto:/socket/socket"]
    # volumeMounts:
    #   - name: dsdsocket
    #     mountPath: /socket
  # Additional volumes available for use with initContainers and additionalContainers
  additionalVolumes:
    []
    # - name: dsdsocket
    #   hostPath:
    #     path: /var/run/statsd-exporter
  # Additional initContainers (e.g. for setting file permission as shown below)
  initContainers:
    []
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/traefik/traefik-helm-chart/issues/396
    # - name: volume-permissions
    #   image: busybox:latest
    #   command: ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
    #   securityContext:
    #     runAsNonRoot: true
    #     runAsGroup: 65532
    #     runAsUser: 65532
    #   volumeMounts:
    #     - name: data
    #       mountPath: /data
  # Use process namespace sharing
  shareProcessNamespace: false
  # Custom pod DNS policy. Apply if `hostNetwork: true`
  # dnsPolicy: ClusterFirstWithHostNet
  dnsConfig:
    {}
    # nameservers:
    #   - 192.0.2.1 # this is an example
    # searches:
    #   - ns1.svc.cluster-domain.example
    #   - my.dns.search.suffix
    # options:
    #   - name: ndots
    #     value: "2"
    #   - name: edns0
  # Additional imagePullSecrets
  imagePullSecrets:
    []
    # - name: myRegistryKeySecretName
  # Pod lifecycle actions
  lifecycle:
    {}
    # preStop:
    #   exec:
    #     command: ["/bin/sh", "-c", "sleep 40"]
    # postStart:
    #   httpGet:
    #     path: /ping
    #     port: 9000
    #     host: localhost
    #     scheme: HTTP

# Pod disruption budget
podDisruptionBudget:
  enabled: false
  # maxUnavailable: 1
  # maxUnavailable: 33%
  # minAvailable: 0
  # minAvailable: 25%

# Create a default IngressClass for Traefik
ingressClass:
  enabled: true
  isDefaultClass: true

# Enable experimental features
experimental:
  v3:
    enabled: false
  plugins:
    enabled: false
  kubernetesGateway:
    enabled: false
    gateway:
      enabled: true
    # certificate:
    #   group: "core"
    #   kind: "Secret"
    #   name: "mysecret"
    # By default, the Gateway is created in the namespace you are deploying Traefik to.
    # You may create that Gateway in another namespace by setting its namespace below:
    # namespace: default
    # Additional gateway annotations (e.g. for cert-manager.io/issuer)
    # annotations:
    #   cert-manager.io/issuer: letsencrypt

# Create an IngressRoute for the dashboard
ingressRoute:
  dashboard:
    enabled: true
    # Additional ingressRoute annotations (e.g. for kubernetes.io/ingress.class)
    annotations: {}
    # Additional ingressRoute labels (e.g. for filtering IngressRoute by custom labels)
    labels: {}
    # The router match rule used for the dashboard ingressRoute
    matchRule: PathPrefix(`/dashboard`) || PathPrefix(`/api`)
    # Specify the allowed entrypoints to use for the dashboard ingress route, (e.g. traefik, web, websecure).
    # By default, it's using traefik entrypoint, which is not exposed.
    # /!\ Do not expose your dashboard without any protection over the internet /!\
    entryPoints: ["traefik"]
    # Additional ingressRoute middlewares (e.g. for authentication)
    middlewares: []
    # TLS options (e.g. secret containing certificate)
    tls: {}

# Customize updateStrategy of traefik pods
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

# Customize liveness and readiness probe values.
readinessProbe:
  failureThreshold: 1
  initialDelaySeconds: 2
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 2

livenessProbe:
  failureThreshold: 3
  initialDelaySeconds: 2
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 2

#
# Configure providers
#
providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: false
    allowExternalNameServices: false
    allowEmptyServices: false
    # ingressClass: traefik-internal
    # labelSelector: environment=production,method=traefik
    namespaces:
      # ["traefik", "kube-prometheus-stack"]
      # - "default"

  kubernetesIngress:
    enabled: true
    allowExternalNameServices: false
    allowEmptyServices: false
    # ingressClass: traefik-internal
    # labelSelector: environment=production,method=traefik
    namespaces:
      # ["traefik", "kube-prometheus-stack"]
      # - "default"
    # IP used for Kubernetes Ingress endpoints
    publishedService:
      enabled: true
      # Published Kubernetes Service to copy status from. Format: namespace/servicename
      # By default, this Traefik service is used
      # pathOverride: ""

#
# Add volumes to the traefik pod. The volume name will be passed to tpl.
# This can be used to mount a cert pair or a configmap that holds a config.toml file.
# After the volume has been mounted, add the configs into traefik by using the `additionalArguments` list below, eg:
# additionalArguments:
# - "--providers.file.filename=/config/dynamic.toml"
# - "--ping"
# - "--ping.entrypoint=web"
volumes: []
# - name: public-cert
#   mountPath: "/certs"
#   type: secret
# - name: '{{ printf "%s-configs" .Release.Name }}'
#   mountPath: "/config"
#   type: configMap

# Additional volumeMounts to add to the Traefik container
additionalVolumeMounts:
  []
  # For instance when using a logshipper for access logs
  # - name: traefik-logs
  #   mountPath: /var/log/traefik

## Logs
## https://docs.traefik.io/observability/logs/
logs:
  ## Traefik logs concern everything that happens to Traefik itself (startup, configuration, events, shutdown, and so on).
  general:
    # By default, the logs use a text format (common), but you can
    # also ask for the json format in the format option
    # format: json
    # By default, the level is set to ERROR.
    # Alternative logging levels are DEBUG, PANIC, FATAL, ERROR, WARN, and INFO.
    level: DEBUG
  access:
    # To enable access logs
    enabled: false
    ## By default, logs are written using the Common Log Format (CLF) on stdout.
    ## To write logs in JSON, use json in the format option.
    ## If the given format is unsupported, the default (CLF) is used instead.
    # format: json
    # filePath: "/var/log/traefik/access.log"
    ## To write the logs in an asynchronous fashion, specify a bufferingSize option.
    ## This option represents the number of log lines Traefik will keep in memory before writing
    ## them to the selected output. In some cases, this option can greatly help performances.
    # bufferingSize: 100
    ## Filtering https://docs.traefik.io/observability/access-logs/#filtering
    filters:
      {}
      # statuscodes: "200,300-302"
      # retryattempts: true
      # minduration: 10ms
    ## Fields
    ## https://docs.traefik.io/observability/access-logs/#limiting-the-fieldsincluding-headers
    fields:
      general:
        defaultmode: keep
        names:
          {}
          ## Examples:
          # ClientUsername: drop
      headers:
        defaultmode: drop
        names:
          {}
          ## Examples:
          # User-Agent: redact
          # Authorization: drop
          # Content-Type: keep

metrics:
  ## Prometheus is enabled by default.
  ## It can be disabled by setting "prometheus: null"
  prometheus:
    ## Entry point used to expose metrics.
    entryPoint: metrics
    ## Enable metrics on entry points. Default=true
    # addEntryPointsLabels: false
    ## Enable metrics on routers. Default=false
    # addRoutersLabels: true
    ## Enable metrics on services. Default=true
    # addServicesLabels: false
    ## Buckets for latency metrics. Default="0.1,0.3,1.2,5.0"
    # buckets: "0.5,1.0,2.5"
    ## When manualRouting is true, it disables the default internal router in
    ## order to allow creating a custom router for prometheus@internal service.
    # manualRouting: true
#  datadog:
#    ## Address instructs exporter to send metrics to datadog-agent at this address.
#    address: "127.0.0.1:8125"
#    ## The interval used by the exporter to push metrics to datadog-agent. Default=10s
#    # pushInterval: 30s
#    ## The prefix to use for metrics collection. Default="traefik"
#    # prefix: traefik
#    ## Enable metrics on entry points. Default=true
#    # addEntryPointsLabels: false
#    ## Enable metrics on routers. Default=false
#    # addRoutersLabels: true
#    ## Enable metrics on services. Default=true
#    # addServicesLabels: false
#  influxdb:
#    ## Address instructs exporter to send metrics to influxdb at this address.
#    address: localhost:8089
#    ## InfluxDB's address protocol (udp or http). Default="udp"
#    protocol: udp
#    ## InfluxDB database used when protocol is http. Default=""
#    # database: ""
#    ## InfluxDB retention policy used when protocol is http. Default=""
#    # retentionPolicy: ""
#    ## InfluxDB username (only with http). Default=""
#    # username: ""
#    ## InfluxDB password (only with http). Default=""
#    # password: ""
#    ## The interval used by the exporter to push metrics to influxdb. Default=10s
#    # pushInterval: 30s
#    ## Additional labels (influxdb tags) on all metrics.
#    # additionalLabels:
#    #   env: production
#    #   foo: bar
#    ## Enable metrics on entry points. Default=true
#    # addEntryPointsLabels: false
#    ## Enable metrics on routers. Default=false
#    # addRoutersLabels: true
#    ## Enable metrics on services. Default=true
#    # addServicesLabels: false
#  influxdb2:
#    ## Address instructs exporter to send metrics to influxdb v2 at this address.
#    address: localhost:8086
#    ## Token with which to connect to InfluxDB v2.
#    token: xxx
#    ## Organisation where metrics will be stored.
#    org: ""
#    ## Bucket where metrics will be stored.
#    bucket: ""
#    ## The interval used by the exporter to push metrics to influxdb. Default=10s
#    # pushInterval: 30s
#    ## Additional labels (influxdb tags) on all metrics.
#    # additionalLabels:
#    #   env: production
#    #   foo: bar
#    ## Enable metrics on entry points. Default=true
#    # addEntryPointsLabels: false
#    ## Enable metrics on routers. Default=false
#    # addRoutersLabels: true
#    ## Enable metrics on services. Default=true
#    # addServicesLabels: false
#  statsd:
#    ## Address instructs exporter to send metrics to statsd at this address.
#    address: localhost:8125
#    ## The interval used by the exporter to push metrics to influxdb. Default=10s
#    # pushInterval: 30s
#    ## The prefix to use for metrics collection. Default="traefik"
#    # prefix: traefik
#    ## Enable metrics on entry points. Default=true
#    # addEntryPointsLabels: false
#    ## Enable metrics on routers. Default=false
#    # addRoutersLabels: true
#    ## Enable metrics on services. Default=true
#    # addServicesLabels: false
#  openTelemetry:
#    ## Address of the OpenTelemetry Collector to send metrics to.
#    address: "localhost:4318"
#    ## Enable metrics on entry points.
#    addEntryPointsLabels: true
#    ## Enable metrics on routers.
#    addRoutersLabels: true
#    ## Enable metrics on services.
#    addServicesLabels: true
#    ## Explicit boundaries for Histogram data points.
#    explicitBoundaries:
#      - "0.1"
#      - "0.3"
#      - "1.2"
#      - "5.0"
#    ## Additional headers sent with metrics by the reporter to the OpenTelemetry Collector.
#    headers:
#      foo: bar
#      test: test
#    ## Allows reporter to send metrics to the OpenTelemetry Collector without using a secured protocol.
#    insecure: true
#    ## Interval at which metrics are sent to the OpenTelemetry Collector.
#    pushInterval: 10s
#    ## Allows to override the default URL path used for sending metrics. This option has no effect when using gRPC transport.
#    path: /foo/v1/traces
#    ## Defines the TLS configuration used by the reporter to send metrics to the OpenTelemetry Collector.
#    tls:
#      ## The path to the certificate authority, it defaults to the system bundle.
#      ca: path/to/ca.crt
#      ## The path to the public certificate. When using this option, setting the key option is required.
#      cert: path/to/foo.cert
#      ## The path to the private key. When using this option, setting the cert option is required.
#      key: path/to/key.key
#      ## If set to true, the TLS connection accepts any certificate presented by the server regardless of the hostnames it covers.
#      insecureSkipVerify: true
#    ## This instructs the reporter to send metrics to the OpenTelemetry Collector using gRPC.
#    grpc: true

##
##  enable optional CRDs for Prometheus Operator
##
## Create a dedicated metrics service for use with ServiceMonitor
## When hub.enabled is set to true, it's not needed: it will use hub service.
#  service:
#    enabled: false
#    labels: {}
#    annotations: {}
## When set to true, it won't check if Prometheus Operator CRDs are deployed
#  disableAPICheck: false
#  serviceMonitor:
#    metricRelabelings: []
#      - sourceLabels: [__name__]
#        separator: ;
#        regex: ^fluentd_output_status_buffer_(oldest|newest)_.+
#        replacement: $1
#        action: drop
#    relabelings: []
#      - sourceLabels: [__meta_kubernetes_pod_node_name]
#        separator: ;
#        regex: ^(.*)$
#        targetLabel: nodename
#        replacement: $1
#        action: replace
#    jobLabel: traefik
#    interval: 30s
#    honorLabels: true
#    # (Optional)
#    # scrapeTimeout: 5s
#    # honorTimestamps: true
#    # enableHttp2: true
#    # followRedirects: true
#    # additionalLabels:
#    #   foo: bar
#    # namespace: "another-namespace"
#    # namespaceSelector: {}
#  prometheusRule:
#    additionalLabels: {}
#    namespace: "another-namespace"
#    rules:
#      - alert: TraefikDown
#        expr: up{job="traefik"} == 0
#        for: 5m
#        labels:
#          context: traefik
#          severity: warning
#        annotations:
#          summary: "Traefik Down"
#          description: "{{ $labels.pod }} on {{ $labels.nodename }} is down"

tracing:
  {}
  # instana:
  #   localAgentHost: 127.0.0.1
  #   localAgentPort: 42699
  #   logLevel: info
  #   enableAutoProfile: true
  # datadog:
  #   localAgentHostPort: 127.0.0.1:8126
  #   debug: false
  #   globalTag: ""
  #   prioritySampling: false
  # jaeger:
  #   samplingServerURL: http://localhost:5778/sampling
  #   samplingType: const
  #   samplingParam: 1.0
  #   localAgentHostPort: 127.0.0.1:6831
  #   gen128Bit: false
  #   propagation: jaeger
  #   traceContextHeaderName: uber-trace-id
  #   disableAttemptReconnecting: true
  #   collector:
  #      endpoint: ""
  #      user: ""
  #      password: ""
  # zipkin:
  #   httpEndpoint: http://localhost:9411/api/v2/spans
  #   sameSpan: false
  #   id128Bit: true
  #   sampleRate: 1.0
  # haystack:
  #   localAgentHost: 127.0.0.1
  #   localAgentPort: 35000
  #   globalTag: ""
  #   traceIDHeaderName: ""
  #   parentIDHeaderName: ""
  #   spanIDHeaderName: ""
  #   baggagePrefixHeaderName: ""
  # elastic:
  #   serverURL: http://localhost:8200
  #   secretToken: ""
  #   serviceEnvironment: ""

globalArguments:
  - "--global.checknewversion"
  - "--global.sendanonymoususage"

#
# Configure Traefik static configuration
# Additional arguments to be passed at Traefik's binary
# All available options available on https://docs.traefik.io/reference/static-configuration/cli/
## Use curly braces to pass values: `helm install --set="additionalArguments={--providers.kubernetesingress.ingressclass=traefik-internal,--log.level=DEBUG}"`
additionalArguments: []
#  - "--providers.kubernetesingress.ingressclass=traefik-internal"
#  - "--log.level=DEBUG"

# Environment variables to be passed to Traefik's binary
env: []
# - name: SOME_VAR
#   value: some-var-value
# - name: SOME_VAR_FROM_CONFIG_MAP
#   valueFrom:
#     configMapRef:
#       name: configmap-name
#       key: config-key
# - name: SOME_SECRET
#   valueFrom:
#     secretKeyRef:
#       name: secret-name
#       key: secret-key

envFrom: []
# - configMapRef:
#     name: config-map-name
# - secretRef:
#     name: secret-name

# Configure ports
ports:
  # The name of this one can't be changed as it is used for the readiness and
  # liveness probes, but you can adjust its config to your liking
  traefik:
    port: 9000
    # Use hostPort if set.
    # hostPort: 9000
    #
    # Use hostIP if set. If not set, Kubernetes will default to 0.0.0.0, which
    # means it's listening on all your interfaces and all your IPs. You may want
    # to set this value if you need traefik to listen on specific interface
    # only.
    # hostIP: 192.168.100.10

    # Override the liveness/readiness port. This is useful to integrate traefik
    # with an external Load Balancer that performs healthchecks.
    # Default: ports.traefik.port
    # healthchecksPort: 9000

    # Override the liveness/readiness scheme. Useful for getting ping to
    # respond on websecure entryPoint.
    # healthchecksScheme: HTTPS

    # Defines whether the port is exposed if service.type is LoadBalancer or
    # NodePort.
    #
    # You SHOULD NOT expose the traefik port on production deployments.
    # If you want to access it from outside of your cluster,
    # use `kubectl port-forward` or create a secure ingress
    expose: false
    # The exposed port for this service
    exposedPort: 9000
    # The port protocol (TCP/UDP)
    protocol: TCP
  web:
    ## Enable this entrypoint as a default entrypoint. When a service doesn't explicitly set an entrypoint it will only use this entrypoint.
    # asDefault: true
    port: 8000
    # hostPort: 8000
    # containerPort: 8000
    expose: true
    exposedPort: 80
    ## Different target traefik port on the cluster, useful for IP type LB
    # targetPort: 80
    # The port protocol (TCP/UDP)
    protocol: TCP
    # Use nodeport if set. This is useful if you have configured Traefik in a
    # LoadBalancer.
    # nodePort: 32080
    # Port Redirections
    # Added in 2.2, you can make permanent redirects via entrypoints.
    # https://docs.traefik.io/routing/entrypoints/#redirection
    # redirectTo: websecure
    #
    # Trust forwarded headers information (X-Forwarded-*).
    # forwardedHeaders:
    #   trustedIPs: []
    #   insecure: false
    #
    # Enable the Proxy Protocol header parsing for the entry point
    # proxyProtocol:
    #   trustedIPs: []
    #   insecure: false
  websecure:
    ## Enable this entrypoint as a default entrypoint. When a service doesn't explicitly set an entrypoint it will only use this entrypoint.
    # asDefault: true
    port: 8443
    # hostPort: 8443
    # containerPort: 8443
    expose: true
    exposedPort: 443
    ## Different target traefik port on the cluster, useful for IP type LB
    # targetPort: 80
    ## The port protocol (TCP/UDP)
    protocol: TCP
    # nodePort: 32443
    #
    ## Enable HTTP/3 on the entrypoint
    ## Enabling it will also enable http3 experimental feature
    ## https://doc.traefik.io/traefik/routing/entrypoints/#http3
    ## There are known limitations when trying to listen on same ports for
    ## TCP & UDP (Http3). There is a workaround in this chart using dual Service.
    ## https://github.com/kubernetes/kubernetes/issues/47249#issuecomment-587960741
    http3:
      enabled: false
    # advertisedPort: 4443
    #
    ## Trust forwarded headers information (X-Forwarded-*).
    #forwardedHeaders:
    #  trustedIPs: []
    #  insecure: false
    #
    ## Enable the Proxy Protocol header parsing for the entry point
    #proxyProtocol:
    #  trustedIPs: []
    #  insecure: false
    #
    ## Set TLS at the entrypoint
    ## https://doc.traefik.io/traefik/routing/entrypoints/#tls
    tls:
      enabled: true
      # this is the name of a TLSOption definition
      options: ""
      certResolver: ""
      domains: []
      # - main: example.com
      #   sans:
      #     - foo.example.com
      #     - bar.example.com
    #
    # One can apply Middlewares on an entrypoint
    # https://doc.traefik.io/traefik/middlewares/overview/
    # https://doc.traefik.io/traefik/routing/entrypoints/#middlewares
    # /!\ It introduces here a link between your static configuration and your dynamic configuration /!\
    # It follows the provider naming convention: https://doc.traefik.io/traefik/providers/overview/#provider-namespace
    # middlewares:
    #   - namespace-name1@kubernetescrd
    #   - namespace-name2@kubernetescrd
    middlewares: []
  metrics:
    # When using hostNetwork, use another port to avoid conflict with node exporter:
    # https://github.com/prometheus/prometheus/wiki/Default-port-allocations
    port: 9100
    # hostPort: 9100
    # Defines whether the port is exposed if service.type is LoadBalancer or
    # NodePort.
    #
    # You may not want to expose the metrics port on production deployments.
    # If you want to access it from outside of your cluster,
    # use `kubectl port-forward` or create a secure ingress
    expose: false
    # The exposed port for this service
    exposedPort: 9100
    # The port protocol (TCP/UDP)
    protocol: TCP

# TLS Options are created as TLSOption CRDs
# https://doc.traefik.io/traefik/https/tls/#tls-options
# When using `labelSelector`, you'll need to set labels on tlsOption accordingly.
# Example:
# tlsOptions:
#   default:
#     labels: {}
#     sniStrict: true
#     preferServerCipherSuites: true
#   customOptions:
#     labels: {}
#     curvePreferences:
#       - CurveP521
#       - CurveP384
tlsOptions: {}

# TLS Store are created as TLSStore CRDs. This is useful if you want to set a default certificate
# https://doc.traefik.io/traefik/https/tls/#default-certificate
# Example:
# tlsStore:
#   default:
#     defaultCertificate:
#       secretName: tls-cert
tlsStore: {}

service:
  enabled: true
  single: true
  type: ClusterIP
  annotations: {}
  annotationsTCP: {}
  annotationsUDP: {}
  # Additional service labels (e.g. for filtering Service by custom labels)
  labels: {}
  # Additional entries here will be added to the service spec.
  # Cannot contain type, selector or ports entries.
  spec:
    {}
    # externalTrafficPolicy: Cluster
    # loadBalancerIP: "1.2.3.4"
    # clusterIP: "2.3.4.5"
  loadBalancerSourceRanges:
    []
  externalIPs:
    - xxx.xxx.xxx.xxx # This would normally contain the actual IP of one of my nodes, it is left out here

## Create HorizontalPodAutoscaler object.
##
autoscaling:
  enabled: false
#   minReplicas: 1
#   maxReplicas: 10
#   metrics:
#   - type: Resource
#     resource:
#       name: cpu
#       target:
#         type: Utilization
#         averageUtilization: 60
#   - type: Resource
#     resource:
#       name: memory
#       target:
#         type: Utilization
#         averageUtilization: 60
#   behavior:
#     scaleDown:
#       stabilizationWindowSeconds: 300
#       policies:
#       - type: Pods
#         value: 1
#         periodSeconds: 60

# Enable persistence using Persistent Volume Claims
# ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
# It can be used to store TLS certificates, see `storage` in certResolvers
persistence:
  enabled: false
  name: data
  #  existingClaim: ""
  accessMode: ReadWriteOnce
  size: 128Mi
  # storageClass: ""
  # volumeName: ""
  path: /data
  annotations: {}
  # subPath: "" # only mount a subpath of the Volume into the pod

certResolvers: {}
#   letsencrypt:
#     # for challenge options cf. https://doc.traefik.io/traefik/https/acme/
#     email: email@example.com
#     dnsChallenge:
#       # also add the provider's required configuration under env
#       # or expand then from secrets/configmaps with envfrom
#       # cf. https://doc.traefik.io/traefik/https/acme/#providers
#       provider: digitalocean
#       # add further options for the dns challenge as needed
#       # cf. https://doc.traefik.io/traefik/https/acme/#dnschallenge
#       delayBeforeCheck: 30
#       resolvers:
#         - 1.1.1.1
#         - 8.8.8.8
#     tlsChallenge: true
#     httpChallenge:
#       entryPoint: "web"
#     # It has to match the path with a persistent volume
#     storage: /data/acme.json

# If hostNetwork is true, runs traefik in the host network namespace
# To prevent unschedulable pods due to port collisions, if hostNetwork=true
# and replicas>1, a pod anti-affinity is recommended and will be set if the
# affinity is left as default.
hostNetwork: false

# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
  enabled: false
  # If set to false, installs ClusterRole and ClusterRoleBinding so Traefik can be used across namespaces.
  # If set to true, installs Role and RoleBinding. Providers will only watch target namespace.
  namespaced: false
  # Enable user-facing roles
  # https://kubernetes.io/docs/reference/access-authn-authz/rbac/#user-facing-roles
  # aggregateTo: [ "admin" ]

# Enable to create a PodSecurityPolicy and assign it to the Service Account via RoleBinding or ClusterRoleBinding
podSecurityPolicy:
  enabled: false

# The service account the pods will use to interact with the Kubernetes API
serviceAccount:
  # If set, an existing service account is used
  # If not set, a service account is created automatically using the fullname template
  name: ""

# Additional serviceAccount annotations (e.g. for oidc authentication)
serviceAccountAnnotations: {}

resources:
  {}
  # requests:
  #   cpu: "100m"
  #   memory: "50Mi"
  # limits:
  #   cpu: "300m"
  #   memory: "150Mi"

# This example pod anti-affinity forces the scheduler to put traefik pods
# on nodes where no other traefik pods are scheduled.
# It should be used when hostNetwork: true to prevent port conflicts
affinity: {}
#  podAntiAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchLabels:
#            app.kubernetes.io/name: '{{ template "traefik.name" . }}'
#            app.kubernetes.io/instance: '{{ .Release.Name }}-{{ .Release.Namespace }}'
#        topologyKey: kubernetes.io/hostname

nodeSelector: {}
tolerations: []
topologySpreadConstraints: []
# # This example topologySpreadConstraints forces the scheduler to put traefik pods
# # on nodes where no other traefik pods are scheduled.
#  - labelSelector:
#      matchLabels:
#        app: '{{ template "traefik.name" . }}'
#    maxSkew: 1
#    topologyKey: kubernetes.io/hostname
#    whenUnsatisfiable: DoNotSchedule

# Pods can have priority.
# Priority indicates the importance of a Pod relative to other Pods.
priorityClassName: ""

# Set the container security context
# To run the container with ports below 1024, this will need to be adjusted to run as root
securityContext:
  capabilities:
    drop: [ALL]
  readOnlyRootFilesystem: true

podSecurityContext:
  #  # /!\ When setting fsGroup, Kubernetes will recursively change ownership and
  #  # permissions for the contents of each volume to match the fsGroup. This can
  #  # be an issue when storing sensitive content like TLS Certificates /!\
  #  fsGroup: 65532
  fsGroupChangePolicy: "OnRootMismatch"
  runAsGroup: 65532
  runAsNonRoot: true
  runAsUser: 65532

#
# Extra objects to deploy (value evaluated as a template)
#
# In some cases, it can avoid the need for additional, extended or ad-hoc deployments.
# See #595 for more details and traefik/tests/values/extra.yaml for example.
extraObjects: []
# This will override the default Release Namespace for Helm.
# It will not affect optional CRDs such as `ServiceMonitor` and `PrometheusRules`
# namespaceOverride: traefik
#
## This will override the default app.kubernetes.io/instance label for all Objects.
# instanceLabelOverride: traefik

Additional Information

I am not quite sure whether this is a bug related to the helm chart, to Traefik, to the combination of Traefik with RKE2 in my setup, or something else entirely. I decided to post it here since I used helm as the installation method for Traefik.

In case this issue is not in the right place, please let me know where I should file it instead.

Thanks, everyone!

arana198 commented 1 year ago

I have the exact same issue:

E0626 21:05:19.476439       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.ServersTransportTCP: failed to list *v1alpha1.ServersTransportTCP: serverstransporttcps.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransporttcps" in API group "traefik.io" at the cluster scope

Traefik was working fine until yesterday. I upgraded the helm chart (no values.yaml changes), and the issue of the SA not being able to list resources started occurring.
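(For reference, the deployed chart/app versions and the revision history can be checked like this; a minimal sketch, assuming the release is named traefik in the traefik namespace:)

helm list -n traefik
helm history traefik -n traefik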

ic4-y commented 1 year ago

That is interesting, meaning my misconfiguring it might not be the reason why it isn't working :laughing:

On the flip side, I suppose I might then solve my issue by downgrading the chart. I will try that later. Not ideal, but it should prove the point.
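(If it comes to that, a downgrade could look something like the following; the pinned version here is only illustrative, and helm rollback also works if the previous revision is still in the release history:)

helm upgrade --install traefik traefik/traefik --version 22.3.0 --namespace traefik --values values.yaml
# or, alternatively, roll back to the previous revision of the release:
helm rollback traefik -n traefik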

mloiseleur commented 1 year ago

@arana198 @icodeforyou-dot-net This error message:

E0626 21:05:19.476439       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.ServersTransportTCP: failed to list *v1alpha1.ServersTransportTCP: serverstransporttcps.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransporttcps" in API group "traefik.io" at the cluster scope

It may mean that you haven't upgraded the CRDs. See the release notes of v23.x or the upgrade documentation for more details.

Can you confirm whether this solves your issue?
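(For anyone following along, the installed Traefik CRDs and their API groups can be inspected like this; a minimal sketch:)

kubectl get crds | grep traefik
kubectl api-resources --api-group=traefik.io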

ic4-y commented 1 year ago

@mloiseleur

I just tried doing

kubectl apply --server-side --force-conflicts -k https://github.com/traefik/traefik-helm-chart/traefik/crds/

That did not help. I also re-deployed Traefik after running the command above, just to make sure.
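(For reference, a re-deploy without changing any values can also be forced via a rollout restart; a minimal sketch, assuming the default deployment name:)

kubectl rollout restart deployment/traefik -n traefik
kubectl rollout status deployment/traefik -n traefik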

Neither step changed anything in my case. I still get entries like:

W0627 07:44:06.759950       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1alpha1.ServersTransport: serverstransports.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransports" in API group "traefik.io" at the cluster scope
E0627 07:44:06.760023       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.ServersTransport: failed to list *v1alpha1.ServersTransport: serverstransports.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransports" in API group "traefik.io" at the cluster scope
W0627 07:44:07.096477       1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "services" in API group "" at the cluster scope
E0627 07:44:07.096549       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "services" in API group "" at the cluster scope

Edit: As mentioned above, my issue appears to be somehow related to ClusterRoles and ClusterRoleBindings (or Roles and RoleBindings, respectively, when I deploy it namespaced), since the issue goes away when I promote the Traefik serviceaccount to cluster-admin.
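A quick way to check whether the chart created any RBAC objects at all for the release (a minimal sketch; the label selector assumes the chart's standard app.kubernetes.io/name label):

kubectl get clusterrole,clusterrolebinding -l app.kubernetes.io/name=traefik
kubectl get role,rolebinding -n traefik -l app.kubernetes.io/name=traefik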

arana198 commented 1 year ago

@arana198 @icodeforyou-dot-net This error message:

E0626 21:05:19.476439       1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1alpha1.ServersTransportTCP: failed to list *v1alpha1.ServersTransportTCP: serverstransporttcps.traefik.io is forbidden: User "system:serviceaccount:traefik:traefik" cannot list resource "serverstransporttcps" in API group "traefik.io" at the cluster scope

It may mean that you haven't upgraded the CRDs. See the release notes of v23.x or the upgrade documentation for more details.

Can you confirm whether this solves your issue?

I saw the documentation last night and updated the CRDs. Unfortunately, this hasn't resolved the issue for me.

helm search repo traefik/traefik
NAME             CHART VERSION   APP VERSION   DESCRIPTION
traefik/traefik  9.1.1           2.2.8         A Traefik based Kubernetes ingress controller

Steps taken:

Update: I removed the old helm repo (https://helm.traefik.io/traefik) and added the new one, https://traefik.github.io/charts:

➜ ~ helm search repo traefik/traefik
NAME                  CHART VERSION   APP VERSION   DESCRIPTION
traefik/traefik       23.1.0          v2.10.1       A Traefik based Kubernetes ingress controller
traefik/traefik-mesh  4.1.1           v1.4.8        Traefik Mesh - Simpler Service Mesh
traefik/traefikee     1.14.1          v2.10.2       Traefik Enterprise is a unified cloud-native ne...

The issue still persists.

mloiseleur commented 1 year ago

In your values.yaml, it's specified not to create RBAC:

[...]
# Whether Role Based Access Control objects like roles and rolebindings should be created
rbac:
  enabled: false

If you set it to true, it should provide the RBAC needed.
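For anyone landing here, a minimal sketch of the fix: in values.yaml, set

rbac:
  enabled: true

and then re-apply the chart with the install command from the original report:

helm upgrade --install traefik traefik/traefik --namespace traefik --values values.yaml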

arana198 commented 1 year ago

Thanks, that solved my issue.

Just so that it is useful for everyone: the fix was setting rbac.enabled to true in values.yaml, as described above.

ic4-y commented 1 year ago

@mloiseleur

Excellent, it appears to be working on my end as well!

Thanks for the help!

One remark, however: the comments in the default values file are misleading in this case. They state that

# If set to false, installs ClusterRole and ClusterRoleBinding so Traefik can be used across namespaces.

This appears to not, or at least no longer, work.

mloiseleur commented 1 year ago

@icodeforyou-dot-net this comment is for namespaced:

  # If set to false, installs ClusterRole and ClusterRoleBinding so Traefik can be used across namespaces.
  # If set to true, installs Role and RoleBinding. Providers will only watch target namespace.
  namespaced: false
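In other words, rbac.enabled controls whether the chart creates any RBAC objects at all, while namespaced only selects their scope. A minimal sketch of the cluster-wide configuration:

rbac:
  enabled: true     # must be true, or no (Cluster)Role / (Cluster)RoleBinding is created at all
  namespaced: false # false: ClusterRole + ClusterRoleBinding; true: Role + RoleBinding in the release namespace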

Feel free to open a PR if you have a good idea on how to help users with those settings.