kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Pod can't be started with sysctls custom settings #11962

Open yaroslav-nakonechnikov opened 2 months ago

yaroslav-nakonechnikov commented 2 months ago

Hello,

What happened:

I'm getting the following warnings, which prevent the nginx pod from starting:

  Warning  FailedCreatePodSandBox  37m (x3 over 40m)      kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/sys/net/ipv4/ip_local_port_range: invalid argument: unknown
  Warning  FailedCreatePodSandBox  2m14s (x163 over 42m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/sys/net/core/somaxconn: invalid argument: unknown

The deployment was done via Terraform's helm_release resource:

...
      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = "\"32768\""
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "\"1024 65000\""
      "sysctls.net\\.core\\.somaxconn"                 = "32768"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 65000"
...

The values are rendered like this:

USER-SUPPLIED VALUES:
controller:
  admissionWebhooks:
    patch:
      image:
        image: prj-eks-42718-ingress-kube-webhook-certgen
        registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: karpenter.sh/nodepool
            operator: In
            values:
            - prj-eks-42718-ingress
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
              - controller
          topologyKey: topology.kubernetes.io/zone
        weight: 100
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
              - controller
          topologyKey: topology.kubernetes.io/hostname
        weight: 99
  config:
    allow-snippet-annotations: true
    enable-opentelemetry: true
    log-format-escape-json: true
    log-format-escape-none: true
    log-format-stream: '{"module":"log_stream","src_ip":"$remote_addr","timestamp":"$time_local","protocol":"$protocol","status":"$status","bytes_out":$bytes_sent,"bytes_in":$bytes_received,"session_time":"$session_time","upstream_addr":"$upstream_addr","upstream_bytes_out":"$upstream_bytes_sent","upstream_bytes_in":"$upstream_bytes_received","upstream_connect_time":"$upstream_connect_time","proxy_upstream_name":"$proxy_upstream_name"}'
    log-format-upstream: '{"module":"upstreamlog","src_ip":"$remote_addr", "username":"$remote_user","timestamp":"$time_local",
      "request":"$request", "status":"$status", "bytes_sent":"$body_bytes_sent", "http_referer":"$http_referer",
      "http_user_agent":"$http_user_agent", "req_len":$request_length, "req_time":"$request_time","proxy_upstream_name":"$proxy_upstream_name",
      "proxy_alternative_upstream_name":"$proxy_alternative_upstream_name", "upstream_addr":"$upstream_addr",
      "upstream_response_length":"$upstream_response_length","upstream_response_time":"$upstream_response_time",
      "upstream_status":"$upstream_status", "req_id":"$req_id", "service_name":"$service_name"}'
    retry-non-idempotent: true
  extraVolumeMounts:
  - mountPath: /mnt/indexer
    name: indexer
    readOnly: true
  - mountPath: /mnt/ingress
    name: ingress
    readOnly: true
  extraVolumes:
  - name: indexer
    secret:
      secretName: indexer
  - name: ingress
    secret:
      secretName: ingress
  image:
    image: prj-eks-42718-ingress-controller
    registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  ingressClassResource:
    default: true
  kind: Deployment
  opentelemetry:
    enabled: true
    image:
      image: prj-eks-42718-ingress-opentelemetry-1.25.3
      registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
  podSecurityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: '"1024 65000"'
    - name: net.core.somaxconn
      value: '"32768"'
  port: '{"https":443}'
  resources:
    requests:
      cpu: 128m
      memory: 512Mi
  service:
    enableHttp: false
    type: ClusterIP
  tolerations:
  - effect: NoSchedule
    key: function
    operator: Equal
    value: ingress
  topologySpreadConstraints:
  - labelSelector:
      matchExpressions:
      - key: app.kubernetes.io/component
        operator: In
        values:
        - controller
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
defaultBackend:
  image:
    image: prj-eks-42718-ingress-defaultbackend-amd64
    registry: SOMEID.dkr.ecr.eu-central-1.amazonaws.com
sysctls:
  net.core.somaxconn: 32768
  net.ipv4.ip_local_port_range: 1024 65000
tcp:
  "8089": ingress/ingress-nginx-controller:443
  "9999": ingress/ingress-nginx-controller:443

As far as I can see, extra quote characters are somehow being passed there:

  podSecurityContext:
    sysctls:
    - name: net.ipv4.ip_local_port_range
      value: '"1024 65000"'
    - name: net.core.somaxconn
      value: '"32768"'

But if I write the following:

...
      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = "32768"
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "1024 65000"
      "sysctls.net\\.core\\.somaxconn"                 = "32768"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 65000"
...

it fails at the apply stage with:

Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field Sysctl.spec.template.spec.securityContext.sysctls.value of type string

  with helm_release.ingress_nginx,
  on ingress-nginx.tf line 71, in resource "helm_release" "ingress_nginx":
  71: resource "helm_release" "ingress_nginx" {

Why? How can I provide the values so that it works?

What you expected to happen:

Simple notation works without issues.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): installed with chart 4.10.4

longwuyuan commented 2 months ago

Does it work if you do not customize podSecurityContext?

yaroslav-nakonechnikov commented 2 months ago

Yes, it works perfectly.

And if I edit the deployment with kubectl edit deployment -n ingress ingress-nginx-controller so that it looks like this:

$ kubectl get deployment -n ingress ingress-nginx-controller -o yaml | grep securityContext -A 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
--
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
--
      securityContext:
        sysctls:
        - name: net.ipv4.ip_local_port_range
          value: 1024 65000
        - name: net.core.somaxconn
          value: "32768"

it starts fine:

ingress-nginx-controller-db89d67bd-mpfkn:/etc/nginx$ sysctl net. | grep max
net.core.somaxconn = 32768

longwuyuan commented 2 months ago

For at least two sysctl arguments, the error message reports an unknown value:

range: invalid argument: unknown

So this is not a bug but a misconfiguration of the sysctl arguments.

That kind of config is not handled in the controller code; it is simply passed from the template to the Kubernetes API.

/remove-kind bug

I think you should try those sysctl commands manually and see what fits.

longwuyuan commented 2 months ago

/kind support

yaroslav-nakonechnikov commented 2 months ago

But it works manually. I know that it is sometimes hard to pass certain values from HCL, and passing a custom log_format looks extremely weird. For sysctls, though, I tried several notations and none of them work.

A workaround with an additional modification after helm_release works without problems.

P.S. Almost the same problem exists with the KEDA addon, but I will report it later, as it is not so critical.

longwuyuan commented 2 months ago

The field description mentions unsupported sysctls. Have you checked this?

% k explain pod.spec.securityContext.sysctls
KIND:       Pod
VERSION:    v1

FIELD: sysctls <[]Sysctl>

DESCRIPTION:
    Sysctls hold a list of namespaced sysctls used for the pod. Pods with
    unsupported sysctls (by the container runtime) might fail to launch. Note
    that this field cannot be set when spec.os.name is windows.
    Sysctl defines a kernel parameter to be set

FIELDS:
  name  <string> -required-
    Name of a property to set

  value <string> -required-
    Value of a property to set

yaroslav-nakonechnikov commented 2 months ago

@longwuyuan if I update the deployment manually (or even with Terraform) after the initial helm install, it starts to work as expected. I have read about unsupported sysctl parameters, but this is a different case.

longwuyuan commented 2 months ago

Then it's a parsing problem. Have you played with the string?

yaroslav-nakonechnikov commented 2 months ago

Yes, I've tried the following versions:

      "controller.podSecurityContext.sysctls[1].value" = 32768
      "controller.podSecurityContext.sysctls[1].value" = "32768"
      "controller.podSecurityContext.sysctls[1].value" = "\"32768\""
      "controller.podSecurityContext.sysctls[1].value" = "'32768'"
      "controller.podSecurityContext.sysctls[1].value" = '32768'

Nothing works.

longwuyuan commented 2 months ago

Reduce the upper port number to 60000 and try again.

longwuyuan commented 2 months ago

Try:

      "sysctls.net\\.core\\.somaxconn"                 = "30000"
      "sysctls.net\\.ipv4\\.ip_local_port_range"       = "1024 60000"

longwuyuan commented 2 months ago

Or maybe:

      "controller.podSecurityContext.sysctls[1].name"  = "net.core.somaxconn"
      "controller.podSecurityContext.sysctls[1].value" = 32768
      "controller.podSecurityContext.sysctls[0].name"  = "net.ipv4.ip_local_port_range"
      "controller.podSecurityContext.sysctls[0].value" = "1024 65000"

I am not sure how to solve this, but I am sure it is not the controller code, as these keys and values are passed straight from the rendered template to the kube-apiserver. You can enable debug and check the JSON payload.

yaroslav-nakonechnikov commented 2 months ago

I also tried outside of the dynamic set block:

  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = "32768"
    type  = "auto"
  }

and

  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = 32768
    type  = "auto"
  }

gives: Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field Sysctl.spec.template.spec.securityContext.sysctls.value of type string

  set {
    name  = "controller.podSecurityContext.sysctls"
    value = "[\\{\"name\":\"net.core.somaxconn\"\\,\"value\":\"32768\"\\}\\,\\{\"name\":\"net.ipv4.ip_local_port_range\"\\,\"value\":\"1024 65000\"\\}]"
    type  = "auto"
  }

and

  set {
    name  = "controller.podSecurityContext.sysctls[0]"
    value = "\\{\"name\":\"net.core.somaxconn\"\\,\"value\":\"32768\"\\}"
    type  = "auto"
  }

gives Error: failed to replace object: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal string into Go struct field PodSecurityContext.spec.template.spec.securityContext.sysctls of type []v1.Sysctl
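
For reference, a minimal and untested sketch of one variant the attempts above do not cover. The set block of helm_release accepts type = "string" in addition to "auto" (comparable to Helm's --set-string), which should keep a numeric-looking value such as 32768 from being parsed as a number. The block syntax assumes a 2.x release of the Terraform Helm provider, and the thread does not confirm whether this resolves the error in this setup.

```hcl
resource "helm_release" "ingress_nginx" {
  # Other arguments (name, chart, repository, and so on) are omitted here.

  set {
    name  = "controller.podSecurityContext.sysctls[0].name"
    value = "net.ipv4.ip_local_port_range"
  }

  set {
    name  = "controller.podSecurityContext.sysctls[0].value"
    value = "1024 65000"
    type  = "string" # assumption: forces a string instead of type inference
  }

  set {
    name  = "controller.podSecurityContext.sysctls[1].name"
    value = "net.core.somaxconn"
  }

  set {
    name  = "controller.podSecurityContext.sysctls[1].value"
    value = "32768" # plain value, no embedded escaped quotes
    type  = "string"
  }
}
```

The intent is that the rendered manifest carries the value as a YAML string without the literal quote characters that the "\"32768\"" notation produced.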

longwuyuan commented 2 months ago

Please come talk on Kubernetes Slack as there are not many resources here.

The error message is proof that this is about parsing and variable interpolation. I think this works without Terraform or ArgoCD-type tools, so it's not a problem with the controller. Some expert in these tools has to comment on how to control whether a value is injected as an int or as a string.
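
Another hedged option, again not verified in this thread, is to pass the sysctls through the values argument of helm_release instead of set, so that no --set style type inference is involved. The sketch below assumes Terraform's built-in yamlencode function and the same helm_release.ingress_nginx resource; the sysctl names and values mirror the ones the reporter wants.

```hcl
resource "helm_release" "ingress_nginx" {
  # Other arguments (name, chart, repository, and so on) are omitted here.

  values = [
    yamlencode({
      controller = {
        podSecurityContext = {
          sysctls = [
            # Both values are HCL strings, so they are encoded as YAML strings.
            { name = "net.ipv4.ip_local_port_range", value = "1024 65000" },
            { name = "net.core.somaxconn", value = "32768" },
          ]
        }
      }
    })
  ]
}
```

Keeping the whole list in one structured value also avoids the index-based keys such as sysctls[0].value and the escaping needed for commas and braces in set.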

github-actions[bot] commented 1 month ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.