kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0
17.35k stars 8.23k forks

Enabling Helm .Values.controller.opentelemetry.enabled prevents pods from being rescheduled on different nodes and cluster auto-scaling from working. #10911

Open calbot opened 8 months ago

calbot commented 8 months ago

What happened: Enabling .Values.controller.opentelemetry.enabled prevents pods from being rescheduled on different nodes and cluster auto-scaling from working.

What you expected to happen: Expected cluster auto-scaling to continue to reschedule ingress-nginx pods on different nodes.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e

The helm chart adds emptyDir: {} here when .Values.controller.opentelemetry.enabled is set to true: https://github.com/kubernetes/ingress-nginx/blob/0c3d52bade1219c1d61d486e44a91ded48766aac/charts/ingress-nginx/templates/controller-daemonset.yaml#L215

I think the chart should also automatically add the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true" to the podAnnotations area in the same file when .Values.controller.opentelemetry.enabled is true, or at least warn that you might want to add it yourself, because cluster autoscaling won't be able to move your ingress-nginx pods otherwise.

calbot commented 8 months ago

When I add

podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

scaling works again as expected, but I think it should not be necessary to do this.

toredash commented 8 months ago

Please provide additional configuration, e.g. the complete values.yaml file.

I tested locally to see what difference there is in the manifest output when setting these values:

controller: 
  opentelemetry:
    enabled: "false"
  config:
    enable-opentelemetry: "true"
  autoscaling:
    apiVersion: autoscaling/v2
    enabled: true
    minReplicas: 2

And the difference is as expected:

% diff enable-opentelemetry-*                           
256c256
<   enable-opentelemetry: "false"
---
>   enable-opentelemetry: "true"
598a599,600
>         - mountPath: /modules_mount
>           name: modules
602a605,622
>       initContainers:
>       - command:
>         - /init_module
>         image: registry.k8s.io/ingress-nginx/opentelemetry:v20230721-3e2062ee5@sha256:13bee3f5223883d3ca62fee7309ad02d22ec00ff0d7033e3e9aca7a9f60fd472
>         name: opentelemetry
>         securityContext:
>           allowPrivilegeEscalation: false
>           capabilities:
>             drop:
>             - ALL
>           readOnlyRootFilesystem: true
>           runAsNonRoot: true
>           runAsUser: 65532
>           seccompProfile:
>             type: RuntimeDefault
>         volumeMounts:
>         - mountPath: /modules_mount
>           name: modules
614a635,636
>       - emptyDir: {}
>         name: modules

I don't think this is an Nginx issue, but an issue with your configuration. Please provide more information, and look into what error cluster-autoscaler specifically reports.

calbot commented 8 months ago

Yes, that's the expected output. Notice the emptyDir, which is the source of the problem due to cluster-autoscaler's handling of emptyDir volumes...

This is an issue with the helm chart as I see it. If .Values.controller.opentelemetry.enabled is true, the cluster-autoscaler.kubernetes.io/safe-to-evict: "true" annotation should be added to the YAML generated by helm, like this (but currently isn't):

spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

See the cluster-autoscaler (not HorizontalPodAutoscaler) FAQ regarding how emptyDir is handled: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md

This lets the cluster autoscaler know it is safe to evict and reschedule the ingress-nginx pods even though they have the emptyDir, so that nodes can be shut down.
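Per the same FAQ, there is also a cluster-wide alternative to per-pod annotations: running cluster-autoscaler with --skip-nodes-with-local-storage=false, which allows scale-down of nodes whose pods use emptyDir everywhere. A sketch of where that flag would go (the surrounding container spec is illustrative, not the actual cluster-autoscaler deployment):

```yaml
spec:
  containers:
  - name: cluster-autoscaler
    command:
    - ./cluster-autoscaler
    # Permit scale-down of nodes running pods with emptyDir volumes,
    # instead of annotating each pod with safe-to-evict.
    - --skip-nodes-with-local-storage=false
```

Note this loosens the eviction rules for every pod with local storage, so the per-pod annotation is the more targeted fix.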

When I add the following to my values.yaml for ingress-nginx, it works properly because the annotation is added:

controller:

  # Allow moving nginx pods even though we have the emptyDir for the open-telemetry
  # https://github.com/kubernetes/autoscaler/issues/2048
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

This is where the fix needs to be added to automatically add the safe-to-evict annotation...

https://github.com/kubernetes/ingress-nginx/blob/0c3d52bade1219c1d61d486e44a91ded48766aac/charts/ingress-nginx/templates/controller-daemonset.yaml#L29

toredash commented 8 months ago

Seems you figured out a way to solve this with the configuration options already available in the helm chart. Or are there additional changes you believe must be implemented? If so, which?

calbot commented 8 months ago

I think that annotation should be added automatically by the helm template, since turning on opentelemetry shouldn't have the side effect of breaking cluster autoscaling. But yes, it's fixed for me using podAnnotations (it was even before I opened this issue).

calbot commented 8 months ago

I think the part of the helm template controller-daemonset.yaml where podAnnotations are used should be updated to something like this (untested!):

  template:
    metadata:
    {{- if or .Values.controller.extraModules .Values.controller.opentelemetry.enabled .Values.controller.podAnnotations }}
      annotations:
    {{- end }}
    {{- if or .Values.controller.extraModules .Values.controller.opentelemetry.enabled }}
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    {{- end }}
    {{- if .Values.controller.podAnnotations }}
      {{- range $key, $value := .Values.controller.podAnnotations }}
        {{ $key }}: {{ $value | quote }}
      {{- end }}
    {{- end }}
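One way to catch this class of regression, sketched as a hypothetical string check over `helm template` output (the function name and the sample manifest below are illustrative, not part of the chart):

```python
# Hypothetical CI-style guard: given rendered `helm template` output, flag a
# manifest that declares an emptyDir volume but lacks the cluster-autoscaler
# safe-to-evict annotation.

def missing_safe_to_evict(manifest: str) -> bool:
    """Return True if the manifest uses emptyDir without the annotation."""
    has_empty_dir = "emptyDir: {}" in manifest
    has_annotation = (
        'cluster-autoscaler.kubernetes.io/safe-to-evict: "true"' in manifest
    )
    return has_empty_dir and not has_annotation

# Stand-in for real chart output; in practice you would feed in the result
# of `helm template ingress-nginx ...`.
rendered = """\
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      volumes:
      - emptyDir: {}
        name: modules
"""
print(missing_safe_to_evict(rendered))  # False: annotation is present
```

The same check flags a manifest where the emptyDir appears without the annotation, which is the broken combination this issue describes.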

toredash commented 8 months ago

I'm not a maintainer, but this seems like a documentation enhancement rather than a helm change. I know they encourage PRs, so I would suggest submitting one.

longwuyuan commented 8 months ago

/remove-kind bug
/kind documentation
/area docs

Yeah, please submit a PR if you are inclined to. Thank you very much.

/help

k8s-ci-robot commented 8 months ago

@longwuyuan: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/10911):

> /remove-kind bug
> /kind documentation
> /area docs
>
> Yeah, please submit PR if you are inclined to. Thank you very much
>
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 8 months ago

/triage accepted

longwuyuan commented 8 months ago

/assign @esigo

@esigo Ehsan, does the template need altering with an if condition?

github-actions[bot] commented 7 months ago

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will get to your issue ASAP. If you have any questions or requests to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.