istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0

Istio waypoint HPA, PDB, and custom resources #53964

Open tomahkvt opened 6 days ago

tomahkvt commented 6 days ago

Subject: Proposal: Enhancements to Waypoint Proxy for Production-Ready Deployments

Dear Istio Team,

Thank you for your continued work on Istio.

I would like to propose several enhancements to the Waypoint Proxy configuration to make it more production-ready. These changes focus on improved scalability, resilience, and configurability, addressing current limitations that impact production-grade deployments.

Proposed Changes:

  1. Custom Replica Count for Waypoints:

Add the ability to configure the number of replicas for waypoint proxies. Suggested change to line #42 of waypoint.yaml:

yaml

  {{- if or (isset .ObjectMeta.Annotations `sidecar.istio.io/replicaCount`) (.Values.global.waypoint.replicaCount) }}
  replicas: {{ annotation .ObjectMeta `sidecar.istio.io/replicaCount` .Values.global.waypoint.replicaCount }}
  {{- end }}

  2. Enhanced Resource Management:

Replace lines #201-204 of waypoint.yaml with the following to allow customizable CPU and memory settings via annotations:

yaml

resources:
  requests:
    cpu: "{{ annotation .ObjectMeta `sidecar.istio.io/proxyCPU` .Values.global.waypoint.resources.requests.cpu }}"
    memory: "{{ annotation .ObjectMeta `sidecar.istio.io/proxyMemory` .Values.global.waypoint.resources.requests.memory }}"
  limits:
    memory: "{{ annotation .ObjectMeta `sidecar.istio.io/proxyMemoryLimit` .Values.global.waypoint.resources.limits.memory }}"

  3. Extend values.yaml for Waypoint Customization:

Update values.yaml to include the following under global:

yaml

waypoint:
  # Resources for the waypoint proxy.
  replicaCount: null
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 1Gi

  4. Per-Waypoint Resource Overrides:

Allow resource settings to be configured individually for each waypoint proxy using Gateway annotations:

yaml

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  annotations:
    networking.istio.io/service-type: ClusterIP
    sidecar.istio.io/replicaCount: "2"
    sidecar.istio.io/proxyCPU: "50m"
  labels:
    istio.io/waypoint-for: all
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE

  5. Add HorizontalPodAutoscaler and PodDisruptionBudget:

Include templates for HPA and PDB in waypoint.yaml (https://github.com/istio/istio/blob/release-1.24/manifests/charts/istio-control/istio-discovery/files/waypoint.yaml) to ensure production-ready scaling and availability.

  6. Production Use Case:

Integrating HPA and PDB directly into waypoint.yaml would let waypoint proxies scale dynamically and stay available during disruptions, making Istio more suitable for production-grade environments.
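
For illustration, here is a minimal sketch of what an HPA and a PDB for the `waypoint` Gateway above might look like. It is modeled on the pattern Istio documents for attaching resources to Gateway API deployments, not on any existing waypoint.yaml template. The names and thresholds are assumptions: it assumes the generated Deployment is named `waypoint` (the Gateway name) and that its pods carry the `gateway.networking.k8s.io/gateway-name` label; verify both in your cluster before using it.

```yaml
# Sketch only: names, replica bounds, and the CPU target are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: waypoint
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: waypoint            # assumed name of the Deployment generated for the Gateway
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization   # requires a CPU request on the waypoint container
          averageUtilization: 50
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: waypoint
spec:
  minAvailable: 1             # keep at least one waypoint pod during voluntary disruptions
  selector:
    matchLabels:
      gateway.networking.k8s.io/gateway-name: waypoint   # assumed pod label
```

If an HPA like this is attached, it is usually best to leave `replicas` unset on the Deployment so that the autoscaler alone controls the count. The replica template proposed in item 1 only emits `replicas` when the annotation or the value is set, so the two should be able to coexist.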

Benefits:

  1. Scalability: HPA ensures that waypoint proxies scale to meet traffic demands.
  2. Resilience: PDB prevents excessive disruptions during updates or maintenance.
  3. Flexibility: Fine-grained resource configuration for each waypoint proxy via annotations.

Thank you for considering this proposal. I am happy to assist with further details or implementation examples if needed.

Affected product area (please put an X in all that apply)

[x] Ambient [ ] Docs [ ] Dual Stack [ ] Installation [ ] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster [ ] Virtual Machine [ ] Multi Control Plane

Additional context

howardjohn commented 6 days ago

Thanks for the issue.

One comment on 'While configuring HPA for a waypoint proxy using the Gateway API documentation, I noticed that istiod reverts the deployment replica count to the previous state. Please fix this problem.'

I have tested this and don't see this happening. Istio itself should not be touching the replicas field (which ends up being defaulted to 1 by k8s).

Are you sure you are seeing this on 1.24 with an HPA attached? If so, how can we reproduce?

tomahkvt commented 5 days ago

Hi @howardjohn. Thank you very much for your quick answer. I found my mistake and updated my question.