maksim-paskal / aks-node-termination-handler

Gracefully handle Azure Virtual Machines shutdown within Kubernetes
Apache License 2.0

Enhancement for EventType freeze #43

Closed: muthusamymm closed this issue 1 year ago

muthusamymm commented 1 year ago

Hi, currently when Azure sends the event type FREEZE, aks-node-termination-handler drains all pods and stops watching for new events. The problem we see is that Azure does not take that worker node down, so the VM scale set does not create a new worker node. The worker remains in an unschedulable state and we are still charged for it.

Is it possible, for the FREEZE state alone, to resume watching for events after the drain and, when a new event indicates unfreeze/normal, uncordon that worker node and keep watching for new events? (Attached screenshot: eventTypeFreeze)

maksim-paskal commented 1 year ago (23 Jun 2023)

@muthusamymm thanks for opening this issue. This makes sense: Freeze is actually a temporary state for an Azure resource.

Freeze: The virtual machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
Reboot: The virtual machine is scheduled for reboot (non-persistent memory is lost). This event is made available on a best-effort basis.
Redeploy: The virtual machine is scheduled to move to another node (ephemeral disks are lost). This event is delivered on a best-effort basis.
Preempt: The Spot virtual machine is being deleted (ephemeral disks are lost).
Terminate: The virtual machine is scheduled to be deleted.
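The distinction above (Freeze is a short pause, the other four event types take the VM away) can be sketched in Go as a filter over the Scheduled Events response. This is a minimal illustration under assumptions, not the handler's actual code: it assumes the JSON shape of the Azure Instance Metadata Service Scheduled Events document (`DocumentIncarnation`, `Events`, `EventId`, `EventType`, `Resources`), assumes `vmName` matches the entries in `Resources`, and the names `drainEventTypes`, `shouldDrain`, and `eventsToDrain` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScheduledEvent mirrors a subset of the fields in one Scheduled Events entry
// (assumed field names, based on the Azure metadata service JSON).
type ScheduledEvent struct {
	EventID     string   `json:"EventId"`
	EventType   string   `json:"EventType"`
	EventStatus string   `json:"EventStatus"`
	Resources   []string `json:"Resources"`
	NotBefore   string   `json:"NotBefore"`
}

// ScheduledEventsDocument is the top-level response body.
type ScheduledEventsDocument struct {
	DocumentIncarnation int              `json:"DocumentIncarnation"`
	Events              []ScheduledEvent `json:"Events"`
}

// drainEventTypes is a hypothetical policy: event types that remove the VM
// or lose its state. "Freeze" is deliberately absent because it only pauses
// the VM for a few seconds with memory and open files preserved.
var drainEventTypes = map[string]bool{
	"Reboot":    true,
	"Redeploy":  true,
	"Preempt":   true,
	"Terminate": true,
}

// shouldDrain reports whether an event type should trigger a node drain.
func shouldDrain(eventType string) bool {
	return drainEventTypes[eventType]
}

// eventsToDrain parses a Scheduled Events body and returns only the events
// that target vmName and whose type should trigger a drain.
func eventsToDrain(body []byte, vmName string) ([]ScheduledEvent, error) {
	var doc ScheduledEventsDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	var out []ScheduledEvent
	for _, e := range doc.Events {
		if !shouldDrain(e.EventType) {
			continue // e.g. Freeze: ignored for draining purposes
		}
		for _, r := range e.Resources {
			if r == vmName {
				out = append(out, e)
				break
			}
		}
	}
	return out, nil
}

func main() {
	body := []byte(`{"DocumentIncarnation":2,"Events":[
		{"EventId":"a","EventType":"Freeze","EventStatus":"Scheduled","Resources":["aks-node-0"]},
		{"EventId":"b","EventType":"Terminate","EventStatus":"Scheduled","Resources":["aks-node-0"]}]}`)
	events, err := eventsToDrain(body, "aks-node-0")
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Println(e.EventID, e.EventType) // prints "b Terminate"
	}
}
```

Keeping the policy in a single lookup table makes it a one-line change to include or exclude an event type.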

I will try to implement some changes to exclude this event from draining.

muthusamymm commented 1 year ago

Thank you for the support.


maksim-paskal commented 1 year ago (26 Jun 2023)

@muthusamymm I released a new version, v1.0.4. In the new version:

1. Freeze events no longer trigger node draining.
2. These events are logged as node "events".
3. Prometheus metrics were added, so you can now view these events in Prometheus.
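The v1.0.4 behavior described above (Freeze is recorded and counted but not drained, and the watch loop keeps running) can be modeled with a small stdlib-only sketch. This is a hypothetical illustration, not the released code: `Handler`, `NewHandler`, `Process`, `counts`, and `logLines` are illustrative names, `counts` stands in for a Prometheus counter labeled by event type, and `logLines` stands in for Kubernetes node Events. Deduplicating by event ID is also an assumption, made because the same scheduled event can appear on several consecutive polls.

```go
package main

import "fmt"

// Event is a minimal stand-in for one Scheduled Events entry.
type Event struct {
	ID   string
	Type string
}

// Handler holds hypothetical per-node state for the watch loop.
type Handler struct {
	seen     map[string]bool // event IDs already processed, so repeated polls are idempotent
	counts   map[string]int  // per-EventType counts, standing in for a Prometheus counter
	drained  bool            // whether a drain has been triggered
	logLines []string        // stands in for Kubernetes node Events
}

// NewHandler returns a Handler with initialized maps.
func NewHandler() *Handler {
	return &Handler{seen: map[string]bool{}, counts: map[string]int{}}
}

// Process handles one polled batch of events. It never stops the watch loop:
// Freeze is only recorded and counted, other types mark the node for draining.
func (h *Handler) Process(events []Event) {
	for _, e := range events {
		if h.seen[e.ID] {
			continue // already handled on an earlier poll
		}
		h.seen[e.ID] = true
		h.counts[e.Type]++
		h.logLines = append(h.logLines, fmt.Sprintf("scheduled event %s: %s", e.ID, e.Type))
		if e.Type == "Freeze" {
			continue // record and count only; the node keeps serving pods
		}
		h.drained = true // a real handler would cordon and drain the node here
	}
}

func main() {
	h := NewHandler()
	h.Process([]Event{{ID: "1", Type: "Freeze"}})
	h.Process([]Event{{ID: "1", Type: "Freeze"}}) // same event seen again on the next poll
	fmt.Println(h.drained, h.counts["Freeze"])     // prints "false 1"
	h.Process([]Event{{ID: "2", Type: "Terminate"}})
	fmt.Println(h.drained, h.counts["Terminate"]) // prints "true 1"
}
```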

You need to install the new chart to get these changes (see https://github.com/maksim-paskal/aks-node-termination-handler#installation).

muthusamymm commented 1 year ago

Thank you for the update; we will roll it out in our environment.
