project-codeflare / instascale

On-demand Kubernetes/OpenShift cluster scaling and aggregated resource provisioning
Apache License 2.0

add: appwrapper pending status and condition check for scale-up #188

Closed VanillaSpoon closed 11 months ago

VanillaSpoon commented 11 months ago

Issue link

Closes: https://issues.redhat.com/browse/RHOAIENG-861

What changes have been made

This PR ensures that the AppWrapper (AW) is in a Pending state, with a condition of Insufficient resources, before the resources specified for the AppWrapper are scaled up. This keeps cluster resource usage efficient.
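As a rough illustration of the gate described above, the check can be sketched as follows. This is not the actual InstaScale code (which is written in Go); the dictionary layout, the `state`, `conditions`, and `message` field names, and the `should_scale_up` helper are all assumptions made for the example.

```python
# Hypothetical sketch of the scale-up gate; field names are assumptions,
# not the real AppWrapper API.
def should_scale_up(appwrapper: dict) -> bool:
    """Return True only if the AppWrapper is Pending AND reports
    an 'Insufficient resources' condition."""
    status = appwrapper.get("status", {})
    if status.get("state") != "Pending":
        return False
    conditions = status.get("conditions", [])
    return any(
        "Insufficient resources" in condition.get("message", "")
        for condition in conditions
    )


pending_aw = {
    "status": {
        "state": "Pending",
        "conditions": [{"message": "Insufficient resources to dispatch AppWrapper."}],
    }
}
running_aw = {"status": {"state": "Running", "conditions": []}}

print(should_scale_up(pending_aw))
print(should_scale_up(running_aw))
```

Requiring both the state and the condition means an AppWrapper that is pending for some other reason would not trigger a scale-up in this sketch.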

Verification steps

I followed these steps to verify the functionality on a cluster with 2 m6i.2xlarge worker nodes.

I provisioned an AppWrapper that required more resources than were available on the cluster.

    - custompodresources:
      - limits:
          cpu: 10
          memory: 40G
          nvidia.com/gpu: 0
        replicas: 2
        requests:
          cpu: 10
          memory: 40G
          nvidia.com/gpu: 0
      - limits:
          cpu: 10
          memory: 40G
          nvidia.com/gpu: 1
        replicas: 
        requests:
          cpu: 10
          memory: 40G 
          nvidia.com/gpu: 1

This ensured the AppWrapper reached the Pending state with the Insufficient resources condition, which triggered a scale-up of the specified resources, as can be seen here:

ScaleUp
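To show why this request cannot be placed on the existing workers, here is a small worked check. It assumes the published AWS shape for m6i.2xlarge (8 vCPU, 32 GiB of memory) and ignores the capacity Kubernetes reserves for system components, so real allocatable capacity is lower still. The `pod_fits_on_node` helper is hypothetical.

```python
# Assumed node shape: AWS lists m6i.2xlarge as 8 vCPU and 32 GiB of memory.
NODE_CPU = 8
NODE_MEMORY_BYTES = 32 * 1024**3

# Per-pod request from the custompodresources entries above.
POD_CPU = 10
POD_MEMORY_BYTES = 40 * 1000**3  # Kubernetes "G" is a decimal (10^9) suffix


def pod_fits_on_node(pod_cpu, pod_mem, node_cpu, node_mem):
    """A pod must fit entirely on one node; capacity is not pooled across nodes."""
    return pod_cpu <= node_cpu and pod_mem <= node_mem


fits = pod_fits_on_node(POD_CPU, POD_MEMORY_BYTES, NODE_CPU, NODE_MEMORY_BYTES)
print(fits)  # False: 10 CPU > 8 vCPU, and 40 GB > 32 GiB
```

Under these assumptions, a single pod exceeds what one worker node offers in both CPU and memory, so the AppWrapper cannot be dispatched without adding capacity.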

I also provisioned an AppWrapper whose required resources were within the cluster's available resources.

    - custompodresources:
      - limits:
          cpu: 2
          memory: 8G
          nvidia.com/gpu: 0
        replicas: 1
        requests:
          cpu: 2
          memory: 8G
          nvidia.com/gpu: 0
      - limits:
          cpu: 2
          memory: 8G
          nvidia.com/gpu: 1
        replicas: 2
        requests:
          cpu: 2
          memory: 8G
          nvidia.com/gpu: 1

With this, the AppWrapper was dispatched without the need for scaling up, as can be seen in the following: NoScaleUp

Checks

VanillaSpoon commented 11 months ago

Hey @asm582 I just pushed some changes there regarding the conditions message :)

asm582 commented 11 months ago

Thanks @VanillaSpoon for the quick turnaround. Can you please add some screenshots of the manual tests performed on this PR?

VanillaSpoon commented 11 months ago

Hi @asm582, I have added screenshots and an explanation of the scale-up testing to the PR description :)

asm582 commented 11 months ago

Thanks @VanillaSpoon

LGTM

openshift-ci[bot] commented 11 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asm582

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/project-codeflare/instascale/blob/main/OWNERS)~~ [asm582]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.