opendatahub-io / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

[RHOAIENG-11850] Updated etcd manifest #303

Closed mholder6 closed 2 months ago

mholder6 commented 2 months ago

Updated the manifest to include requests and limits for the etcd deployment.

Tested by applying a resource quota to the redhat-ods-applications project and checking which resources did not roll out automatically after the RQ was applied. The etcd deployment was the only one whose pod was not automatically redeployed. I then viewed the metrics for the unmodified etcd pod to find the request boundaries, and added and deleted multiple ISVCs to find the limit boundaries.

Instructions to apply ResourceQuota:

Here is a copy of the RQ I used:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4" 
    requests.cpu: 50m 
    requests.memory: 96Mi 
    limits.cpu: "1" 
    limits.memory: 512Mi 

There are two ways to apply it: the CLI or the UI.

  1. Using the CLI, ensure you are logged into the cluster and are using the project you want to apply the RQ to (in this case, the redhat-ods-applications project/namespace).
     a. In the directory where you saved the above RQ yaml, run oc apply -f <nameOfResourceQuota.yaml>

  2. Using the UI, there are two ways: creating a Pod, or creating a ResourceQuota directly.

Creating a Pod to create a ResourceQuota:
  a. In the sidebar, navigate to Workloads, and then Pods.
  b. Ensure you are in the project you want the RQ applied to. There is a drop-down list at the top left of the screen. In this case we are using the redhat-ods-applications project.
  c. Click the blue "Create Pod" button at the top right of the page, and paste the RQ yaml defined above.
  d. Click the blue "Create" button at the bottom.
View the ResourceQuota by navigating to the sidebar again, clicking Administration > and then Workloads.

Creating a ResourceQuota directly:
  a. In the sidebar, navigate to Administration > and then ResourceQuotas.
  b. Ensure you are in the project you want the RQ applied to. There is a drop-down list at the top left of the screen. In this case we are using the redhat-ods-applications project.
  c. Click the blue "Create ResourceQuota" button at the top right of the page, and paste the RQ yaml defined above, or manually edit the values you want for the RQ.
  d. Click the blue "Create" button at the bottom.

Once the RQ is applied, modify the request and limit values to fit the resources in your project. The deployments that are not automatically redeployed are the ones that do not have resource values defined.

Motivation

Modifications

Result

PR checklist

Checklist items below are applicable for development targeted to both fast and stable branches/tags


israel-hdez commented 2 months ago

So, I have tried this. I had to tune the ResourceQuota to let pods be created successfully. Despite applying the fix, I still saw the problem reported in the ticket:

Error creating: pods "etcd-68d5dbd5f7-ssffm" is forbidden: failed quota: compute-resources: must specify limits.cpu for: etcd-secret-creator; limits.memory for: etcd-secret-creator; requests.cpu for: etcd-secret-creator; requests.memory for: etcd-secret-creator

It looks like the existing resources aren't the problem. Rather, the initContainer (the one named etcd-secret-creator, as noted in the error) does not have any resources set. Note the missing resources field here: https://github.com/opendatahub-io/modelmesh-serving/blob/main/config/overlays/odh/quickstart.yaml#L50-L78

As it stands, the PR lowers the memory limits of the etcd container. IMO, if the current requests/limits are working (given the right allocation of quotas), we should keep them untouched and only fix the missing resources spec of the etcd-secret-creator initContainer.
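
A minimal sketch of what that narrower fix might look like, assuming the initContainer is declared in the pod template of the etcd Deployment. Only the container name comes from the error message above. The nesting shown and all four resource values are hypothetical placeholders, not measured numbers and not taken from the PR:

```yaml
# Sketch only: fragment of the etcd Deployment's pod template.
# All other fields (image, command, volumeMounts, ...) are omitted.
# The resource values below are hypothetical placeholders.
spec:
  template:
    spec:
      initContainers:
        - name: etcd-secret-creator
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
```

When a ResourceQuota constrains requests.cpu, requests.memory, limits.cpu and limits.memory, every container in a pod, init containers included, has to declare those values, which is why the missing block on etcd-secret-creator is enough to make the whole pod fail admission.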

openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez, mholder6

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/opendatahub-io/modelmesh-serving/blob/main/OWNERS)~~ [israel-hdez]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment