Status: Closed (sunya-ch closed this pull request 9 months ago)
Please feel free to review the design first. I'm working on fixing the code bugs (stringvars flag, missing rbac, ...). I will amend the commit once I confirm that all deployment choices work, at least on my cluster.
The critical deployment issue is now fixed.
However, please allow me to leave some issues open (I need help from others to fix them in other PRs):
@sunya-ch Thanks a lot for adding the feature 🤗
Please allow us some time to go through the feature implementation. My first focus will be on spec.modelServer, to ensure that only a minimal API is exposed.
@sunya-ch, thanks a lot for adding this feature 🙇.
You can ignore most of the comments in the review; let's focus on reducing the spec.modelserver and spec.estimator parts to the minimal required configuration. We should be able to make assumptions about the model server that is deployed, and thus may not need all the configuration options currently in place.
We also need e2e tests to validate the most common configurations and scenarios ...
The status update of the kepler should also consider the status of these deployments.
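As a rough illustration of the status point above, the sketch below treats Kepler as available only when the exporter daemonset and, if deployed, the model server deployment are fully ready. The names WorkloadStatus and kepler_available are hypothetical and are not part of the operator's API; the operator itself would express this in its own status conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkloadStatus:
    """Hypothetical summary of a daemonset or deployment status."""
    desired: int  # pods the workload wants to run
    ready: int    # pods currently ready


def kepler_available(exporter: WorkloadStatus,
                     model_server: Optional[WorkloadStatus] = None) -> bool:
    """Report availability across every workload that is actually deployed."""
    workloads = [exporter]
    if model_server is not None:  # only counted when a model server is deployed
        workloads.append(model_server)
    return all(w.ready == w.desired for w in workloads)
```

With the numbers from the example deployments in this thread, kepler_available(WorkloadStatus(7, 7), WorkloadStatus(1, 1)) would report available, while a model server at 0 of 1 ready would not.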
Any thoughts on having both model-server and estimator disabled by default? cc: @sunya-ch @rootfs @piparul ?
@sthaha Thank you so much for the review. I made most of the changes according to your review. Where my change differs slightly from your suggestion, I left a comment below the corresponding review comment.
Any thoughts on having both model-server and estimator disabled by default?
Both should be disabled by default, except when ModelServerSpec is defined. If any value in this section is defined, we should expect a local model server by default (model server enabled). We are also open to a remote model server: the user can disable the local one and provide the target URL and port of the remote server.
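The defaulting rule described above can be sketched as follows. This is only an illustration under stated assumptions: the ModelServerSpec class here is a hypothetical stand-in, and the url and port field names are assumed, not taken from the operator's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelServerSpec:
    """Hypothetical stand-in for the modelServer section of the Kepler CR."""
    enabled: Optional[bool] = None  # None means the user did not set the field
    url: str = ""                   # assumed field: target URL of a remote server
    port: int = 0                   # assumed field: target port of a remote server


def model_server_mode(spec: Optional[ModelServerSpec]) -> str:
    """Return "none", "local", or "remote" following the rule described above."""
    if spec is None:
        return "none"  # section absent: model server disabled by default
    if spec.enabled is False:
        # Explicitly disabled: use a remote server only if a target is given.
        return "remote" if spec.url else "none"
    return "local"  # section defined and not disabled: deploy a local server
```

Under this sketch, an empty but present section resolves to "local", and only an explicit enabled=False together with a URL resolves to "remote".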
I have made updates for the review comments that I marked with the icon.
Here are example deployments.
spec:
  exporter:
    deployment:
      port: 9103
oc get -n openshift-kepler-operator all
NAME                           READY   STATUS    RESTARTS   AGE
pod/kepler-exporter-ds-d4ctn   1/1     Running   0          11s
pod/kepler-exporter-ds-fd5xt   1/1     Running   0          11s
pod/kepler-exporter-ds-fzjk7   1/1     Running   0          11s
pod/kepler-exporter-ds-n46xf   1/1     Running   0          11s
pod/kepler-exporter-ds-nthsc   1/1     Running   0          11s
pod/kepler-exporter-ds-qm7p4   1/1     Running   0          11s
pod/kepler-exporter-ds-s5t48   1/1     Running   0          11s

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/kepler-exporter-svc   ClusterIP   None         <none>        9103/TCP   11s

NAME                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kepler-exporter-ds   7         7         7       7            7           kubernetes.io/os=linux   11s
spec:
  exporter:
    deployment:
      port: 9103
  estimator:
    node:
      components:
        sidecar: true
        initUrl: https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/Linux-4.15.0-213-generic-x86_64_v0.6/rapl/AbsPower/KubeletOnly/GradientBoostingRegressorTrainer_1.zip
NAME                           READY   STATUS    RESTARTS   AGE
pod/kepler-exporter-ds-5g5kk   2/2     Running   0          16s
pod/kepler-exporter-ds-7tg9j   2/2     Running   0          16s
pod/kepler-exporter-ds-fh4f2   2/2     Running   0          16s
pod/kepler-exporter-ds-fqdnf   2/2     Running   0          16s
pod/kepler-exporter-ds-lgfwx   2/2     Running   0          16s
pod/kepler-exporter-ds-nthhd   2/2     Running   0          16s
pod/kepler-exporter-ds-pgrl6   2/2     Running   0          16s

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/kepler-exporter-svc   ClusterIP   None         <none>        9103/TCP   16s

NAME                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kepler-exporter-ds   7         7         7       7            7           kubernetes.io/os=linux   17s
spec:
  exporter:
    deployment:
      port: 9103
  estimator:
    node:
      components:
        sidecar: true
        initUrl: https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/Linux-4.15.0-213-generic-x86_64_v0.6/rapl/AbsPower/KubeletOnly/GradientBoostingRegressorTrainer_1.zip
  modelServer:
    enabled: true
oc get all -n openshift-kepler-operator
NAME                                       READY   STATUS    RESTARTS   AGE
pod/kepler-exporter-ds-4bsnt               2/2     Running   0          4m48s
pod/kepler-exporter-ds-679tv               2/2     Running   0          4m48s
pod/kepler-exporter-ds-6cmkf               2/2     Running   0          4m48s
pod/kepler-exporter-ds-9ltv4               2/2     Running   0          4m49s
pod/kepler-exporter-ds-c6wnl               2/2     Running   0          4m48s
pod/kepler-exporter-ds-f2l9z               2/2     Running   0          4m49s
pod/kepler-exporter-ds-z5wkg               2/2     Running   0          2m55s
pod/model-server-deploy-85fd7b8c6d-9dwz9   1/1     Running   0          4m49s

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/kepler-exporter-svc   ClusterIP   None         <none>        9103/TCP   4m50s
service/model-server-svc      ClusterIP   None         <none>        8100/TCP   4m50s

NAME                                DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/kepler-exporter-ds   7         7         7       7            7           kubernetes.io/os=linux   4m50s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/model-server-deploy   1/1     1            1           4m50s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/model-server-deploy-85fd7b8c6d   1         1         1       4m50s
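To summarize the three example deployments above, here is a small sketch that maps a CR spec (as a plain dict) to the workloads observed in the outputs: one container per exporter pod without the estimator sidecar, two with it, and an extra model server deployment when modelServer is present and not disabled. The function name expected_workloads and the returned keys are hypothetical, and the logic only covers the fields shown in these examples.

```python
def expected_workloads(spec: dict) -> dict:
    """Derive the expected workloads from a Kepler CR spec given as a dict."""
    estimator = spec.get("estimator", {})
    # A sidecar is expected if any estimator config sets sidecar: true,
    # e.g. estimator.node.components.sidecar in the examples above.
    sidecar = any(
        cfg.get("sidecar", False)
        for group in estimator.values()
        for cfg in group.values()
    )
    model_server = spec.get("modelServer")
    # Assumption from the discussion: a defined section means enabled by default.
    has_model_server = model_server is not None and bool(model_server.get("enabled", True))
    return {
        "containers_per_exporter_pod": 2 if sidecar else 1,
        "model_server_deployment": has_model_server,
    }
```

For the first example spec this yields one container per exporter pod and no model server deployment, which matches the 1/1 READY column in the first output.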
This PR updates model server support, aiming for release v0.6 as mentioned in https://github.com/sustainable-computing-io/kepler-operator/issues/232.
API doc: https://github.com/sustainable-computing-io/kepler-operator/blob/c15c77621958cc79d1921d9af378915158abc4ca/docs/api.md
The PR contains changes on:
Note that the holder for setting filters and the model name is here in kepler: https://github.com/sustainable-computing-io/kepler/blob/73cb11fb963f425013cf7f03f214c8f8b85c7853/pkg/config/config.go#L390. However, how to use it has not been determined yet, so it is not yet supported end to end.
Example configmap change from a full deployment on OpenShift on IBM Cloud (kepler CR: config/samples/kepler_full_deploy.yaml)
Resources:
exporter log
estimator log
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>