sustainable-computing-io / kepler-model-server

Model Server for Kepler
Apache License 2.0

Pkl files of 0.7 spec-power are incompatible with Python 3.10 #339

Closed sthaha closed 2 months ago

sthaha commented 3 months ago

What happened?

Run the Kepler VM compose setup pointing to the latest model server release (v0.7.11). See the diff below.

diff --git a/manifests/compose/validation/vm/compose.yaml b/manifests/compose/validation/vm/compose.yaml
index bc5bf5d2..906d4a16 100644
--- a/manifests/compose/validation/vm/compose.yaml
+++ b/manifests/compose/validation/vm/compose.yaml
@@ -59,11 +59,11 @@ services:

   estimator:
     entrypoint:
-      - python3.8
+      - python3.10
     command:
       - -u
       - src/estimate/estimator.py
-    image: quay.io/sustainable_computing_io/kepler_model_server:v0.7.7
+    image: quay.io/sustainable_computing_io/kepler_model_server:v0.7.11

     volumes:
       - type: bind
@@ -78,13 +78,13 @@ services:

   model-server:
     entrypoint:
-      - python3.8
+      - python3.10
     ports:
-      - 8100
+      - '8100:8100'
     command:
       - -u
       - src/server/model_server.py
-    image: quay.io/sustainable_computing_io/kepler_model_server:v0.7.7
+    image: quay.io/sustainable_computing_io/kepler_model_server:v0.7.11
     volumes:
       - type: bind
         source: ./kepler/etc/kepler
diff --git a/manifests/compose/validation/vm/kepler/etc/kepler/kepler.config/MODEL_CONFIG b/manifests/compose/validation/vm/kepler/etc/kepler/kepler.config/MODEL_CONFIG
index c5afb71f..5b4821e0 100644
--- a/manifests/compose/validation/vm/kepler/etc/kepler/kepler.config/MODEL_CONFIG
+++ b/manifests/compose/validation/vm/kepler/etc/kepler/kepler.config/MODEL_CONFIG
@@ -1,4 +1,4 @@
 NODE_TOTAL_ESTIMATOR=true
 NODE_TOTAL_INIT_URL=https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/specpower/acpi/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip
 NODE_COMPONENTS_ESTIMATOR=true
-NODE_COMPONENTS_INIT_URL=https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/ec2/intel_rapl/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip
+NODE_COMPONENTS_INIT_URL=https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/ec2-0.7.11/rapl-sysfs/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip

What did you expect to happen?

Pkl files should load without any warning.

How can we reproduce it (as minimally and precisely as possible)?

Run the estimator and check the log: it reports that the specpower pkl files (generated using Python 3.7) are incompatible with the newer Python.

fail to load pkl /mnt/download/acpi/AbsPower/platform.pkl: No module named 'sklearn.ensemble._gb_losses'

estimator-1  | set NODE_TOTAL_ESTIMATOR to true.
estimator-1  | set NODE_TOTAL_INIT_URL to https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/specpower/acpi/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip.
estimator-1  | set NODE_COMPONENTS_ESTIMATOR to true.
estimator-1  | set NODE_COMPONENTS_INIT_URL to https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/ec2-0.7.11/rapl-sysfs/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip.
estimator-1  | clean socket
estimator-1  | get archived model
estimator-1  | get init url https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/specpower/acpi/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip
estimator-1  | try getting archieved model from URL: https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/specpower/acpi/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip for AbsPower
estimator-1  | <Response [200]>
estimator-1  | load model from config:  /mnt/download/acpi/AbsPower
estimator-1  | fail to load pkl /mnt/download/acpi/AbsPower/platform.pkl: No module named 'sklearn.ensemble._gb_losses'
estimator-1  | /usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator MaxAbsScaler from version 1.1.2 when using version 1.5.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
estimator-1  | https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
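For context on the `No module named 'sklearn.ensemble._gb_losses'` line: a pickle stores classes by their import path, so loading fails if that path cannot be imported in the running environment. Presumably that private module is no longer present in scikit-learn 1.5.1. The sketch below reproduces the same kind of failure using only the standard library. The module name, class name, and helper functions are hypothetical stand-ins, not code from kepler-model-server.

```python
import pickle
from typing import Any, Optional, Tuple

# Hypothetical stand-in for a module path that existed when the model was
# pickled but is absent from the environment that loads it.
MISSING_MODULE = "removed_private_module_for_demo"


def make_stale_pickle() -> bytes:
    """Build a tiny protocol-0 pickle that references MISSING_MODULE."""
    # GLOBAL opcode: "c<module>\n<name>\n", followed by STOP (".").
    return f"c{MISSING_MODULE}\nLossFunction\n.".encode("ascii")


def try_load(data: bytes) -> Tuple[Optional[Any], Optional[str]]:
    """Return (object, None) on success or (None, message) on failure."""
    try:
        return pickle.loads(data), None
    except ModuleNotFoundError as exc:
        # Unpickling imports the recorded module path; a missing path
        # surfaces here rather than as a data error.
        return None, f"fail to load pkl: {exc}"


model, error = try_load(make_stale_pickle())
print(error)
```

Catching the error only reports the problem. The pickle itself stays unusable until it is regenerated with a compatible library version.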

Anything else we need to know?

No response

Kepler image tag

latest (but shouldn't matter if you used 0.7.11)

Deployment

Kepler model server image tag if deployed

0.7.11

Kepler estimator image tag if deployed

latest

Kepler online trainer image tag if deployed

Kepler offline trainer image tag if deployed

Kepler profiler image tag if deployed

Kubernetes version

```console
$ kubectl version
# paste output here
```

Install tools

Kepler deployment config

For on kubernetes:

```console
$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here

# provide kepler model server configmap if Kepler Model Server is deployed
$ kubectl get configmap kepler-model-server-cfm -n ${KEPLER_NAMESPACE}
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
```

For standalone:

# put your Kepler command argument here

sthaha commented 3 months ago

@sunya-ch, I think we should retrain the spec-power models to fix this error.

sunya-ch commented 2 months ago

Fixed. Refer to #367.