spolti closed this 6 months ago
@spolti I followed the description and hit errors:
./deploy.sh: line 4: /tmp/modelmesh-serving/opendatahub/quickstart/scripts/utils.sh: No such file or directory
make: *** No rule to make target 'deploy-mm-for-odh'. Stop.
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
No resources found
Error from server (NotFound): namespaces "model-serving" not found
Now using project "model-serving" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app rails-postgresql-example
to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname
error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-serving-runtime.yaml" does not exist
error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-inference-service.yaml" does not exist
./deploy.sh: line 28: wait_for_pods_ready: command not found
./deploy.sh: line 30: success: command not found
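The log suggests deploy.sh kept running after the `source` on line 4 failed. As a result, the later calls to wait_for_pods_ready and success (presumably defined in utils.sh) failed with "command not found". Below is a minimal sketch of a more defensive script header. It assumes deploy.sh is a bash script and that utils.sh is meant to live at ../scripts/utils.sh relative to the quickstart folder (a path inferred from the error message, not confirmed). source_helper is a hypothetical name.

```shell
#!/usr/bin/env bash
# Abort on errors, unset variables, and failures inside pipelines,
# so a missing helper stops the script instead of cascading errors.
set -euo pipefail

# Directory containing this script, resolved independently of the caller's cwd.
# Falls back to $0 when BASH_SOURCE is empty (e.g. when run via `bash -c`).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"

# Hypothetical helper: source a file, or fail with a clear message if it is missing.
source_helper() {
  local helper="${1:-}"
  if [[ ! -f "$helper" ]]; then
    echo "error: helper script not found: $helper" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  source "$helper"
}
```

Near the top of deploy.sh this would be used as `source_helper "${SCRIPT_DIR}/../scripts/utils.sh" || exit 1`. The script then stops at the first missing file instead of reaching line 28.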
@spolti Is the basic folder the only one to test, or are there other folders?
@Jooho It should be working now. I'm not sure why it happened, but I was able to run it locally before.
Now it removed the modelmesh-serving folder itself.
[RHOAIENG-1051](https://issues.redhat.com//browse/RHOAIENG-1051) /tmp/modelmesh-serving/opendatahub/quickstart/basic 15:57:06 jooho
❯ ./deploy.sh
./opendatahub/scripts/install_odh.sh
.. Downloading binaries
.. Creating a bin folder
yq already installed.
Installing kustomize.
curl: (22) The requested URL returned error: 404
tar: /tmp/kustomize.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
mv: cannot stat '/tmp/kustomize': No such file or directory
rm: cannot remove '/tmp/kustomize.tar.gz': No such file or directory
installed kustomize version v5.3.0
Delete the exising /tmp/modelmesh-e2e folder
Creating a /tmp/modelmesh-e2e folder
Already on project "opendatahub" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".
.. Archiving odh-manifests
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/../../config': No such file or directory
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/manifests': No such file or directory
Latest Manifest will be used, fast tag
.. Deploying ModelMesh with kustomize
./opendatahub/scripts/install_odh.sh: line 158: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory
params.env:
cat: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory
installing namespaced rbac
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
error: no objects passed to apply
cat: /tmp/modelmesh-serving/config/namespace-runtimes/kustomization.yaml: No such file or directory
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
^Cmake: *** [Makefile:297: deploy-mm-for-odh] Interrupt
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
Please double-check the rm command.
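The repeated getcwd errors indicate that the script deleted the directory the shell was running in. Below is a minimal sketch of a guarded delete. It assumes the cleanup should only ever touch the E2E work folder (/tmp/modelmesh-e2e in the log above). safe_rm_dir and its arguments are hypothetical, not existing helpers in the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: delete $1 only if it is $2 itself or lies under $2,
# and never if the current shell is inside it.
safe_rm_dir() {
  local target="${1:-}" base="${2:-}"
  # Strip one trailing slash so "/tmp/x/" and "/tmp/x" compare equal.
  target="${target%/}"
  base="${base%/}"

  # Both paths must be absolute; an empty string (or "/") fails this check.
  if [[ "$target" != /* || "$base" != /* ]]; then
    echo "refusing to delete '${1:-}': paths must be absolute" >&2
    return 1
  fi
  # Reject ".." components so the prefix check below cannot be bypassed.
  if [[ "$target" == *"/.."* ]]; then
    echo "refusing to delete '$1': path contains '..'" >&2
    return 1
  fi
  # The target must be the base directory itself or something below it.
  if [[ "$target" != "$base" && "$target" != "$base"/* ]]; then
    echo "refusing to delete '$1': not under '$2'" >&2
    return 1
  fi
  # Deleting the working directory leaves the shell with a dangling cwd
  # (the "getcwd: cannot access parent directories" errors above).
  if [[ "$PWD" == "$target" || "$PWD" == "$target"/* ]]; then
    echo "refusing to delete '$1': it contains the current directory" >&2
    return 1
  fi
  rm -rf -- "$target"
}
```

With the base pinned to /tmp/modelmesh-e2e, a variable that accidentally expands to /tmp/modelmesh-serving is refused instead of deleted.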
/retest
@spolti With this PR, what should I test? The 3 quickstarts? Anything else?
Only these 3 should be fine.
basic and hpa worked, but pvc failed.
The modelmesh-controller failed:
{"level":"info","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"MMesh Configuration","serviceName":"modelmesh-serving","port":8033,"mmeshEndpoint":""}
{"level":"error","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"In cluster scope mode but controller does not have cluster scope permissions, exiting","stacktrace":"main.main\n\t/go/src/github.com/opendatahub-io/modelmesh-serving/main.go:210\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
Is this quickstart still deploying model-mesh?
The current issue with this set of quickstarts, in my opinion, is that the deploy.sh script installs model-mesh, which is not what I have needed whenever I looked at these quickstarts. Most of the time I find myself looking at these quickstarts, I already have an ODH installation, and what I'm looking for is a one-line command to deploy a sample InferenceService.
IMO, these quickstarts should operate under the assumption that ODH is already installed, and only focus on deploying the models and performing inference. Or at least there should be an option for this in addition to what already exists.
WDYT @Jooho @israel-hdez @spolti @danielezonca
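Below is a sketch of one way to offer both modes in the same script. The --skip-install and --namespace flags, the INSTALL_ODH and NAMESPACE variables, and parse_args are all hypothetical names. They are not options that deploy.sh currently accepts. model-serving is the namespace seen in the logs above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Defaults: full quickstart (install ODH/ModelMesh, then deploy the sample).
INSTALL_ODH=true
NAMESPACE="model-serving"

# Hypothetical option parser for deploy.sh.
#   --skip-install     assume ODH is already installed; only deploy the sample model
#   --namespace NAME   namespace for the sample ServingRuntime/InferenceService
parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --skip-install)
        INSTALL_ODH=false
        ;;
      --namespace)
        if [[ $# -lt 2 ]]; then
          echo "error: --namespace requires a value" >&2
          return 1
        fi
        NAMESPACE="$2"
        shift
        ;;
      *)
        echo "error: unknown option: $1" >&2
        return 1
        ;;
    esac
    shift
  done
}
```

deploy.sh would call parse_args "$@" and guard the existing installation step with if [[ "$INSTALL_ODH" == true ]]. It would then always apply the sample openvino-serving-runtime.yaml and openvino-inference-service.yaml manifests into $NAMESPACE.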
FVT is working fine; just one test is failing, and that seems to be an issue with the test itself:
Scaling of runtime deployments with HPA Autoscaler when there are no predictors Scale all runtimes down after a created test predictor is deleted
/Users/fspolti/data/dev/sources/modelmesh-serving/fvt/hpa/hpa_test.go:149
2024-04-05T17:45:14-03:00 INFO Delete all predictors ...
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-mlserver-1.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:false modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-ovms-1.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-torchserve-0.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:false]
2024-04-05T17:45:17-03:00 INFO Watcher got event with object {"name": "modelmesh-serving-triton-2.x", "replicas": 0, "available": 0, "updated": 0}
2024-04-05T17:45:17-03:00 INFO deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
2024-04-05T17:45:17-03:00 INFO All deployments are ready: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
2024-04-05T17:45:27-03:00 INFO Timed out after 10s without events
STEP: Creating a test predictor for one Runtime @ 04/05/24 17:45:27.361
STEP: Creating predictor mlserver-sklearn-mnist-svm-gpqgm @ 04/05/24 17:45:27.361
STEP: Waiting for predictor mlserver-sklearn-mnist-svm-gpqgm to be 'Loaded' @ 04/05/24 17:45:27.665
2024-04-05T17:45:27-03:00 INFO Watcher got event with object {"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": null}
2024-04-05T17:45:27-03:00 INFO Watcher got event with object {"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": {"message":"Waiting for runtime Pod to become available","modelId":"mlserver-sklearn-mnist-svm-gpqgm__ksp-b20a0c5aca","reason":"RuntimeUnhealthy"}}
[FAILED] in [It] - /Users/fspolti/data/dev/sources/modelmesh-serving/fvt/helpers.go:355 @ 04/05/24 17:47:27.664
2024-04-05T17:47:27-03:00 INFO Running command {"args": "kubectl get predictors -n model-serving"}
=====================================================================================================================================
NAME                               TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
mlserver-sklearn-mnist-svm-gpqgm   sklearn   false       Pending                     UpToDate     2m3s
Note that I haven't reviewed the code and I don't think I have the full context, so I'm just trying to answer...
In most projects I have played with, quickstarts assume a clean environment and install everything to quickly give you a working setup. Also, quickstarts are usually for trying the project (i.e. non-production, demos). Because of this, a quickstart doesn't let you customize the setup (that's left to the official installer).
I would agree that deploying a sample model should be left to a different script than the quickstart setup, although it could be part of the same doc page (probably arranged like a tutorial).
That said, I also didn't use the quickstarts much, because what I personally remember is that, rather than providing you with a setup you can play with, I think they prepared the environment more like a demo that is also suited for running FVTs/CI... and I had to spend time "cleaning" the environment. This is different from your case: you just want to deploy a sample model on an existing setup, while I wanted to quickly have a base setup (without additional stuff) to try my own models. ... but don't trust me on this (I may be misremembering why I didn't use the quickstarts that much).
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Jooho, spolti
chore: opendatahub folder have
How to test:
PR checklist
Checklist items below are applicable for development targeted to both fast and stable branches/tags