kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

Install fails if latest commit short hash can be parsed as an int #472

Closed ckadner closed 5 months ago

ckadner commented 5 months ago

Capturing this installation error in an issue, as it was a bit tricky to debug.

Error: accumulating resources: accumulation err='accumulating resources from '../manager': '/home/runner/work/modelmesh-serving/modelmesh-serving/config/manager' must resolve to a file': couldn't make target for path '/home/runner/work/modelmesh-serving/modelmesh-serving/config/manager': invalid Kustomization: json: cannot unmarshal number into Go struct field Image.images.newTag of type string
error: no objects passed to apply
Error: Process completed with exit code 1.

Core of the error message:

cannot unmarshal number into Go struct field Image.images.newTag of type string

How to reproduce it?

Assume the latest git commit short hash consists exclusively of numeric characters, e.g.:

    IMAGE_TAG: 7284872

Run:

  export NAMESPACE_SCOPE_MODE=false
  kubectl create ns modelmesh-serving
  ./scripts/install.sh --namespace modelmesh-serving --fvt --dev-mode-logging

Output:

namespace/modelmesh-serving created
Setting kube context to use namespace: modelmesh-serving
Context "minikube" modified.
Getting ModelMesh Serving configs
Using config directory at root of project.
~/work/modelmesh-serving/modelmesh-serving/config/default ~/work/modelmesh-serving/modelmesh-serving/config
~/work/modelmesh-serving/modelmesh-serving/config
~/work/modelmesh-serving/modelmesh-serving/config/rbac/namespace-scope ~/work/modelmesh-serving/modelmesh-serving/config
~/work/modelmesh-serving/modelmesh-serving/config
~/work/modelmesh-serving/modelmesh-serving/config/rbac/cluster-scope ~/work/modelmesh-serving/modelmesh-serving/config
~/work/modelmesh-serving/modelmesh-serving/config
Deploying fvt resources for etcd and minio
service/etcd created
deployment.apps/etcd created
secret/model-serving-etcd created
service/minio created
deployment.apps/minio created
secret/storage-config created
persistentvolumeclaim/models-pvc-1 created
persistentvolumeclaim/models-pvc-2 created
persistentvolumeclaim/models-pvc-3 created
job.batch/pvc-init created
pod/pvc-reader created
Waiting for dependent pods to be up ...
Pods found with selector '-l app=etcd' are not ready yet. Waiting 10 secs ...
All -l app=etcd pods are running and ready.
Pods found with selector '-l app=minio' are not ready yet. Waiting 10 secs ...
All -l app=minio pods are running and ready.
model-serving-etcd secret found
Creating storage-config secret if it does not exist
NAME             TYPE     DATA   AGE
storage-config   Opaque   1      20s
Installing ModelMesh Serving RBACs (namespace_scope_mode=false)
serviceaccount/modelmesh created
serviceaccount/modelmesh-controller created
role.rbac.authorization.k8s.io/modelmesh-controller-leader-election-role created
role.rbac.authorization.k8s.io/modelmesh-controller-restricted-scc-role created
clusterrole.rbac.authorization.k8s.io/modelmesh-controller-role created
rolebinding.rbac.authorization.k8s.io/modelmesh-controller-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/modelmesh-controller-restricted-scc-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/modelmesh-controller-rolebinding created
networkpolicy.networking.k8s.io/modelmesh-controller created
networkpolicy.networking.k8s.io/modelmesh-runtimes created
networkpolicy.networking.k8s.io/modelmesh-webhook created
Installing ModelMesh Serving CRDs and controller
Enabled Self Signed CA: Update manifest
secret/modelmesh-webhook-server-cert created

kustomize build default

Error: accumulating resources: accumulation err='accumulating resources from '../manager': '/home/runner/work/modelmesh-serving/modelmesh-serving/config/manager' must resolve to a file': couldn't make target for path '/home/runner/work/modelmesh-serving/modelmesh-serving/config/manager': invalid Kustomization: json: cannot unmarshal number into Go struct field Image.images.newTag of type string
error: no objects passed to apply
Error: Process completed with exit code 1.

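The root cause of the error was in the sed command: the original variant writes the hash into kustomization.yaml unquoted, so an all-digit value is parsed as a number, while the fixed variant wraps it in double quotes (see the swapped quotes in the before/after below).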

sed -i.bak 's/newTag:.*$/newTag: '"$GIT_COMMIT_SHORT"'/' config/manager/kustomization.yaml # before
sed -i.bak 's/newTag:.*$/newTag: "'${GIT_COMMIT_SHORT}'"/' config/manager/kustomization.yaml # after
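
The effect of the quoted variant can be checked on a throwaway file. This is only a sketch: the helper name `set_new_tag` and the contents of the temporary kustomization.yaml are assumptions made for illustration, not copied from the repository. It also double-quotes the shell variable itself, which the fix above does not strictly need for a hex hash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rewrite the newTag line so that the value is always
# written as a double-quoted YAML string.
set_new_tag() {
  local tag="$1" file="$2"
  # The inner double quotes end up in the file, e.g. newTag: "7284872"
  sed -i.bak 's/newTag:.*$/newTag: "'"${tag}"'"/' "${file}"
}

# Demo on a temporary file (file contents are assumed for illustration).
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/kustomization.yaml" <<'EOF'
images:
  - name: modelmesh-controller
    newName: kserve/modelmesh-controller
    newTag: latest
EOF

set_new_tag 7284872 "${demo_dir}/kustomization.yaml"

# Show the rewritten line; the tag should now be wrapped in double quotes.
grep 'newTag' "${demo_dir}/kustomization.yaml"
```

With the tag quoted in the file, an all-digit hash should no longer be read as a number when the Kustomization is parsed.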

That sed command (or variations of it) can be found in various places. There is a fix in PR #473, but in case I missed an occurrence, or a similar command gets added in the future, I created this issue to save future developers some time.
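
To look for remaining occurrences, a rough search like the one below may help. It is a heuristic sketch, not something taken from the repository: the function name `find_unquoted_newtag_seds` is hypothetical, and the pattern only catches sed invocations where `newTag: ` is followed by something other than a double quote, so variants with different spacing would be missed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list lines that rewrite newTag via sed without
# writing a double quote right after "newTag: ".
find_unquoted_newtag_seds() {
  local root="${1:-.}"
  # "newTag: " (with the trailing space) only occurs in the replacement part
  # of the sed expression; [^"] rejects the already-fixed, quoted variant.
  grep -rnE 'sed .*newTag: [^"]' "${root}" 2>/dev/null
}

# Usage (from the repository root):
#   find_unquoted_newtag_seds .
```

Any hit is only a candidate and still needs a manual look.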

_Originally posted by @ckadner in https://github.com/kserve/modelmesh-serving/pull/464#discussion_r1442361903_

ckadner commented 5 months ago

When the short hash of the most recent commit consists only of digits (e.g. 7284872) and is used as the newTag for building and installing the latest modelmesh-controller image, kustomize parses the unquoted newTag value as a number rather than a string.

See this sed command in one of the install scripts:

sed -i.bak 's/newTag:.*$/newTag: '"$GIT_COMMIT_SHORT"'/' config/manager/kustomization.yaml

This causes the following error during installation (e.g. when running FVT on IKS):

Error: accumulating resources from '../manager':
  '/config/manager' must resolve to a file':
    couldn't make target for path '/config/manager':
      invalid Kustomization:
        json: cannot unmarshal number into Go struct field
          Image.images.newTag of type string
error: no objects passed to apply

Check the latest git commit:

git log -1 --format=%h --abbrev=7
    7284872

A git commit short hash consisting exclusively of numeric characters has occurred 7 times in the past (as of 2023/12/02):

git log --format=%h --abbrev=7 | grep -E "^[0-9]+$"
    7284872
    4808804
    4746079
    0281170
    1631706
    5293579
    4207065
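
For a uniformly random 7-character hex string, the chance of all digits is (10/16)^7, roughly 3.7%, so this is expected to recur. A script could detect the case up front. The following is a minimal sketch; the function name `is_all_digits` and the warning text are assumptions, not part of the install scripts.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the argument is non-empty and
# consists solely of the digits 0-9.
is_all_digits() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
    *) return 0 ;;
  esac
}

# Same command as above; yields an empty string outside a git repository.
short_hash="$(git log -1 --format=%h --abbrev=7 2>/dev/null || true)"

if is_all_digits "${short_hash}"; then
  echo "warning: short hash ${short_hash} is all digits; quote it when writing newTag"
fi
```

Quoting the value unconditionally, as in the fixed sed command, makes such a check unnecessary; it is mainly useful for spotting affected commits in CI logs.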