mamurak / os-mlops


ModelMesh ArgoCD app does not sync #3

Open goern opened 1 year ago

goern commented 1 year ago

In the step at https://github.com/goern/os-mlops/blob/odh-manifest-typo-fix/odh-kfp-modelmesh.md#prepare-the-model-deployment, an ArgoCD app is deployed from https://github.com/mamurak/odh-ml-pipelines-seldon-ops-repo/blob/main/manifests/inference-service.yaml. The sync fails with the error "the server could not find the requested resource".

Is the ModelMesh config missing from manifests/odh/odh.yaml?

/assign @mamurak
/kind bug

goern commented 1 year ago

After adding the ModelMesh deployment (see https://github.com/goern/os-mlops/commit/af54feacbcacb5a8209bf904da882a9a42287725), ArgoCD gives me:

Failed sync attempt to 3e4e156472fe0beea36fc57bb76b449421b87900: one or more objects failed to apply, reason: inferenceservices.serving.kserve.io is forbidden: User "system:serviceaccount:odh-applications:argocd-argocd-application-controller" cannot create resource "inferenceservices" in API group "serving.kserve.io" in the namespace "odh-applications"

mamurak commented 1 year ago

Thanks for the heads up @goern. I can take a deeper look later, but from the error message you report, I suspect it tries to access a namespace that doesn't exist. Can you confirm that you're running ODH? In case of RHODS, the namespace will have to be updated to your existing target namespace.

goern commented 1 year ago

Confirmed. I'm just following the steps in the README ;)

To me, it looks like the ArgoCD role is not allowed to create these resources. It does not look like a problem with ArgoCD requiring the resource to be whitelisted.
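
For anyone hitting the same "forbidden" error, here is a rough sketch of the kind of RBAC grant that would address it, assuming the missing permission really is the whole problem. It builds a namespaced Role and RoleBinding for the service account named in the error message and prints them as JSON. The role name `argocd-inferenceservice-editor` and the helper function are made up for illustration, and this is not the fix that ended up in the repo.

```python
import json


def argocd_rbac_manifests(target_namespace, sa_name, sa_namespace,
                          role_name="argocd-inferenceservice-editor"):
    """Build a Role and RoleBinding that let one service account manage
    KServe InferenceServices in a single namespace.

    role_name is a hypothetical name chosen for this sketch.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": target_namespace},
        "rules": [{
            # API group and resource are taken from the error message.
            "apiGroups": ["serving.kserve.io"],
            "resources": ["inferenceservices"],
            "verbs": ["get", "list", "watch", "create",
                      "update", "patch", "delete"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": target_namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
        "subjects": [{
            "kind": "ServiceAccount",
            "name": sa_name,
            "namespace": sa_namespace,
        }],
    }
    # A v1 List lets both objects be applied from a single document.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    manifests = argocd_rbac_manifests(
        target_namespace="odh-applications",
        sa_name="argocd-argocd-application-controller",
        sa_namespace="odh-applications",
    )
    print(json.dumps(manifests, indent=2))
```

The output can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML. Scoping the grant to a Role in the target namespace keeps the permission narrower than a ClusterRole would.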

mamurak commented 1 year ago

I had a closer look and saw that the ArgoCD manifests had an inconsistency, which explains the failed GitOps deployment. I fixed that. However, with ModelMesh now being fully integrated in ODH and RHODS, the concept of GitOps within an ODH/RHODS-based workflow needs to be reworked. I've removed that part from the documentation in the meantime.

I've updated major parts of the documentation. Please let me know if it makes sense now @goern.