Closed. Anish701 closed this 1 month ago.
@Anish701 please create a service account in your project and let me know what it is, and I can then assign that service account the appropriate permissions.
Thank you so much @naved001! I just created a service account named autopilot-service-acc in the project, using the default YAML file. Here is the link: https://console.apps.shift.nerc.mghpcc.org/k8s/ns/autopilot-dashboard-f3dc9e/serviceaccounts/autopilot-service-acc
We have the latest IBM Autopilot v1.9.0 Dashboard deployed in the OBS cluster and connected to the ACM metrics for our AI Clusters. We deployed IBM Autopilot in this PR.
@Anish701 the serviceaccount should now have the permissions to read nodes.
Thank you!
Awesome!!!
Hi @naved001 and @computate, is there a route exposed for the Autopilot service or the Kubernetes API (see https://github.com/IBM/autopilot?tab=readme-ov-file#manually-query-the-autopilot-service)? The service account doesn't have access to view existing routes, so I wanted to check whether there are any endpoints our app can connect to in order to run Autopilot health checks or view nodes (through the Kubernetes API).
Hi @Anish701, the Autopilot service for our Prod cluster is available at this internal service URL: autopilot-healthchecks.autopilot.svc:3333. See if it works for you:
$ oc debug
$ curl "http://autopilot-healthchecks.autopilot.svc:3333/status?check=dcgm&r=2"
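For the dashboard application itself, the same request could be made from code rather than from a debug shell. The following is only a minimal sketch, assuming the app runs inside the cluster (the `.svc` address is not reachable from outside) and that the `check` and `r` query parameters behave as in the curl example. The function names `build_status_url` and `run_health_check` and the timeout value are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Internal service address quoted in this thread; it only resolves inside the cluster.
AUTOPILOT_BASE = "http://autopilot-healthchecks.autopilot.svc:3333"


def build_status_url(check: str, r: int, base: str = AUTOPILOT_BASE) -> str:
    """Build the /status URL with the same query parameters as the curl example."""
    # urlencode handles the '&' separator and escaping, so no shell quoting issues arise.
    return f"{base}/status?{urlencode({'check': check, 'r': r})}"


def run_health_check(check: str, r: int, timeout: float = 60.0) -> str:
    """Call the Autopilot service and return the raw response body as text.

    The timeout is an assumption; health checks may take longer in practice.
    """
    with urlopen(build_status_url(check, r), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Mirrors the curl command above; only works from a pod inside the cluster.
    print(run_health_check("dcgm", 2))
```

Building the query string with `urlencode` avoids the shell pitfall where an unquoted `&` splits the command.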
In the IBM Autopilot Dashboard project (autopilot-dashboard-f3dc9e), the dashboard application needs to read worker nodes in order to pull and display their labels. To enable this, we need a service account that has permission to read nodes and that the application can run under.
For reference, we have an existing service in the project which successfully hosts the application (but without read node access). We are open to modifying this existing service if that would be preferred over creating a service account.
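To make the node-label requirement concrete, here is a rough sketch of how a pod running under such a service account might list nodes and collect their labels through the Kubernetes API. It assumes the standard in-cluster token and CA paths and the default `kubernetes.default.svc` API address; the function names `extract_node_labels` and `fetch_node_labels` are hypothetical and not taken from the dashboard code.

```python
import json
import ssl
from pathlib import Path
from urllib.request import Request, urlopen

# Standard location where Kubernetes mounts service account credentials in a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
# Default in-cluster API server address (assumed to be reachable from the pod).
API_SERVER = "https://kubernetes.default.svc"


def extract_node_labels(node_list: dict) -> dict:
    """Map each node name to its labels, given a NodeList object as a dict."""
    return {
        item["metadata"]["name"]: item["metadata"].get("labels") or {}
        for item in node_list.get("items", [])
    }


def fetch_node_labels() -> dict:
    """List nodes using the pod's service account token and return their labels.

    Requires that the service account has been granted read access to nodes.
    """
    token = (SA_DIR / "token").read_text().strip()
    context = ssl.create_default_context(cafile=str(SA_DIR / "ca.crt"))
    request = Request(
        f"{API_SERVER}/api/v1/nodes",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30, context=context) as resp:
        return extract_node_labels(json.load(resp))


if __name__ == "__main__":
    for name, labels in fetch_node_labels().items():
        print(name, labels)
```

If the service account lacks permission on nodes, the API server would be expected to answer with a 403, which `urlopen` raises as an `HTTPError`.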