nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

IBM Autopilot Dashboard - Service Account with Access to Read Nodes #774

Closed: Anish701 closed this 1 month ago

Anish701 commented 1 month ago

In the IBM Autopilot Dashboard project (autopilot-dashboard-f3dc9e), the dashboard application needs access to read worker nodes in order to pull and display their labels. To enable this, a service account with the appropriate permissions to read nodes and host the application will be required.
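For illustration, here is a minimal sketch of how a pod running as such a service account could read node labels directly from the Kubernetes API. It assumes the service account token is auto-mounted at the standard path and that the pod can reach the in-cluster API address https://kubernetes.default.svc; the function names nodes_url and list_nodes are hypothetical, and the actual dashboard presumably uses a Kubernetes client library instead of curl.

```shell
#!/usr/bin/env bash
# Sketch only: read worker-node labels from inside a pod whose service
# account has been granted read access to nodes. SA_DIR is the standard
# auto-mounted token location; API_SERVER is the in-cluster API address.
set -euo pipefail

SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount
API_SERVER=https://kubernetes.default.svc

# Hypothetical helper: build the node-list URL, optionally filtered by a
# label selector such as "node-role.kubernetes.io/worker" (existence check).
nodes_url() {
  local selector="${1:-}"
  if [ -n "$selector" ]; then
    printf '%s/api/v1/nodes?labelSelector=%s\n' "$API_SERVER" "$selector"
  else
    printf '%s/api/v1/nodes\n' "$API_SERVER"
  fi
}

# Hypothetical helper: list nodes using the pod's own token and the cluster
# CA bundle; prints the raw NodeList JSON. Only works inside the cluster.
list_nodes() {
  curl --silent --show-error --fail \
    --cacert "$SA_DIR/ca.crt" \
    -H "Authorization: Bearer $(cat "$SA_DIR/token")" \
    "$(nodes_url "${1:-}")"
}
```

If jq happens to be available in the container, `list_nodes node-role.kubernetes.io/worker | jq '.items[] | {name: .metadata.name, labels: .metadata.labels}'` would print each worker node's name and labels. Because nodes are cluster-scoped, this call only succeeds once the service account is bound to a role that allows listing nodes.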

For reference, we have an existing service in the project that already hosts the application (but without node read access). We are open to modifying this existing service if that is preferred over creating a new service account.

naved001 commented 1 month ago

@Anish701 please create a service account in your project and let me know what it is, and I can then assign that service account the appropriate permissions.

Anish701 commented 1 month ago

Thank you so much @naved001! I just created a service account on the project named autopilot-service-acc using the default YAML file. Here is the link: https://console.apps.shift.nerc.mghpcc.org/k8s/ns/autopilot-dashboard-f3dc9e/serviceaccounts/autopilot-service-acc

computate commented 1 month ago

We have the latest IBM Autopilot v1.9.0 Dashboard deployed in the OBS cluster and connected to the ACM metrics for our AI clusters. We deployed IBM Autopilot in this PR.

naved001 commented 1 month ago

@Anish701 the serviceaccount should now have the permissions to read nodes.
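The thread does not say exactly which role was bound, so the following is only a sketch of one common way a cluster admin could grant node read access to this service account and then verify it. Nodes are cluster-scoped, so a namespaced Role is not enough; the sketch uses a ClusterRole plus a ClusterRoleBinding. The names node-reader and autopilot-dashboard-node-reader, and the helper functions, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: grant a namespaced service account read access to nodes.
# NODE_ROLE and BINDING are hypothetical names, not taken from the thread.
set -euo pipefail

NAMESPACE=autopilot-dashboard-f3dc9e
SERVICE_ACCOUNT=autopilot-service-acc
NODE_ROLE=node-reader                     # hypothetical
BINDING=autopilot-dashboard-node-reader   # hypothetical

# Username the API server uses for a service account in RBAC checks.
sa_username() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Needs cluster-admin rights; defined here but not run automatically.
grant_node_read() {
  oc create clusterrole "$NODE_ROLE" --verb=get,list,watch --resource=nodes
  oc create clusterrolebinding "$BINDING" \
    --clusterrole="$NODE_ROLE" \
    --serviceaccount="$NAMESPACE:$SERVICE_ACCOUNT"
}

# Impersonate the service account to confirm the grant ("yes" or "no").
check_node_read() {
  oc auth can-i list nodes --as="$(sa_username "$NAMESPACE" "$SERVICE_ACCOUNT")"
}
```

The dashboard's Deployment also has to run as this service account before the app can use the permission, for example with `oc set serviceaccount deployment/<dashboard-deployment> autopilot-service-acc -n autopilot-dashboard-f3dc9e`, where `<dashboard-deployment>` stands for the name of the existing deployment.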

Anish701 commented 1 month ago

Thank you!

cmisale commented 1 month ago

Awesome!!!

Anish701 commented 1 month ago

Hi @naved001 and @computate, I wanted to ask whether there is a route exposed for the Autopilot service or the Kubernetes API (see https://github.com/IBM/autopilot?tab=readme-ov-file#manually-query-the-autopilot-service). The service account doesn't have access to view existing routes, so I wanted to check whether there are any endpoints our app can connect to in order to run Autopilot health checks or view nodes (through the Kubernetes API).

computate commented 3 weeks ago

Hi @Anish701, the Autopilot service for our Prod cluster is available at the internal service URL autopilot-healthchecks.autopilot.svc:3333. See if it works for you:

$ oc debug
$ curl "http://autopilot-healthchecks.autopilot.svc:3333/status?check=dcgm&r=2"
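If the app shells out to curl, a small wrapper like the sketch below keeps the URL quoted; unquoted, the shell treats `&` as "run in background" and drops everything after it. The service address and the `check=dcgm&r=2` parameters come from the comment above; the helper names autopilot_status_url and run_check are hypothetical, and extra parameters are passed through unchanged rather than interpreted.

```shell
#!/usr/bin/env bash
# Sketch only: query the in-cluster Autopilot health-check service.
# The base URL is the internal service address given in the comment above.
set -euo pipefail

AUTOPILOT_URL=http://autopilot-healthchecks.autopilot.svc:3333

# Hypothetical helper: build the /status URL for one check, appending any
# extra key=value query parameters joined with '&'.
autopilot_status_url() {
  local check="$1"; shift
  local url="$AUTOPILOT_URL/status?check=$check"
  local param
  for param in "$@"; do
    url="$url&$param"
  done
  printf '%s\n' "$url"
}

# Hypothetical helper: run a check. The URL is quoted so '&' is not parsed
# by the shell as a background operator.
run_check() {
  curl --silent --show-error "$(autopilot_status_url "$@")"
}
```

From a pod inside the cluster, `run_check dcgm r=2` issues the same request as the curl command above.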