kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0
1.48k stars 439 forks

[SDK] Create API to get Trial metrics from Katib DB #2022

Closed: andreyvelich closed this issue 1 year ago

andreyvelich commented 1 year ago

/kind feature /area sdk

Our Katib Python SDK doesn't have an API to get Trial metrics from the Katib DB. Currently, users can see Trial metrics only in the Katib UI. We should make it possible to query metrics through the Katib SDK using the GetObservationLog gRPC API.
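As a rough illustration of what such an SDK helper could look like, here is a minimal sketch. It assumes the reply carries an observation log made of timestamped metric entries; the dataclasses below are hypothetical stand-ins for the generated protobuf messages, and `get_trial_metrics`, `DBManagerStub`, and `InMemoryDBManager` are names invented for this example, not the actual SDK API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


# Hypothetical stand-ins for the generated protobuf messages. The real
# classes would be generated from Katib's api.proto, not defined here.
@dataclass
class Metric:
    name: str
    value: str


@dataclass
class MetricLog:
    time_stamp: str  # assumed to be an RFC 3339 UTC string, so it sorts as text
    metric: Metric


@dataclass
class ObservationLog:
    metric_logs: List[MetricLog] = field(default_factory=list)


@dataclass
class GetObservationLogRequest:
    trial_name: str


@dataclass
class GetObservationLogReply:
    observation_log: ObservationLog


class DBManagerStub(Protocol):
    """Anything exposing a GetObservationLog call, e.g. a gRPC stub."""

    def GetObservationLog(
        self, request: GetObservationLogRequest
    ) -> GetObservationLogReply: ...


def get_trial_metrics(
    stub: DBManagerStub, trial_name: str
) -> Dict[str, List[MetricLog]]:
    """Fetch a Trial's observation log and group entries by metric name."""
    reply = stub.GetObservationLog(GetObservationLogRequest(trial_name=trial_name))
    grouped: Dict[str, List[MetricLog]] = {}
    for entry in reply.observation_log.metric_logs:
        grouped.setdefault(entry.metric.name, []).append(entry)
    # Order each metric's entries from oldest to newest.
    for entries in grouped.values():
        entries.sort(key=lambda e: e.time_stamp)
    return grouped


class InMemoryDBManager:
    """Fake DB Manager, present only so the sketch runs without a cluster."""

    def __init__(self, logs: Dict[str, List[MetricLog]]) -> None:
        self._logs = logs

    def GetObservationLog(
        self, request: GetObservationLogRequest
    ) -> GetObservationLogReply:
        entries = list(self._logs.get(request.trial_name, []))
        return GetObservationLogReply(ObservationLog(metric_logs=entries))
```

Taking the stub as a parameter keeps the helper independent of how the channel to the DB Manager is created, which is also where any later auth handling would most likely plug in.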

From a security perspective, a user can call this gRPC API from any namespace and for any Experiment, since our DB Manager doesn't have any auth checks, right? Should we investigate how to improve user isolation for Katib (the "multi-user mode" feature)? One solution could be to use Istio to allow traffic only from the appropriate user, as @apo-ger mentioned here: https://github.com/kubeflow/katib/pull/1983#issuecomment-1319674570.

What do you think @johnugeorge @gaocegege @tenzen-y @anencore94 @kimwnasptd @apo-ger ?


kimwnasptd commented 1 year ago

@andreyvelich that's a great feature!

Regarding the authnz part, I think this discussion will revolve around programmatic client support for the DB Manager API Server. This is the same as how KFP allows Pods from other namespaces to use its API Server to perform CRUD tasks: https://github.com/kubeflow/pipelines/issues/5138.

And this is done by:

  1. Allowing everyone to talk to the DB Manager, but without setting the kubeflow-userid header (to avoid impersonation).
  2. The DB Manager will drop any requests that are not authenticated.
  3. In-cluster Pods that need to talk to the DB Manager will need to provide an audience-scoped ServiceAccount token.
  4. The DB Manager will need to validate the token.
  5. The DB Manager will then extract the identity (ServiceAccount name) from that token and perform a SubjectAccessReview.
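The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the token and access checks are injected as plain callables standing in for Kubernetes TokenReview and SubjectAccessReview requests, the audience value is made up, and the sketch rejects requests that carry a kubeflow-userid header (whether to reject or strip it is not settled here). `authorize_request`, `TokenInfo`, and `EXPECTED_AUDIENCE` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical audience the DB Manager would require tokens to be scoped to.
EXPECTED_AUDIENCE = "katib-db-manager"


@dataclass
class TokenInfo:
    authenticated: bool
    username: str = ""  # e.g. "system:serviceaccount:<namespace>:<name>"


# Stand-ins for the two Kubernetes API calls; a real server would issue
# TokenReview and SubjectAccessReview requests instead.
TokenReviewer = Callable[[str, str], TokenInfo]    # (token, audience)
AccessReviewer = Callable[[str, str, str], bool]   # (user, verb, namespace)


def authorize_request(
    headers: Mapping[str, str],
    namespace: str,
    verb: str,
    review_token: TokenReviewer,
    review_access: AccessReviewer,
) -> str:
    """Return the caller identity, or raise PermissionError."""
    normalized = {key.lower(): value for key, value in headers.items()}

    # Step 1: never accept a caller-supplied identity header.
    if "kubeflow-userid" in normalized:
        raise PermissionError("kubeflow-userid header is not accepted")

    # Steps 2-3: drop requests that do not present a bearer token.
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing bearer token")

    # Step 4: validate the token against the expected audience.
    info = review_token(token, EXPECTED_AUDIENCE)
    if not info.authenticated:
        raise PermissionError("token validation failed")

    # Step 5: check that the extracted identity may act in the namespace.
    if not review_access(info.username, verb, namespace):
        raise PermissionError(f"{info.username} may not {verb} in {namespace}")
    return info.username
```

Keeping the two reviews behind callables makes the decision logic testable without a cluster and leaves open whether the checks run in the DB Manager itself or in a proxy in front of it.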

Then there's also the discussion on how to use ServiceAccount tokens from outside the cluster, but that is a next step once we have the above in-cluster behavior working.

andreyvelich commented 1 year ago

> This is the same as how KFP allows Pods from other namespaces to use its API Server to perform CRUD tasks: https://github.com/kubeflow/pipelines/issues/5138.

Thanks for sharing this @kimwnasptd. At the recent Kubeflow Summit we also got questions about whether the Katib SDK will have the same auth: https://kubeflow.slack.com/archives/C046YTDRABW/p1666199636566639. Also, maybe we should authenticate all gRPC calls in Katib using a ServiceAccount token as you suggested, and/or all gRPC requests should go through a proxy (e.g. Katib API Server) that verifies them.
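On the client side, attaching a ServiceAccount token to each gRPC call could look roughly like the sketch below. It assumes the token is mounted into the Pod as a file; the mount path and the function name `bearer_metadata` are hypothetical, chosen only for this example.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical mount path for a projected, audience-scoped token.
DEFAULT_TOKEN_PATH = "/var/run/secrets/katib/token"


def bearer_metadata(token_path: str = DEFAULT_TOKEN_PATH) -> List[Tuple[str, str]]:
    """Build gRPC call metadata carrying the ServiceAccount token.

    The file is read on every call rather than cached, because mounted
    tokens can be rotated while the Pod is running.
    """
    token = Path(token_path).read_text().strip()
    if not token:
        raise ValueError(f"token file {token_path} is empty")
    # gRPC metadata keys must be lowercase.
    return [("authorization", f"Bearer {token}")]
```

With grpcio, a list of key/value pairs like this can be passed through the `metadata=` argument of a stub call, so the same helper would work for any of the Katib gRPC APIs.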

I guess that currently users can call GetSuggestions in any Kubernetes namespace where a Katib Experiment is running (a similar problem to the Trial metrics), or any other gRPC API that we have.

I think we should have a broader discussion in the Kubeflow community about how to keep the same security best practices across our various components (e.g. Pipelines, Katib).

cc @kubeflow/wg-training-leads @tenzen-y @anencore94

johnugeorge commented 1 year ago

@kimwnasptd @andreyvelich We need to think about external access as well for this feature. If it only works for in-cluster requests, it will not add much value to the SDK. Adding to https://github.com/kubeflow/katib/issues/2022#issuecomment-1320231727, we haven't discussed the right design for SDKs in KF. Each project handles it in a different way (Pipelines, KServe, etc.).

gaocegege commented 1 year ago

> We need to think about external access as well for this feature. If it only works for in-cluster requests, it will not add much value to the SDK.

I think so.

tenzen-y commented 1 year ago

> Also, maybe we should authenticate all gRPC calls in Katib using a ServiceAccount token as you suggested, and/or all gRPC requests should go through a proxy (e.g. Katib API Server) that verifies them.

@andreyvelich It sounds good.

BTW, I have a question. In the case of gRPC calls between Katib components (e.g. metrics-collector <-> katib-db-manager), will the gRPC request go through the Katib API server for authentication? Or will that request go directly to katib-db-manager, the same as always?

andreyvelich commented 1 year ago

> In the case of gRPC calls between Katib components (e.g. metrics-collector <-> katib-db-manager), will the gRPC request go through the Katib API server for authentication? Or will that request go directly to katib-db-manager, the same as always?

@tenzen-y Since Metrics Collector is running on the user profile side, I guess we should have some sort of authentication. From my understanding, currently any Kubeflow user can run our gRPC APIs to report/delete/get any logs from the DB: https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1beta1/api.proto#L18-L28.

I will start with a simple API to get Trial metrics from the DB using the SDK. We can think about proper auth in follow-up discussions. /assign @andreyvelich

tenzen-y commented 1 year ago

> > In the case of gRPC calls between Katib components (e.g. metrics-collector <-> katib-db-manager), will the gRPC request go through the Katib API server for authentication? Or will that request go directly to katib-db-manager, the same as always?
>
> @tenzen-y Since Metrics Collector is running on the user profile side, I guess we should have some sort of authentication. From my understanding, currently any Kubeflow user can run our gRPC APIs to report/delete/get any logs from the DB: https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1beta1/api.proto#L18-L28.
>
> I will start with a simple API to get Trial metrics from the DB using the SDK. We can think about proper auth in follow-up discussions. /assign @andreyvelich

@andreyvelich Thanks for clarifying!

anencore94 commented 1 year ago

https://docs.google.com/document/d/1TRUKUY1zCCMdgF-nJ7QtzRwifsoQop0V8UnRo-GWlpI/edit?disco=AAAAknO9PlM

To answer the above question, @andreyvelich: I've seen many companies build their own UI pages using several Kubeflow APIs, including Kubeflow Notebooks, Pipelines, and Katib. So if there were an HTTP server for Katib, many clients, including their own SDKs and UIs, could use those APIs much more easily.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

andreyvelich commented 1 year ago

Let's close this issue; we can track multi-user support for the Katib DB Manager in separate issues.