kubeflow / model-registry

Apache License 2.0

Ability to discover all running models #130

Open fbricon opened 3 months ago

fbricon commented 3 months ago

Is your feature request related to a problem? Please describe. From a tooling standpoint, we need the ability to discover all running LLM endpoints, so we can pick one and use it as an AI assistant in an IDE (e.g. via the continue.dev extension in VS Code or IntelliJ).

Describe the solution you'd like The model endpoints should be listed with at least their label, type, and API URL, e.g.:
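(A purely illustrative sketch of what such a listing could look like; the field names `label`, `type`, and `api_url` come from the request above, and the values are made up, not from any existing Model Registry API.)

```python
import json

# Hypothetical "list running model endpoints" response shape.
# Field names follow the request above; values are invented examples.
endpoints = [
    {
        "label": "mistral-7b-instruct",               # human-readable model label
        "type": "text-generation",                    # kind of model served
        "api_url": "https://mistral.example.com/v1",  # inference API base URL
    },
    {
        "label": "bge-large-en",
        "type": "text-embedding",
        "api_url": "https://embeddings.example.com/v1",
    },
]
print(json.dumps(endpoints, indent=2))
```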

Describe alternatives you've considered AFAIK, there's no other way to discover running inference engines at the moment.

cc @amfred

tarilabs commented 3 months ago

Thank you for this comment @fbricon !

Per the original proposal document for Model Registry, this is accounted for, especially for audit purposes. As mentioned in the same document, this intent also protects against Model Registry becoming a controller of sorts, which is not in scope.

For this scope, we have the ServingEnvironment and InferenceService entities defined in the OpenAPI spec and mapped.

I realize now:

Ultimately the "valid endpoints" source-of-truth depends on the serving runtime used. In the case of Kubeflow, KServe is a default add-on and the one we did some integrations for.

For context:

(edit: link fixup, typo fix)

lampajr commented 3 months ago

we could refactor it to be made available in this repo, similar to the CSI example @lampajr wdyt?

That could be a very useful example: we could create a custom folder containing a bare-minimum controller that implements only that logic.

Given that controllers/mr_inferenceservice_controller is already pretty much isolated, I am not expecting too much effort.

rareddy commented 1 month ago

A similar requirement came in for a possible integration with Backstage. I am not sure I understood the proposal above: is there a way to solve this for the Kubeflow offering without an operator? Should we deploy yet another container alongside the REST server for this? I would typically like to see something working OOB rather than requiring the user to configure something explicitly.

tarilabs commented 1 month ago

thanks for looping this in @rareddy ,

The requirements have been emerging more naturally and clearly in recent discussions; here is what I have captured so far:

Beyond these general requirements, we have also further clarified that:

Given the architecture proposal advanced by @ederign in the KF community meeting on 2024-08-06 (mailing-list post), if you look specifically at slide 8, what is described there is exactly the purpose the BFF serves, and it also matches what we discussed in the past.

So in conclusion, my recommendation is to tackle these capabilities in the Model Registry BFF, as that would be the most natural fit considering all the most recent directions. wdyt @rareddy @ederign @lampajr ?

btw @ederign, assuming this, what would be the best way to formalize this BFF functionality/requirement, please?

ederign commented 1 month ago

@tarilabs, you are right. Having multiple clients consume our APIs is precisely one reason we designed the BFF. Having VS Code and Backstage consuming our BFF would be awesome.

@tarilabs Currently, we are planning to 'talk' with Kubernetes only to fetch the MR endpoint. After getting the MR endpoint, the BFF will make REST calls to the Model Registry REST API for all the operations/data currently needed in the MR Web UI.

I want to double-check if the requirements you described can be fulfilled by Model Registry REST API. Or would the BFF be required to 'talk' with another Kubeflow project (Kserve, perhaps) to provide all data needed for them?

If MR REST API can provide all the data needed, a good starting point for our discussion would be understanding the endpoints and JSON schema needed for backstage and VS Code. Then, we can check if there is an overlap with the APIs that we are currently planning for the Web UI or if we need a new endpoint. I'm happy to implement those in the community.

If Model Registry REST API cannot fulfill those requirements, the BFF will be required to 'talk' with other Kubeflow projects; I suggest we hold a design session to discuss the implications of this for our architecture (orthogonal use cases).

Either way, I'm working towards a PR to add OpenAPI + Swagger definitions for the current APIs. I'll send something this week!

@lucferbux @alexcreasy @Griffin-Sullivan ^

tarilabs commented 1 month ago

Or would the BFF be required to 'talk' with another Kubeflow project (Kserve, perhaps) to provide all data needed for them?

I just want to clarify that I did not imply "talking to other projects", but rather reading Isvc resources, as in Kubernetes resources.

i.e.: something like kubectl get isvc.

This is required for the R2 flow, and further supports a user story where a presently running model is to be catalogued/indexed in Model Registry.
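As a hedged sketch of what reading those Isvc resources could boil down to: given InferenceService objects in the shape returned by `kubectl get isvc -o json` (a KServe list, where the ready state and endpoint URL live under `status`), the BFF would reduce them to a minimal "running endpoints" view. The helper name and output fields below are illustrative, not an existing Model Registry API.

```python
def ready_endpoints(isvc_list):
    """Keep only InferenceServices whose Ready condition is True."""
    out = []
    for item in isvc_list.get("items", []):
        status = item.get("status", {})
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if ready:
            out.append({
                "name": item["metadata"]["name"],
                "namespace": item["metadata"].get("namespace", "default"),
                "url": status.get("url"),  # KServe publishes the endpoint URL here
            })
    return out


# Sample input mirroring the shape of `kubectl get isvc -o json`
sample = {
    "items": [
        {
            "metadata": {"name": "sklearn-iris", "namespace": "models"},
            "status": {
                "url": "http://sklearn-iris.models.example.com",
                "conditions": [{"type": "Ready", "status": "True"}],
            },
        },
        {
            "metadata": {"name": "still-deploying", "namespace": "models"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
    ]
}
print(ready_endpoints(sample))  # only sklearn-iris survives the filter
```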

The rest sounds aligned to me, and I'm happy to discuss live anytime!

ederign commented 1 month ago

I just had a quick call with @tarilabs, and we agree that BFF is the best option for this use case. So what we need to move forward is:

@fbricon @rareddy I believe a starting point for our discussion would be understanding the endpoints and JSON schema needed for backstage and VS Code. Then, we can check if there is an overlap with the APIs that we are currently planning for the Web UI or if we need a new endpoint. I'm happy to implement those in the community.

rareddy commented 1 month ago

@tarilabs I thought MR created InferenceService entities and with the above use of reconciling we are collating the deployment info which could then directly be exposed through MR REST API. Since we are going to do reconciler for StorageInitializer why not just use that?

I understand the BFF proposition, but thinking about how would external access to Backstage components need to deal with two different endpoints, security etc.

tarilabs commented 1 month ago

@tarilabs I thought MR created InferenceService entities and with the above use of reconciling we are collating the deployment info which could then directly be exposed through MR REST API.

To baseline the discussion:

With the above premised:

But trying to walk in those shoes anyway: even if we exploit the audit-oriented logical model entries for the fresh-snapshot purpose, it won't solve the requirement of knowing about deployed models which are not indexed/catalogued in Model Registry.

For these reasons, I believe the BFF approach I mentioned in https://github.com/kubeflow/model-registry/issues/130#issuecomment-2281361781 is the most appropriate.

To me, we need:

.

Since we are going to do reconciler for StorageInitializer why not just use that?

I'm not sure I understood this comment. CSI is not a reconcile loop in an operator/controller.

.

I understand the BFF proposition, but thinking about how would external access to Backstage components need to deal with two different endpoints, security etc.

This is a matter of the deployment model of the BFF; if that becomes "an issue", to me it would be a blocker well beyond the Backstage integration, and worth resolving fully.

.

I hope these comments are relevant for consideration, and that putting them in writing was helpful, but I expect this is a conversation easier to have in the meetings!

fbricon commented 1 month ago

I'm not sure I understand who will be responsible for querying KServe's Isvc (kubectl get isvc). Will it be the model-registry, under the hood? Users? If the latter, my understanding (from discussions with @guimou) is that those resources will most likely live in namespaces unlikely to be accessible to regular users.

tarilabs commented 1 month ago

@fbricon this discussion is indeed about avoiding having to ask users to run kubectl get isvc.

This discussion, as the comments show, is about implementation choices for how to do it within Model Registry scope, between what was recently presented (the BFF) and the previously available reconcile loop (intended for auditing). Hope this clarifies.

ederign commented 1 month ago

@fbricon @rareddy, In short, the gist of what we are discussing is the BFF becoming the API for such services.

{VSCode/Backstage} => REST call => BFF (abstracts, coordinates, and formats data) => {K8s resources | Model Registry APIs}
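The "coordinates and formats data" step in that flow can be sketched as a join: the BFF combines live endpoints (from Kubernetes Isvc resources) with registered model metadata (from the Model Registry REST API), which also surfaces running models that are not yet catalogued in MR. Function and field names below are illustrative assumptions, not an agreed API.

```python
def merge_views(live_endpoints, registered_models):
    """Annotate each running endpoint with whether it is catalogued in MR."""
    by_name = {m["name"]: m for m in registered_models}
    return [
        {
            **ep,
            "registered": ep["name"] in by_name,          # known to Model Registry?
            "registry_metadata": by_name.get(ep["name"]),  # MR record, if any
        }
        for ep in live_endpoints
    ]


# Example: two models are running, but only one is indexed in Model Registry.
live = [
    {"name": "sklearn-iris", "url": "http://sklearn-iris.example.com"},
    {"name": "shadow-model", "url": "http://shadow.example.com"},
]
registered = [{"name": "sklearn-iris", "version": "v1"}]

for row in merge_views(live, registered):
    print(row["name"], row["registered"])
```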

For sure, we are going to need to discuss security and other implications, but first we need to agree on whether the BFF will be the 'API' for those external integrations.