Hey all! My team at Aryn noticed some recent development on the “feature/remote-inference” branch of ml-commons for a ChatConnector and query executor that go beyond the RFC published. We are also working to enable conversational applications with OpenSearch in a similar way. Our approach was to create and open source plug-ins and search pipelines, and it seems like now would be the right time to converge and work together on an approach and discuss the primitives. We couldn’t find a RFC for this work, and we would love to collaborate on the next steps and share our approach. We’ve started work on our own RFC for this functionality, and can share our thoughts in advance of publishing it. Does anyone know the developers working on this? I’d also like to kick off a quick sync to chat more - who else would be interested in joining? LMK.
@jonfritz It's great to hear that you are building some cool plugins for search pipelines and conversational apps. They are actually all on our roadmap this year. We are planning to build a new plugin based on this remote inference feature that is dedicated to handling conversational requests for customers using generative AI. I think it's quite possible that our approaches are mergeable to some degree. Can you please include our product manager @dylan-tong-aws and Sr SDE @ylwu-amzn in the sync-up meeting?
Thanks Xun! A set of us chatted yesterday (with representatives from AWS), and reposting the next steps here that I added in the Slack channel: "Thanks folks for getting together yesterday to discuss approaches for conversational search. The next step from the call is that Ben or Austin will submit the RFC for using plug-ins and search pipelines to enable conversational search (using pluggable generative AI models) and “conversational memory” (a way to create, store, and add interactions to a conversation). This will be submitted in the next few days, and then let’s give some good feedback. Looking forward to the collaboration in building this functionality for OpenSearch customers!"
It will be great to get feedback from you, @dylan-tong-aws, and @ylwu-amzn on the RFC once it's posted, so we can have the community align on the approach to take for OpenSearch's conversational interface.
Hi @jonfritz, yes sure. Please share the RFC once it's published. I will organize our team to take a look and provide feedback!
@Zhangxunmt here you go - https://github.com/opensearch-project/ml-commons/issues/1150.
Also, @Zhangxunmt, with regards to "quite possible that our approaches are mergeable to some degree" - let's use the RFC to align on the way the OpenSearch community wants to architect this functionality, and iterate on it together using that mechanism. Let's make sure it meets the use cases you had in mind as well, and take one approach for the project.
While working on #1150 (PR #1195), one thing I considered is to consume an HttpConnector to invoke OpenAI APIs directly without going through a remote model. Have you guys considered this approach? One benefit to this approach is that you don't have to rely on ML nodes just to be able to make calls to remote inference endpoints. What do you guys think about this approach?
@ylwu-amzn

> While working on #1150 (PR #1195), one thing I considered is to consume an HttpConnector to invoke OpenAI APIs directly without going through a remote model. Have you guys considered this approach? One benefit to this approach is that you don't have to rely on ML nodes just to be able to make calls to remote inference endpoints. What do you guys think about this approach?
Yes, we considered this option. We also weighed several other factors, such as security and downstream impact, and decided to use a remote model by leveraging the current model management framework.
> you don't have to rely on ML nodes just to be able to make calls to remote inference endpoints
This concern has been addressed in https://github.com/opensearch-project/ml-commons/pull/1197
I do like that plugins.ml_commons.task_dispatcher.eligible_node_role.remote_model and plugins.ml_commons.task_dispatcher.eligible_node_role.local_model have reasonable/sensible defaults. But I worry that you are introducing way too many knobs. I don't think that's justified just to force remote models to fit into the mold of local models.
There are performance and scale considerations a cluster admin needs to make when hosting (large) models locally. Sure, let's give them all the knobs they need to ensure these models don't bring down the cluster and impact non-ML workloads. But why do they have to be burdened with these superficial knobs for remote models that do not have any resource contention implications?
The description of these controls in #1197 comes across as if we are patching holes as we go. It would be nice to see these decisions being backed by customer feedback and use cases.
> I don't think that's justified just to force remote models to fit into the mold of local models.

I don't quite get you; you can see that we are adding these settings precisely to avoid forcing remote models into the mold of local models.
> But why do they have to be burdened with these superficial knobs for remote models that do not have any resource contention implications?

Remote models are not free; they do consume resources. These settings will help if users want to run remote models only on ML nodes. Users can tune them themselves.
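For example, if a cluster admin did want remote models dispatched only to ML nodes, the tuning might look roughly like this (the setting names come from the discussion above; the value format shown is an assumption for illustration):

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.task_dispatcher.eligible_node_role.remote_model": ["ml"]
  }
}
```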
> It would be nice to see these decisions being backed by customer feedback and use cases.

I don't think we should always wait for customer feedback/use cases to build features. cc @dylan-tong-aws, do you have any customer feedback or use cases?
> I don't think we should always wait for customer feedback/use cases to build features.
My 2 cents: In most cases, it's good to work backwards from customers and use cases for new features. Otherwise, we'll be at risk of adding more knobs or complexity with minimal benefit. Interested to hear the customer-led insights driving these additions.
Let's have a meeting to discuss. There aren't supposed to be many knobs exposed to the users who provision connectors (admin or MLOps/infra engineers). There are [two personas](https://github.com/opensearch-project/ml-commons/issues/881) for this extensibility framework, and we need to work on making that distinction clear. One persona is the integrator. This is an SDE who represents some technology provider. They need enough flexibility to describe an integration (blueprint) between OpenSearch and an external service via RESTful APIs. The blueprint should be designed in a way that the admin or MLOps engineer who provisions the connector is only exposed to a few configurations. CloudFormation is a good analogy: think about the CloudFormation template developer versus an ops engineer who uses the template. Right now, our APIs expose the blueprint details, but I am advocating for these APIs to be refactored or overloaded so that they don't expose all the knobs intended for integrators.
We had internal discussions about an API to publish blueprints. In the future, we will have certified connector blueprints, which will be pre-installed.
An admin should be able to provision a connector like this:
```
POST /_plugins/_ml/connectors/_create
{
  "connector_blueprint_id": "sagemaker_connector",
  "region": "us-west-2",
  "end_point": "lmi-model-2023-06-24-01-35-32-275",
  "iam_role_or_access_keys": "xxxxxxxxxx"
}
```
An admin only needs to be exposed to user inputs required at provision or invocation time. Credentials are something that can be set when a connector is provisioned or updated. There are use cases where we might want to provide the ability for a parameter to be overridden at invocation time. For instance, for users that are using Amazon SageMaker multi-model endpoints, they should be able to provision one connector to back multiple OpenSearch managed (external) models. Amazon SageMaker needs a model identifier/name to route a request to the appropriate model being served on one endpoint. Being able to specify a parameter at the model level that can be passed to a shared connector at invocation time makes it easy to support this use case.
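As a rough sketch of that multi-model use case (the API path and field names such as target_model are assumptions for illustration, not a finalized design), a model-level parameter could be attached to a model that shares a connector and then forwarded at invocation time:

```json
POST /_plugins/_ml/models/_register
{
  "name": "reviews-summarizer",
  "function_name": "remote",
  "connector_id": "<shared_sagemaker_connector_id>",
  "parameters": {
    "target_model": "summarizer-v2.tar.gz"
  }
}
```

The shared SageMaker connector would read target_model at predict time to route the request to the right model on the multi-model endpoint.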
We're actively discussing the next phase of enhancements, and these are among them.
@ylwu-amzn, @austintlee, I've requested to set up a meeting over Slack. The specific configurations that @austintlee called out were technical decisions. There were no explicit customer/business requirements for these knobs. Let's meet to discuss the concerns and the technical decision to expose these configurations.
With that said, we do have user/business requirements to ensure this framework is cost optimized. This feature should not, for instance, have dependencies on ML nodes. Currently, one could use an ML node as a proxy, but that should not be required. In fact, I advocate that we disable this because there are no user requirements or known use cases that require it. There are also cluster settings like "plugins.ml_commons.only_run_on_ml_node: true" that need to be decoupled from external models. We're actively working on this. A user should not have to set this to false to use the connectors and external models.
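For reference, this is the setting in question; today a user has to relax it cluster-wide before connectors and external models will work, which is exactly the coupling we want to remove:

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false
  }
}
```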
Hi @austintlee @Zhangxunmt
I tried to load and deploy the open-source Hugging Face GPT-2 model, and it deployed successfully. I am following this documentation: Documentation Link
However, when I try to create a search pipeline for the connector model, I am not able to do so and get an error in the response.
I have also raised this issue on the OpenSearch forum but didn't get any response. The link to my issue is: Open search issue Link
I have also enabled plugins.ml_commons.rag_pipeline_feature_enabled, but the issue still persists.
Any suggestions on this issue would be really appreciated.
Problem Statement
ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. Currently, ML Commons supports several built-in models, such as KNN and Linear Regression, as well as custom models uploaded by users.
As a complement to the current ML model serving framework, we want to allow customers to use their ML technology of choice, such as OpenAI, Amazon SageMaker Hosting, Kubeflow KServe, TensorFlow Serving, and NVIDIA's Triton Inference Server, and empower ML technology providers to integrate their technology with OpenSearch via a low-to-no-code experience and join an open ecosystem that empowers builders to create AI-powered apps faster.
We are trying to resolve the following problems.
Innovation velocity: there are so many mature and rapidly evolving model serving technologies and groundbreaking ML capabilities that are democratized exclusively through ML APIs and services. We want to let users select the best technology available to them and benefit from features that might not be natively available on OpenSearch.
Ease of adoption: many users have already adopted or built their own ML platform. We want to let those users leverage their existing investments and approved technologies.
Facilitating an open ecosystem: we need an easier way for partners and community contributors to integrate ML technologies with OpenSearch. As an open and community-driven platform, it’s important for us to empower contributors to co-innovate and drive joint-GTM motions. We want to provide integrators with a solution that ensures their engineering investments have a low cost of failure and high ROI potential.
What is the developer experience going to be?
The developer within the context of this framework is someone who is building an integration on behalf of a model serving technology or API. The integrator creates a connector blueprint for a service like the OpenAI ChatGPT API or Amazon SageMaker Hosting Services by defining a blueprint (e.g., a JSON document) that describes a protocol that OpenSearch can use to communicate with an external ML model service.
More details on the blueprint spec and APIs are provided in the Connector Blueprint Section.
Sample Use Cases and workflow
There are three user types: admin, integrator (developer), and end user. Integrators or developers are the active community contributors who train and deploy models with an external model server and provision connectors within OpenSearch to enable an integration with the remote model. Integrators can also publish validated connectors as a JSON document to 1) an OpenSearch repository that end users can later download from the OpenSearch website, and 2) the local ml-connector index included in the OpenSearch distributions so that any end user can use it directly, e.g., a certified SageMaker connector to run an NLP model. End users are the people and systems that run queries requiring remote model inference. The admin is the owner of the OpenSearch domain who defines permissions and grants the proper permissions to developers and end users.
If the target model hasn’t already been deployed and published, the integrator will deploy the model on the model server technology the connector was designed to support.
The integrator can choose to publish the work as a community connector or a certified connector as described in the feature brief.
The admin/integrator can deploy connectors into OpenSearch ml-commons from multiple sources.
Once the remote model connector is created and deployed, end users can create virtual models inside ml-commons to run remote inference by calling the remote server. The virtual model shares the current ML-Model structure and is stored in the existing ML-Model index. It does not contain physical model content but hosts all the model metadata. Multiple virtual models can be created and associated with the same connector. ml-commons will provide a new "Create Model" API to create virtual models (a sketch follows below).
End users use the existing "predict model" APIs to run remote inference and the CRUD APIs to manage virtual models, including searching for models. Deleting or updating a virtual model does not necessarily mean deleting the associated connector.
ml-commons will provide a new set of APIs to manage model connectors. OpenSearch admins can run these APIs to view active connectors, check connector status, and update/search/delete connectors. The details of these new APIs are listed in the REST APIs section of this design doc.
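To make this workflow concrete, a minimal sketch of the virtual-model step and connector management calls might look like the following (the exact paths and field names are assumptions for illustration, not the final API):

```json
POST /_plugins/_ml/models/_register
{
  "name": "openai-summarization-virtual-model",
  "function_name": "remote",
  "connector_id": "<connector_id returned by the Create Connector API>"
}

GET /_plugins/_ml/connectors/<connector_id>
DELETE /_plugins/_ml/connectors/<connector_id>
```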
Proposed Solution
We allow customers to define a connector blueprint to connect to any model serving framework. Once the blueprint is created, the user can use it to provision a connector that enables secure communication between an OpenSearch cluster and the external service/API. There will be a CRUD API for connectors in ML-Commons, and a new system index will be created to store and manage the connectors. The blueprint is parametric and generalized enough for ML-Commons to parse, so customers can create new connectors to their favored AI model by simply configuring a blueprint, achieving a low-to-no-code user experience.
To run a remote inference, the user needs to define a model that references the connector ID they want to use for remote inference. We name these models "virtual models" in ml-commons, and they share the current model management APIs (i.e., upload, train, delete, inference) with the other built-in physical models. Invoking the "Predict" API against a virtual model runs a remote inference against the remote server through the associated connector.
Connector Blueprint/Template Definition
To create a new remote server connector in the ml-commons plugin, users need to provide a connector blueprint in the RESTful "Create Connector" API. The feature brief has provided a high-level idea of the connector blueprint, which should be general and parametric enough to support all model serving frameworks. To be more specific here, we will use a nested JSON template to define a connector. In a connector blueprint, there are two types of placeholders:
The following is the proposed blueprint spec. This is what the partner is responsible for defining to create the integration.
Remote Inference Example
Using OpenAI as the example, creating an OpenAI connector will look like the following.
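A rough illustration of what such a request might look like (the blueprint fields and ${...} placeholder syntax shown here are assumptions for illustration, not the authoritative spec from this proposal):

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "openai-chat-connector",
  "description": "Connector to the OpenAI chat completions API",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "api.openai.com",
    "model": "gpt-3.5-turbo"
  },
  "credential": {
    "openAI_key": "<your OpenAI API key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```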
Once the connector is active, users can create “remote models” through our existing model management APIs and perform inference as follows.
Two options are provided to invoke the “Predict” API for remote inference.
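As a sketch of what an invocation might look like (the path, model ID, and request body shape are assumptions for illustration, and this shows only one of the two options):

```json
POST /_plugins/_ml/models/<remote_model_id>/_predict
{
  "parameters": {
    "messages": [
      { "role": "user", "content": "Summarize the reviews for product 42." }
    ]
  }
}
```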
Requested Feedback
We appreciate any and all feedback the community has.
Specifically, we are interested in information on the following topics: