opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means and linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Enhance the AI connector framework to support 1) Async Prediction and 2) Prediction with Streaming Response #2484

Open Zhangxunmt opened 1 month ago

Zhangxunmt commented 1 month ago

Is your feature request related to a problem?

This feature proposes two enhancements to the ml-commons connector framework.

  1. Currently, the connector framework offers only one way to run predictions against remote models: real-time API calls. Real-time invocation cannot handle batch inference as proposed in https://github.com/opensearch-project/ml-commons/issues/1840. An important prerequisite of batch inference is invoking an endpoint offline and asynchronously. In addition to Bedrock, which has async model prediction, SageMaker also provides an API to invoke endpoints in async mode (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointAsync.html). Unlike asynchronous HTTP client connections, this async prediction usually requires the request payloads and the responses to be stored in a storage service such as S3.
  2. As another improvement, we should also add a model invocation mode with streaming responses. SageMaker and OpenAI both support model predictions with streaming responses; see https://platform.openai.com/docs/api-reference/streaming and https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html. We should integrate them into the connector as new action types.
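To illustrate the storage-backed async pattern described in the first item, here is a minimal Python sketch. It is not ml-commons code and does not call any AWS API: `InMemoryStore`, `submit_async`, `fake_model_worker`, and `poll_result` are hypothetical names, and the in-memory dictionary only stands in for an object store such as S3. The point is the shape of the flow: the caller writes the payload to storage, gets back a handle with input and output locations right away, and reads the result later.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for an object store such as S3."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, body: str) -> None:
        self.objects[key] = body

    def get(self, key: str) -> Optional[str]:
        return self.objects.get(key)


@dataclass
class AsyncInvocation:
    """Handle returned immediately; the prediction itself arrives later."""
    input_location: str
    output_location: str


def submit_async(store: InMemoryStore, payload: dict) -> AsyncInvocation:
    """Write the request payload to storage and return without waiting."""
    request_id = uuid.uuid4().hex
    input_key = f"input/{request_id}.json"
    output_key = f"output/{request_id}.json"
    store.put(input_key, json.dumps(payload))
    return AsyncInvocation(input_key, output_key)


def fake_model_worker(store: InMemoryStore, invocation: AsyncInvocation) -> None:
    """Simulates the remote platform processing the request offline."""
    payload = json.loads(store.get(invocation.input_location))
    result = {"echo": payload["text"].upper()}
    store.put(invocation.output_location, json.dumps(result))


def poll_result(store: InMemoryStore, invocation: AsyncInvocation) -> Optional[dict]:
    """Return the stored response, or None if it is not ready yet."""
    body = store.get(invocation.output_location)
    return None if body is None else json.loads(body)


if __name__ == "__main__":
    store = InMemoryStore()
    invocation = submit_async(store, {"text": "hello"})
    print(poll_result(store, invocation))   # not ready yet
    fake_model_worker(store, invocation)    # the platform finishes later
    print(poll_result(store, invocation))
```

In a real connector the polling step might be replaced by a completion notification; that choice is outside what this issue specifies.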

What solution would you like?

Integrate the async invoke-model APIs from SageMaker and Bedrock. Integrate the streaming response APIs from SageMaker and OpenAI, and make the integration general enough to support other platforms.
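As a rough illustration of consuming a streaming response, the Python sketch below parses a server-sent-events style text stream of the kind OpenAI's streaming chat API emits (`data: {...}` lines ending with `data: [DONE]`). The function names `iter_sse_data` and `stream_text` are hypothetical, and the sketch assumes the lines have already been decoded to text. SageMaker's response stream uses a different wire encoding, so a general connector would likely need a pluggable decoder per platform rather than this parser alone.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each `data:` line until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield data


def stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion chunks."""
    for data in iter_sse_data(lines):
        event = json.loads(data)
        for choice in event.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


if __name__ == "__main__":
    # Hand-written sample lines, not captured from a real API call.
    sample = [
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "",
        "data: [DONE]",
    ]
    for fragment in stream_text(sample):
        print(fragment, end="", flush=True)
    print()
```

Because both functions are generators, a caller can forward each fragment to the client as soon as it arrives instead of buffering the whole response.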

What alternatives have you considered?

The implementation should be done in a general way that is easily extended to new model server platforms.

Do you have any additional context?

austintlee commented 1 month ago

For streaming - https://github.com/opensearch-project/OpenSearch/pull/13772