opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Check downstream services before deleting MLModel #3087

Open yuye-aws opened 4 weeks ago

yuye-aws commented 4 weeks ago

Is your feature request related to a problem? Several types of downstream services depend on MLModel. For example, several tools in the agent framework (MLModelTool, VectorDBTool and RAGTool) use MLModel. Before deleting an MLModel, we need to check whether any downstream service is still using it.

What solution would you like? The same behavior as connector deletion, which returns an error if the connector is still being used by any model:

1 models are still using this connector, please delete or update the models first: [L05NKJIBVJ7VbbiuxOUx]
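
A minimal sketch of the requested check, written in Python for brevity rather than as ml-commons code. The names `ModelInUseError`, `find_dependents`, `delete_model`, and the `references` map are hypothetical, and the error wording is only modeled on the connector message above; how the references are actually discovered is left open.

```python
from typing import Dict, List


class ModelInUseError(Exception):
    """Raised when a model still has downstream references (hypothetical name)."""


def find_dependents(model_id: str, references: Dict[str, List[str]]) -> List[str]:
    """Return the IDs of downstream resources that reference model_id.

    `references` maps a downstream resource ID (agent, pipeline, ...) to the
    model IDs it uses. Building that map is out of scope for this sketch.
    """
    return sorted(rid for rid, model_ids in references.items() if model_id in model_ids)


def delete_model(model_id: str, models: Dict[str, dict],
                 references: Dict[str, List[str]]) -> None:
    """Delete the model only if no downstream resource still references it."""
    dependents = find_dependents(model_id, references)
    if dependents:
        # Wording modeled on the connector error; the real message may differ.
        raise ModelInUseError(
            f"{len(dependents)} downstream resources are still using this model, "
            f"please delete or update them first: [{', '.join(dependents)}]"
        )
    del models[model_id]
```

With this shape, a delete request for a model that an agent or pipeline still references fails with the list of blocking resource IDs, and the model store is left unchanged.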

ylwu-amzn commented 4 weeks ago

Thanks @yuye-aws. Do you have bandwidth to help with this?

ylwu-amzn commented 4 weeks ago

Assigning this to you for now; let me know if you don't have enough bandwidth.

yuye-aws commented 3 weeks ago

I'm working on the opensearch-ml-quickstart project for the next few weeks. @ylwu-amzn You can mark this issue for 2.19.

yuye-aws commented 2 weeks ago

One more use case: check for agents that use ConnectorTool before deleting a connector.

dhrubo-os commented 2 weeks ago

Similar issue: https://github.com/opensearch-project/ml-commons/issues/3088

dhrubo-os commented 2 weeks ago

@yuye-aws Let's publish your RFC / design plan so that we can make sure it aligns with the team's bigger plan and also addresses issue 3088, which I mentioned here. We should also have a detailed discussion on this.

yuye-aws commented 2 weeks ago

@dhrubo-os The issue mentions scanning ingest pipelines, search pipelines, and any other places where models may be used in existing resources. Can you at least provide a list of the possible places where an ML model is used? For now, the list includes:

  1. ML Model tool
  2. Text embedding processor in pipelines.

Are there any other use cases?

xinyual commented 1 week ago

@ylwu-amzn Hi Yaliang, I will take this issue and come up with an RFC ASAP since yuye doesn't have enough bandwidth.

yuye-aws commented 1 week ago

> @ylwu-amzn Hi Yaliang, I will take this issue and come up with an RFC ASAP since yuye doesn't have enough bandwidth.

Thanks Xinyuan! This issue can be assigned to you, right?

yuye-aws commented 1 week ago

@xinyual I found that the RAG tool also has two model fields, embedding_model_id and inference_model_id: https://opensearch.org/docs/latest/ml-commons-plugin/agents-tools/tools/rag-tool/

yuye-aws commented 1 week ago

Also, the conversational agent needs llm.model_id: https://opensearch.org/docs/latest/ml-commons-plugin/api/agent-apis/register-agent/

yuye-aws commented 1 week ago

The model controller uses model_id: https://opensearch.org/docs/latest/ml-commons-plugin/api/controller-apis/create-controller/

yuye-aws commented 1 week ago

There is also a search response processor named retrieval_augmented_generation that uses model_id: https://opensearch.org/docs/latest/search-plugins/search-pipelines/rag-processor/
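
The fields listed in the comments above (model_id, embedding_model_id, inference_model_id, and llm.model_id) suggest one possible way to gather references: walk each agent or pipeline definition and collect any value stored under those keys. The following is only a sketch in Python, not ml-commons code; `collect_model_ids`, `MODEL_ID_FIELDS`, and the example definitions are hypothetical and simplified, not exact API payloads.

```python
from typing import Any, Set

# Field names mentioned in this thread. llm.model_id is covered because the
# walk descends into the nested "llm" object and finds its "model_id" key.
MODEL_ID_FIELDS = {"model_id", "embedding_model_id", "inference_model_id"}


def collect_model_ids(config: Any) -> Set[str]:
    """Recursively collect model IDs from a JSON-like resource definition."""
    found: Set[str] = set()
    if isinstance(config, dict):
        for key, value in config.items():
            if key in MODEL_ID_FIELDS and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_model_ids(value)
    elif isinstance(config, list):
        for item in config:
            found |= collect_model_ids(item)
    return found


# Simplified, illustrative definitions (IDs are made up).
example_agent = {
    "type": "conversational",
    "llm": {"model_id": "llm-1"},
    "tools": [
        {"type": "RAGTool",
         "parameters": {"embedding_model_id": "emb-1", "inference_model_id": "llm-1"}},
        {"type": "MLModelTool", "parameters": {"model_id": "llm-2"}},
    ],
}
example_search_pipeline = {
    "response_processors": [
        {"retrieval_augmented_generation": {"model_id": "llm-1"}}
    ]
}
```

A key-name scan like this is deliberately loose: it would miss model references stored under other field names and could pick up unrelated fields that happen to share a name, so a real implementation would likely need per-resource knowledge of where model IDs live.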

xinyual commented 1 week ago

I raised an RFC for this issue: https://github.com/opensearch-project/ml-commons/issues/3191 @yuye-aws @ylwu-amzn @dhrubo-os Please have a look.