opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[RFC] Support more local model types #1164

Open ylwu-amzn opened 11 months ago

ylwu-amzn commented 11 months ago

Currently, ml-commons only supports uploading text-embedding models. However, we believe there are other models that could be valuable additions to our platform:

Please comment on this issue if you require support for other local models or vote for the model you need the most.

hijakk commented 11 months ago

+1. Most of these would be valuable in my use cases, as would language identification. Possibly langid is a different beast from what this approach should support?

Image vectorization is a potentially tough use case: images can be large, and embedding them natively as base64 in documents can dramatically inflate document size on disk, compared with storing a reference pointer to external storage (such as S3).
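
To put rough numbers on the size concern, here is a minimal sketch comparing a document that embeds an image as base64 with one that stores only a pointer. The 1 MiB random payload, the field names (`image_b64`, `image_uri`), and the S3 URI are all hypothetical placeholders, not anything ml-commons defines; the only firm fact is that base64 encodes every 3 bytes as 4 characters.

```python
import base64
import json
import os

# Hypothetical 1 MiB image payload (random bytes stand in for a real file).
image_bytes = os.urandom(1024 * 1024)

# Option 1: embed the image directly in the document as base64 text.
encoded = base64.b64encode(image_bytes).decode("ascii")
inline_doc = json.dumps({"title": "example", "image_b64": encoded})

# Option 2: store only a reference to external storage.
# The bucket and key below are made up for illustration.
pointer_doc = json.dumps(
    {"title": "example", "image_uri": "s3://example-bucket/images/example.jpg"}
)

inline_size = len(inline_doc.encode("utf-8"))
pointer_size = len(pointer_doc.encode("utf-8"))

# base64 maps every 3 input bytes to 4 output characters,
# so the encoded form is roughly a third larger than the raw image.
overhead = len(encoded) / len(image_bytes)

print(f"raw image:      {len(image_bytes)} bytes")
print(f"inline doc:     {inline_size} bytes")
print(f"pointer doc:    {pointer_size} bytes")
print(f"base64 overhead: {overhead:.2f}x")
```

This only measures the serialized JSON size; actual on-disk size would also depend on how the index stores and compresses the field.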

ylwu-amzn commented 11 months ago

From https://github.com/opensearch-project/ml-commons/issues/1150#issuecomment-1660940973

@asfoorial suggests

I suggest keeping the door open for LLM hosting, as there is a trend toward making LLMs smaller with quantization while still achieving reasonable performance. I would say they will be hostable on ML nodes or other dedicated nodes.

nateynateynate commented 11 months ago

They all get a thumbs up from me, but I actually would love to see image embedding. I'm fascinated by it.

HungryHowies commented 11 months ago

+1 tbh, I would love to have them all.

austintlee commented 8 months ago

Are cross encoders covered under "rerank model"?

austintlee commented 8 months ago

@dhrubo-os (tagging you since you went over some basics on this with the OCI students)

Would it be easier to produce ML input/output classes for all these different models if we used Smithy to define them and had it generate the classes? Just wondering what we can do to expedite progress on this using some common framework.

TrungBui59 commented 7 months ago

@ylwu-amzn @dhrubo-os I am interested in working on support for the question-answering model. Can you guys give me some hints on what I should do? Currently, I am thinking of following the approach that we used to support the text-embedding model.