opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[RFC] Support more local model types #1164

Open ylwu-amzn opened 1 year ago

ylwu-amzn commented 1 year ago

Currently, ml-commons only supports uploading text-embedding models. However, we believe there are other models that could be valuable additions to our platform:

Please comment on this issue if you require support for other local models or vote for the model you need the most.

hijakk commented 1 year ago

+1. Most of these would be valuable in my use cases, as would language identification. Or is langid a different beast from what this approach should support?

Image vectorization is a potentially tough use case: images can be large, and including them natively as base64 in documents can dramatically inflate document size on disk, compared with storing only a reference pointer to external storage (such as S3).
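To put rough numbers on that, here is a minimal sketch comparing the JSON source size of a document that embeds an image as base64 with one that only stores a pointer. The field names (`image_b64`, `image_uri`), the bucket path, and the 300 KB payload are all made up for illustration, and this measures only the serialized JSON, not what OpenSearch actually writes to disk after compression.

```python
import base64
import json
import os

# Hypothetical 300 KB image payload; random bytes stand in for real
# (already compressed, hence poorly compressible) image data.
image_bytes = os.urandom(300_000)

# Option 1: embed the image inline as base64 (field name is an assumption).
inline_doc = {
    "title": "example",
    "image_b64": base64.b64encode(image_bytes).decode("ascii"),
}

# Option 2: store only a pointer to external storage (hypothetical S3 URI).
pointer_doc = {
    "title": "example",
    "image_uri": "s3://example-bucket/images/example.jpg",
}

# Size of each document's JSON source in bytes.
inline_size = len(json.dumps(inline_doc).encode("utf-8"))
pointer_size = len(json.dumps(pointer_doc).encode("utf-8"))

print(f"raw image bytes:       {len(image_bytes)}")
print(f"inline document size:  {inline_size}")
print(f"pointer document size: {pointer_size}")
```

Base64 encodes every 3 bytes as 4 characters, so the inline field alone is about a third larger than the raw image, while the pointer document stays well under a kilobyte regardless of image size.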

ylwu-amzn commented 1 year ago

From https://github.com/opensearch-project/ml-commons/issues/1150#issuecomment-1660940973

@asfoorial suggests

I suggest keeping the door open for LLM hosting, as there is a trend toward making LLMs smaller with quantization while still achieving reasonable performance. I would say they will be hostable on ML nodes or other dedicated nodes.

nateynateynate commented 1 year ago

They all get a thumbs up from me, but I actually would love to see image embedding. I'm fascinated by it.

HungryHowies commented 1 year ago

+1 tbh, I would love to have them all.

austintlee commented 1 year ago

Are cross encoders covered under "rerank model"?

austintlee commented 1 year ago

@dhrubo-os (tagging you since you went over some basics on this with the OCI students)

Would it be easier to produce ML input/output classes for all these different models if we used Smithy to define them and had it generate the classes? Just wondering what we can do to expedite progress on this using some common framework.

TrungBui59 commented 12 months ago

@ylwu-amzn @dhrubo-os I am interested in working on support for the question-answering model. Can you guys give me some hints on what I should do? Currently, I am thinking of following the approach that we used to support the text-embedding model.