Open ylwu-amzn opened 1 year ago
+1. Most of these would be valuable in my use cases, as well as language identification. Possibly langid is a different beast than this approach should support?
Image vectorization is a potentially tough use case: images can be large, and embedding them as base64 directly in documents can dramatically inflate document size on disk, compared with storing a reference pointer to external storage (such as S3).
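To give a rough sense of the size difference, here is a minimal sketch comparing a document that inlines an image as base64 with one that stores only a pointer. The field names (`image_b64`, `image_ref`) and the S3 URI are hypothetical and not part of any ml-commons API; random bytes stand in for a real image.

```python
import base64
import json
import os


def inline_doc(image_bytes: bytes) -> dict:
    # Hypothetical document that carries the image itself as base64 text.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"title": "example", "image_b64": encoded}


def reference_doc(uri: str) -> dict:
    # Hypothetical document that only points at externally stored image data.
    return {"title": "example", "image_ref": uri}


# Random bytes stand in for a ~300 KB image.
image = os.urandom(300_000)

inline_size = len(json.dumps(inline_doc(image)))
ref_size = len(json.dumps(reference_doc("s3://hypothetical-bucket/images/example.jpg")))

print(f"inline document:    {inline_size} characters")
print(f"reference document: {ref_size} characters")
```

Base64 encodes every 3 bytes as 4 characters, so the inlined payload is about a third larger than the raw image before any index overhead, while the reference document stays small regardless of image size.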
From https://github.com/opensearch-project/ml-commons/issues/1150#issuecomment-1660940973
@asfoorial suggests:
I suggest keeping the door open for LLM hosting, as there is a trend toward making LLMs smaller through quantization while still achieving reasonable performance. I would say they will be hostable on ML nodes or other dedicated nodes.
They all get a thumbs up from me, but I actually would love to see image embedding. I'm fascinated by it.
+1 tbh, I would love to have them all.
Are cross encoders covered under "rerank model"?
@dhrubo-os (tagging you since you went over some basics on this with the OCI students)
Would it be easier to produce ML input/output classes for all these different models if we used Smithy to define them and had it generate the classes? Just wondering what we can do to expedite progress on this using some common framework.
@ylwu-amzn @dhrubo-os I am interested in working on support for the question-answering model. Can you guys give me some hints on what I should do? Currently, I am thinking of following the approach that we used to support the text-embedding model.
Currently, ml-commons only supports uploading text-embedding models. However, we believe there are other models that could be valuable additions to our platform:
Please comment on this issue if you require support for other local models or vote for the model you need the most.