Open timxieICN opened 1 year ago
cc @Narsil
Hi @timxieICN ,
Thanks for the suggestion.
In general, sentence-similarity models like https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 are served by SentenceTransformers, which is a library built on top of transformers itself: https://huggingface.co/sentence-transformers
Sentence transformers adds some configuration that specifies how to compute similarity with a given model, since there are several ways to do it.
From a user's point of view it should be relatively easy to do this:
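from sentence_transformers import SentenceTransformer, util

# model_id and inputs are supplied by the caller
model = SentenceTransformer(model_id)
# Encode the source sentence and the candidate sentences as tensors
embeddings1 = model.encode(inputs["source_sentence"], convert_to_tensor=True)
embeddings2 = model.encode(inputs["sentences"], convert_to_tensor=True)
# Cosine similarity between the source and each candidate
similarities = util.pytorch_cos_sim(embeddings1, embeddings2)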
This is exactly the code that currently runs on the Hub to compute these similarities: https://github.com/huggingface/api-inference-community/blob/main/docker_images/sentence_transformers/app/pipelines/sentence_similarity.py
Adding this directly in transformers would basically mean incorporating sentence-transformers within transformers, and I'm not sure that's desired. Maybe @amyeroberts or another core maintainer can confirm or refute this.
Does this help?
We definitely don't want a circular dependency like that!
As the example you shared, @Narsil, is so simple, I think it's a good replacement for a pipeline. Let's leave this issue open, and if there's a lot of interest or new use cases we can consider other possible options.
Hi @Narsil, that is the sentence-transformers API. I want to compute sentence similarity with a T5 model. How can I do that?
Thank you
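For a model without a sentence-transformers configuration, one common approach is to average the per-token vectors from the encoder, skipping padding positions. The sketch below shows only that pooling step in plain Python. The token vectors and the `mean_pool` helper are hypothetical illustrations: in practice the vectors would come from the model's encoder output, and whether mean pooling suits a particular T5 checkpoint is an assumption worth verifying.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    # Keep only real tokens; padding positions have mask value 0.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 2-dimensional token vectors; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
```

The pooled vector can then be compared with other sentence vectors using cosine similarity, just as the sentence-transformers snippet above does.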
I think that measuring the distance between embeddings produced by any embedding generation model would indeed be desirable. I'm open to trying to help if you want to do that.
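As a rough illustration of what a model-agnostic version could look like, the sketch below computes cosine similarity between one source embedding and several candidate embeddings using only the standard library. The function names `cosine_similarity` and `similarities` are hypothetical and not part of transformers or sentence-transformers; the embeddings are assumed to be plain lists of floats produced by whatever model is in use.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarities(
    source: Sequence[float],
    candidates: Sequence[Sequence[float]],
) -> List[float]:
    """Score each candidate embedding against the source embedding."""
    return [cosine_similarity(source, candidate) for candidate in candidates]


# Hypothetical embeddings: same direction, orthogonal, and opposite.
scores = similarities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

Because the helper only needs vectors, it does not depend on which library or model generated them.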
Feature request
HuggingFace now has a lot of Sentence Similarity models, but the pipeline does not yet support this: https://huggingface.co/docs/transformers/main_classes/pipelines
Motivation
HuggingFace now has a lot of Sentence Similarity models, but the pipeline does not yet support this: https://huggingface.co/docs/transformers/main_classes/pipelines
Your contribution
I can write a PR, but might need someone else's help.