huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.25k stars 223 forks

Query Regarding Model Inference for Computing Cosine Similarity #426

Closed: Anbrose closed this issue 1 year ago

Anbrose commented 1 year ago

Hello,

First off, thank you for the setfit repository; it's a valuable contribution to the community!

I'm currently exploring the provided examples and I've come across something that I wanted to clarify regarding the training and inference stages. In the training phase, the model seems to learn the similarity between pairs of sentences from their encoded embeddings, and as a result, for each training step, two sentences are input to the model to calculate a similarity score.

However, when it comes to the inference stage with the following line:

preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

It appears that two separate sentences are fed into the model independently, resulting in two individual labels/scores instead of a single similarity score between the two sentences.

If my objective is to compute the cosine similarity between two sentences during inference, just like during the training phase, how should I modify the inference code?

Thank you in advance for your guidance!

kgourgou commented 1 year ago

Hi @Anbrose

During training, the model indeed moves the embeddings around according to some target labels to make classification easier.

"two sentences are input to the model to calculate a similarity score."

Not quite. What you are describing is a cross-encoder model. SetFit uses models according to the bi-encoder setup; each sentence is embedded by the same sentence transformer separately and then cosine similarity is computed.

SetFit models are supposed to return classification probabilities, not sentence embeddings! If you want to get embeddings, you need to first do

emb = model.encode(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

Then you would have to compute the cosine similarity between the two examples from the embeddings in emb.
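
For example, a minimal sketch of that second step (the checkpoint name is a placeholder, and it assumes the cos_sim helper from sentence_transformers, which SetFit already depends on):

from setfit import SetFitModel
from sentence_transformers import util

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")  # placeholder checkpoint

# one embedding per sentence, produced by the underlying sentence transformer
emb = model.encode(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

# cosine similarity between the two embeddings
score = util.cos_sim(emb[0], emb[1])
print(score)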

Anbrose commented 1 year ago

Thank you @kgourgou.

Yeah, I dived deep into the library code and now know how to manipulate the embeddings. Thanks for your careful explanation.

tanguy0807 commented 3 months ago

Just to make sure I understand how SetFit works: SetFit uses an ST (Sentence Transformer) as a base model, but do we agree that this ST should only take a single sentence as input? In other words, this constrains SetFit to bi-encoder STs (which accept a single input sentence); it won't natively work with cross-encoder models (which require a pair of sentences as input)?

kgourgou commented 3 months ago

@tanguy0807 I think that's all correct.
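
One way to see why: under the hood, the SetFit pipeline first uses the ST body to embed each sentence on its own, then feeds each embedding to the classification head. A rough sketch of that flow, assuming the default scikit-learn head and the model_body / model_head attributes (the checkpoint name is again a placeholder):

from setfit import SetFitModel

model = SetFitModel.from_pretrained("lewtun/my-awesome-setfit-model")  # placeholder checkpoint

sentences = ["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"]

# stage 1: the bi-encoder body embeds each sentence independently
embeddings = model.model_body.encode(sentences)

# stage 2: the head classifies each embedding (assumes the default LogisticRegression head)
preds = model.model_head.predict(embeddings)
print(preds)

A cross-encoder never produces a per-sentence embedding to hand to the head, so it cannot play the role of model_body.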