Here's the model architecture according to their README:
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
(2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
It would appear as though they store the final "dense" layer in a separate folder (https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2/tree/main/2_Dense) and the ONNX model you're loading was only converted from the pytorch_model.bin in the root directory.
If you try using the HF transformers python library (not sbert), you should also get 768 dimensions, simply because it doesn't know of the existence of the final dense layer.
Regarding a way to fix it, you could perhaps convert the dense layer to ONNX, then use another AutoModel and pass the outputs from the transformer (after pooling/normalisation) through it.
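To illustrate the shapes involved, here is a minimal numeric sketch of what that final Dense module computes: a 768→512 linear map followed by Tanh. The weights below are random stand-ins; the trained ones live in the repo's 2_Dense/ folder.

```python
import numpy as np

# Stand-in weights for the Dense module (the trained ones are stored in the
# model repo's 2_Dense/ folder); shapes match in_features=768, out_features=512.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 768)) * 0.01
b = np.zeros(512)

# "pooled" stands in for the mean-pooled transformer output (768-d).
pooled = rng.standard_normal(768)

# The Dense module: linear map followed by Tanh, yielding the final 512-d embedding.
sentence_embedding = np.tanh(W @ pooled + b)
print(sentence_embedding.shape)  # (512,)
```

Chaining this after the pooled transformer output is all that is needed to go from 768 to 512 dimensions.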
Thanks for your answer!
You're right: with the transformers library in Python, it returns a 768-dimensional vector too.
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "sentence-transformers/distiluse-base-multilingual-cased-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state, first batch item, first token ([CLS])
sentence_embedding = outputs[0][0][0]
len(sentence_embedding)
# 768
or simply
from transformers import pipeline

pipe = pipeline("feature-extraction", model="sentence-transformers/distiluse-base-multilingual-cased-v2")
out = pipe("I love transformers!")
len(out[0][0])
# 768
where the first token's vector ([CLS]) should be the sentence embedding, according to the BERT paper (afaik, right?).
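Note that the architecture printed above has pooling_mode_mean_tokens: True, so sentence-transformers actually mean-pools the token embeddings rather than taking [CLS]. A rough sketch of that pooling step, using a toy array instead of real model outputs:

```python
import numpy as np

# Mean pooling as the printed architecture suggests (pooling_mode_mean_tokens=True):
# average the token embeddings, using the attention mask to ignore padding.
def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, hidden); attention_mask: (seq_len,) of 0/1
    mask = attention_mask[:, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # avoid division by zero for all-padding input
    return summed / count

tokens = np.ones((4, 768))     # toy token embeddings
mask = np.array([1, 1, 1, 0])  # last position is padding
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (768,)
```

So the [CLS] vector from the transformers-only code above is not quite what sbert computes, even before the dense layer is applied.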
I suppose the dense layer in the sentence-transformers models serves only to shorten the vectors and save memory. It's certainly a nice banana skin to slip on. :D
Are you aware of any way to add the dense layer to the ONNX model, so I could create it once for my purpose? I want to avoid loading two models and piping data around.
Also (for anyone reading this in the future), I am not aware of any parameter to ignore the dense layer in sentence transformer models.
> Are you aware of any way to add the dense layer to the onnx.model so I could create it once for my purpose? I want to avoid loading two models and piping data around.
Maybe @fxmarty or @michaelbenayoun can help with this? It most likely will require some custom config.
I was just playing around with the model distiluse-base-multilingual-cased-v2 and noticed that both of your ONNX versions (quantized and normal) produce 768-dimensional embeddings instead of 512-dimensional ones.
Example: (index.html and main.js code snippets not shown here)
That gives me a 768-dimensional embedding. However, the model page states that the model maps sentences to a 512-dimensional dense vector space. Also, I used the Python sentence-transformers package, which gives me a correct 512-dimensional embedding.
Am I missing some option here, or overlooking the obvious?