Here's the model architecture according to their README:
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
(2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
It would appear as though they store the final "dense" layer in a separate folder (https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2/tree/main/2_Dense) and the ONNX model you're loading was only converted from the pytorch_model.bin in the root directory.
If you try using the HF transformers python library (not sbert), you should also get 768 dimensions, simply because it doesn't know of the existence of the final dense layer.
Regarding a way to fix it, you could perhaps convert the dense layer to ONNX, then use another AutoModel and pass the outputs from the transformer (after pooling/normalisation) through it.
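To illustrate the shapes involved, here is a minimal numeric sketch of what that final Dense module computes: a 768→512 linear map followed by Tanh. The weights below are random stand-ins; the trained ones live in the repo's 2_Dense/ folder.

```python
import numpy as np

# Stand-in weights for the Dense module (the trained ones are stored in the
# model repo's 2_Dense/ folder); shapes match in_features=768, out_features=512.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 768)) * 0.01
b = np.zeros(512)

# "pooled" stands in for the mean-pooled transformer output (768-d).
pooled = rng.standard_normal(768)

# The Dense module: linear map followed by Tanh, yielding the final 512-d embedding.
sentence_embedding = np.tanh(W @ pooled + b)
print(sentence_embedding.shape)  # (512,)
```

Chaining this after the pooled transformer output is all that is needed to go from 768 to 512 dimensions.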
Thanks for your answer!
You're right: with the transformers library in Python, it returns a 768-dimensional vector too.
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "sentence-transformers/distiluse-base-multilingual-cased-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state, first batch item, first token ([CLS])
sentence_embedding = outputs[0][0][0]
len(sentence_embedding)
# 768
or simply
from transformers import pipeline

pipe = pipeline("feature-extraction", model="sentence-transformers/distiluse-base-multilingual-cased-v2")
out = pipe("I love transformers!")
len(out[0][0])
# 768
where the first token's vector ([CLS]) should be the sentence embedding, according to the BERT paper (afaik, right?).
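Note that the architecture printed above has pooling_mode_mean_tokens: True, so sentence-transformers actually mean-pools the token embeddings rather than taking [CLS]. A rough sketch of that pooling step, using a toy array instead of real model outputs:

```python
import numpy as np

# Mean pooling as the printed architecture suggests (pooling_mode_mean_tokens=True):
# average the token embeddings, using the attention mask to ignore padding.
def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, hidden); attention_mask: (seq_len,) of 0/1
    mask = attention_mask[:, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # avoid division by zero for all-padding input
    return summed / count

tokens = np.ones((4, 768))     # toy token embeddings
mask = np.array([1, 1, 1, 0])  # last position is padding
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (768,)
```

So the [CLS] vector from the transformers-only code above is not quite what sbert computes, even before the dense layer is applied.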
I suppose the dense layer in the sentence-transformers models serves only to shorten the vectors and save memory. It's certainly a nice banana skin to slip on. :D
Are you aware of any way to add the dense layer to the ONNX model, so I could create it once for my purpose? I want to avoid loading two models and piping data around.
Also (for anyone reading this in the future), I am not aware of any parameter to ignore the dense layer in sentence transformer models.
> Are you aware of any way to add the dense layer to the onnx.model so I could create it once for my purpose? I want to avoid loading two models and piping data around.
Maybe @fxmarty or @michaelbenayoun can help with this? It most likely will require some custom config.
I was just playing around with the model distiluse-base-multilingual-cased-v2 and noticed that both of your ONNX versions (quantized and normal) produce 768-dimensional embeddings instead of 512-dimensional ones.
Example: (index.html and main.js code snippets not shown here)
That gives me a 768-dimensional embedding. However, the model page states that the model maps sentences to a 512-dimensional dense vector space. Also, I used the Python sentence-transformers package, which gives me a correct 512-dimensional embedding.
Am I missing some option here, or overlooking the obvious?