jxmorris12 / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Sentence Transformer Embeddings #32

Closed mim201820 closed 9 months ago

mim201820 commented 9 months ago

I have a question about inverting embeddings from distiluse-base-multilingual-cased-v1.

I used the following code, but I got an error:

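import vec2text
from sentence_transformers import SentenceTransformer

# Embed the query with a 512-dimensional multilingual model
model = SentenceTransformer("distiluse-base-multilingual-cased-v1")
query_embedding = model.encode(["How big is London?"], convert_to_tensor=True)

# Try to invert the embedding with the gtr-base corrector
corrector = vec2text.load_corrector("gtr-base")
output = vec2text.invert_embeddings(
    embeddings=query_embedding,
    corrector=corrector,
    num_steps=20,
)
print(output)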

error:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x512 and 768x768)

How can I invert the embeddings of this model?

sacharbit commented 9 months ago

That means the corrector model and the embedding model don't have the same embedding size. I would need the full traceback to be sure, but I'm pretty confident about it.

jxmorris12 commented 9 months ago

Yep! You're trying to use a corrector trained for one type of embedding (gtr-base embeddings, dimension 768) on embeddings from a different model, which have dimension 512. Unfortunately, you'll have to train a new corrector for "distiluse-base-multilingual-cased-v1".
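
One way to catch this kind of mismatch before calling the corrector is to compare the embedding's last dimension with the dimension the corrector was trained on. The snippet below is only a minimal sketch and is not part of vec2text: `check_embedding_dim` and `GTR_BASE_DIM` are hypothetical names, the value 768 is taken from the comment above, and a `SimpleNamespace` with a `shape` attribute stands in for a real tensor so the example runs without downloading any model.

```python
from types import SimpleNamespace

# Dimension of gtr-base embeddings, as stated in the thread (assumption: hard-coded here).
GTR_BASE_DIM = 768


def check_embedding_dim(embeddings, expected_dim=GTR_BASE_DIM):
    """Return the embedding dimension, or raise ValueError if it differs from expected_dim.

    `embeddings` can be any object with a `.shape` attribute whose last entry
    is the embedding dimension (for example a torch tensor or a NumPy array).
    """
    actual_dim = embeddings.shape[-1]
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding dimension {actual_dim} does not match the corrector's "
            f"expected dimension {expected_dim}; a corrector trained for this "
            f"embedding model is needed."
        )
    return actual_dim


# Stand-in for the (1, 512) tensor produced by distiluse-base-multilingual-cased-v1.
fake_distiluse_embedding = SimpleNamespace(shape=(1, 512))

try:
    check_embedding_dim(fake_distiluse_embedding)
except ValueError as err:
    message = str(err)
    print(message)
```

In the original script, this check would presumably go right before `vec2text.invert_embeddings`, e.g. `check_embedding_dim(query_embedding)`, so the failure is reported as a clear dimension mismatch rather than a matrix-multiplication error deep inside the model.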