neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Using output_value as "token_embeddings" is broken for Sentence Transformer #1448

Closed mikeybellissimo closed 4 months ago

mikeybellissimo commented 9 months ago

Although the code suggests that you can pass "token_embeddings" as the value of the output_value argument to DeepSparseSentenceTransformer.encode, the method is not actually set up to support it. The issue can be seen clearly in the following code from the encode method in this file: https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/sentence_transformers/sentence_transformer.py

[screenshot: the encode method, showing out_features initialized as an empty dict]

As you can see, out_features is an empty dict, yet only a few lines down, inside the branch where output_value is "token_embeddings", the code attempts to retrieve the value for the key "token_embeddings" from that empty dictionary, leading to the following error:

[screenshot: KeyError traceback for 'token_embeddings']

which occurs when running the code shown below:

from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
model = DeepSparseSentenceTransformer('neuralmagic/bge-large-en-v1.5-quant', export=False)

# The sentences we'd like to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of strings.',
    'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences, output_value="token_embeddings")

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding.shape)
    print("")
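The failure mode can be reproduced outside DeepSparse with a minimal, self-contained sketch. The `out_features` name mirrors the library code; the tokenizer and model forward pass below are stand-ins (assumptions for illustration), not the real DeepSparse pipeline:

```python
import numpy as np

def encode_broken(sentences):
    """Mirrors the buggy pattern: out_features is never populated."""
    out_features = {}  # initialized empty, as in the library code
    # ... inference would happen here, but nothing is stored ...
    # lookup on an empty dict raises KeyError: 'token_embeddings'
    return [out_features["token_embeddings"] for _ in sentences]

def encode_fixed(sentences, hidden_size=4):
    """Candidate fix: store token-level output in out_features before lookup."""
    embeddings = []
    for sentence in sentences:
        tokens = sentence.split()  # stand-in for the real tokenizer
        # stand-in for the model forward pass: one vector per token
        model_output = np.random.rand(len(tokens), hidden_size)
        out_features = {"token_embeddings": model_output}  # populate first
        embeddings.append(out_features["token_embeddings"])
    return embeddings

try:
    encode_broken(["hello world"])
except KeyError as exc:
    print("broken path raises KeyError for key:", exc)

embs = encode_fixed(["hello world"])
print(embs[0].shape)  # one (num_tokens, hidden_size) array per sentence
```

The point of the sketch is only the ordering: whatever the real model returns for the token-level output has to be written into out_features before the "token_embeddings" lookup runs.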

Really great work on this project overall, though, and thank you for all the effort you've put in so far so that people like me can easily make use of it! Thanks, Michael

mgoin commented 9 months ago

Hi @mikeybellissimo, thanks for reporting this error! I'm not sure how to resolve this immediately; do you have a candidate fix? I'd be happy to take the contribution. Otherwise, I'll look into it in a few days.

jeanniefinks commented 4 months ago

Hi @mikeybellissimo, as some time has passed with no further updates, I'm going to go ahead and close out this issue. If you have comments to contribute, please feel free to re-open the thread. Thank you! Jeannie / Neural Magic