[Closed] zhuhaozhe closed this issue 2 months ago
With this example usage:

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```
The `input_ids` shape is `[1, 7]`, which means BS=1, seq_len=7, and hidden_size=1024. Should the expected output embedding shape be `[1024]` or `[7168]` (7 * 1024)?
We have 512 tokens per batch but we only want the first token's embeddings?
Yes, that's called class pooling.
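For illustration, class (CLS) pooling keeps only the first token's hidden state per sequence, so each input yields `hidden_size` values regardless of sequence length. A minimal NumPy sketch (the shapes and names are illustrative, not TEI's actual code):

```python
import numpy as np

# Hypothetical last hidden states: (batch, seq_len, hidden) = (1, 7, 1024)
hidden_states = np.random.rand(1, 7, 1024).astype(np.float32)

# CLS pooling: keep only the first token's embedding of each sequence
cls_embeddings = hidden_states[:, 0]   # shape (1, 1024)

# Flattened, this gives 1024 values per input, not 7 * 1024
flat = cls_embeddings.reshape(-1)      # shape (1024,)
print(cls_embeddings.shape, flat.shape)
```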
Yes, it should be a `reshape` instead of a `view`. `reshape` handles non-contiguous tensors (copying the data when necessary), whereas `view` fails on them.
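The `view` vs. `reshape` distinction can be demonstrated with a toy tensor of the shapes discussed in this thread (a sketch; the variable names are illustrative):

```python
import torch

# Toy tensor shaped like the backend's output: (batch, seq_len, hidden)
output = torch.randn(32, 512, 1024)

# Slicing out the first token of each sequence yields a NON-contiguous tensor
embedding = output[:, 0]              # shape (32, 1024)
print(embedding.is_contiguous())      # False

# view() requires compatible (contiguous) memory and raises RuntimeError here
try:
    embedding.view(-1)
except RuntimeError as e:
    print("view failed:", e)

# reshape() copies the data when needed, so it always succeeds
flat = embedding.reshape(-1)          # shape (32 * 1024,)
print(flat.shape)
```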
@OlivierDehaene, thanks!
System Info
text-embeddings-inference==v1.5.0, python==3.9, run on a CPU device
Reproduction
Follow https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#local-install, but build with "-F python", and start the service with
Expected behavior
More explanations
I am trying to use the Python backend and I hit an error because of these two lines: https://github.com/huggingface/text-embeddings-inference/blob/661a77ffba48f92fccda8c7b7302f6a973570016/backends/python/server/text_embeddings_server/models/default_model.py#L44-L45
I inserted some logs to see the shapes:
I am a newbie to text embeddings. From my understanding, the batch size here should be 32 and seq_len should be 512, and I wish to understand what the expected output embedding shape is. The `output` should be `torch.Size([32, 512, 1024])`.
This slice means we only keep the first token's embedding of each sequence? Is this expected? We have 512 tokens per batch but we only want the first token's embeddings? If it is expected, we may just change `view` to `reshape`, as the error message suggests. If it is not expected, should we use `embedding = output[0]` instead of `embedding = output[0][:, 0]`, and also correct
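Assuming the referenced lines implement CLS pooling followed by a flatten, the fix discussed above would look roughly like this (a hypothetical helper for illustration, not the actual `default_model.py` code):

```python
import torch

def pool_cls(last_hidden_state: torch.Tensor) -> list[float]:
    """Sketch of CLS pooling as discussed in this thread.

    `last_hidden_state` has shape (batch, seq_len, hidden).
    """
    # Keep only the first token's embedding per sequence -> (batch, hidden)
    embedding = last_hidden_state[:, 0]
    # The slice is non-contiguous, so flatten with reshape (not view)
    return embedding.reshape(-1).tolist()

# One batch of 32 sequences, 512 tokens each, hidden size 1024
results = pool_cls(torch.randn(32, 512, 1024))
print(len(results))  # 32 * 1024 values, i.e. one hidden vector per sequence
```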