Closed ehuaa closed 1 month ago
Just checked OpenAI's Python library: it encodes the float data as "base64" by default when encoding_format
is not given, see here. So in openai_embedding_client.py, the encoding of the returned embedding became "base64" instead of "float", hence the 8192 dimensions. If we add encoding_format="float"
, the returned dimension will be 4096. Will add a fix soon.
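For context, the base64 path can be sketched offline: a base64-encoded embedding is just the raw little-endian float32 bytes of the vector, base64-encoded, so decoding it recovers the original dimension. This is a minimal sketch with toy data; `decode_embedding` is a hypothetical helper, not part of the OpenAI client:

```python
import base64
import struct

def decode_embedding(b64: str) -> list:
    # The payload is the vector's little-endian float32 bytes, base64-encoded;
    # each float32 is 4 bytes, so the dimension is len(raw) // 4.
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip a toy 4-dimensional "embedding" (hypothetical data).
vec = [0.1, -0.2, 0.3, 0.4]
b64 = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
decoded = decode_embedding(b64)
```

If the client decodes the base64 payload this way, a 4096-dimensional vector comes back as 4096 floats; a dimension mismatch suggests the payload is being interpreted with the wrong element size somewhere.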
Setting encoding_format="float"
indeed resolves the issue. However, maybe there is still a bug with base64
in the vLLM server? Since it's the default encoding_format
used by the OpenAI Python API, it should still return the correct size, I guess? The reason it's 8192 is that every second element is 0.
@hibukipanim This should hopefully be fixed by https://github.com/vllm-project/vllm/pull/7855
Your current environment
Collecting environment information... PyTorch version: 2.3.1+cu121
GPU models and configuration: GPU 0: NVIDIA A40 GPU 1: NVIDIA A40 GPU 2: NVIDIA A40 GPU 3: NVIDIA A40
Nvidia driver version: 535.161.08 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True
Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] onnx==1.14.1
[pip3] onnxruntime==1.18.1
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
vLLM Version: 0.5.3
🐛 Describe the bug
My vLLM version is the latest, v0.5.3.post1. First I launch an embedding server as below:
python3 -m vllm.entrypoints.openai.api_server --model Salesforce/SFR-Embedding-Mistral --dtype bfloat16 --enforce-eager --max-model-len 8192
Salesforce/SFR-Embedding-Mistral is an embedding model with the same architecture as intfloat/e5-mistral. Then I used https://github.com/vllm-project/vllm/blob/main/examples/openai_embedding_client.py to test the online embedding result, and it returns a tensor of length 8192, which is not 4096 as MistralModel's hidden size. I also made two other tests:
a. Ran tests/entrypoints/openai/test_embedding.py and found no problem with the three tests; the embedding size is exactly 4096.
b. Ran examples/offline_inference_embedding.py, and the embedding size is also exactly 4096.
Can you have a look at what's going wrong with openai_embedding_client.py? Thanks.
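As a possible client-side workaround until the fix lands, the request can pass encoding_format explicitly so the client does not fall back to its base64 default. This is a sketch of the request body only (the payload shape follows the OpenAI /v1/embeddings API; the input text is a placeholder), not the confirmed fix to openai_embedding_client.py:

```python
import json

# Sketch of a /v1/embeddings request body with an explicit encoding_format,
# so the server returns a plain float list instead of base64-encoded bytes.
payload = {
    "model": "Salesforce/SFR-Embedding-Mistral",
    "input": ["example text"],          # placeholder input
    "encoding_format": "float",         # the client defaults to "base64" when unset
}
body = json.dumps(payload)
```

The same effect is achieved with the OpenAI Python client by passing encoding_format="float" to client.embeddings.create(...).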